The standard genetic code can evolve from a two-letter GC code

The model from our iterated learning approach to the origins of the genetic code inspired this related hypothesis about a simplified precursor to the standard four-letter genetic code, which will be published in Origins of Life and Evolution of Biospheres:

The standard genetic code can evolve from a two-letter GC code without information loss or costly reassignments

Alejandro Frank and Tom Froese

It is widely agreed that the standard genetic code must have been preceded by a simpler code that encoded fewer amino acids. How this simpler code could have expanded into the standard genetic code is not well understood because most changes to the code are costly. Taking inspiration from the recently synthesized six-letter code, we propose a novel hypothesis: the initial genetic code consisted of only two letters, G and C, and then expanded the number of available codons via the introduction of an additional pair of letters, A and U. Various lines of evidence, including the relative prebiotic abundance of the earliest assigned amino acids, the balance of their hydrophobicity, and the higher GC content in genome coding regions, indicate that the original two nucleotides were indeed G and C. This process of code expansion probably started with the third base, continued with the second base, and ended up as the standard genetic code when the second pair of letters was introduced into the first base. The proposed process is consistent with the available empirical evidence, and it uniquely avoids the problem of costly code changes by positing instead that the code expanded its capacity via the creation of new codons with extra letters.
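The codon arithmetic behind this expansion sequence can be sketched in a few lines of Python. This is only an illustrative count, not the authors' model: the stage labels and the names `STAGES`, `codons`, and `stage_codon_sets` are assumptions introduced here, and no amino acid assignments are modeled. It shows that each hypothesized step keeps every existing codon and only adds new ones, which is the sense in which the expansion avoids reassignments.

```python
from itertools import product

GC = "GC"      # hypothesized original two-letter alphabet
GCAU = "GCAU"  # alphabet after the A/U pair is introduced

# Alphabets allowed at (first, second, third) codon position at each stage.
# Stage labels are illustrative, following the order given in the abstract.
STAGES = [
    ("two-letter GC code", (GC, GC, GC)),
    ("A/U added at third base", (GC, GC, GCAU)),
    ("A/U added at second base", (GC, GCAU, GCAU)),
    ("A/U added at first base (standard)", (GCAU, GCAU, GCAU)),
]


def codons(alphabets):
    """Return the set of all triplets allowed by the per-position alphabets."""
    return {"".join(triplet) for triplet in product(*alphabets)}


def stage_codon_sets():
    """Return a list of (stage name, codon set) pairs in expansion order."""
    return [(name, codons(alphabets)) for name, alphabets in STAGES]


if __name__ == "__main__":
    previous = set()
    for name, current in stage_codon_sets():
        # Every earlier codon is still present, so nothing is reassigned.
        assert previous <= current
        print(f"{name}: {len(current)} codons, {len(current - previous)} new")
        previous = current
```

Under these assumptions the codon count doubles at each step: 8, 16, 32, and finally the 64 codons of the standard code.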

New paper on iterated learning at the origins of life

Jorge, Nathaniel and I have published an extension of our iterated learning approach to the origins of the genetic code in the Proceedings of the Artificial Life Conference 2018. We unexpectedly found that the most likely sequences in which amino acids get incorporated into the emerging genetic codes in our simulation model exhibit a remarkable overlap with the sequence predicted in the literature based on empirical considerations.

We will present this work at the ALIFE conference in Tokyo as part of the special session on “Hybrid Life: Approaches to integrate biological, artificial and cognitive systems”.

An iterated learning approach to the origins of the standard genetic code can help to explain its sequence of amino acid assignments

Tom Froese, Jorge I. Campos, and Nathaniel Virgo

Artificial life has been developing a behavior-based perspective on the origins of life, which emphasizes the adaptive potential of agent-environment interaction even at that initial stage. So far this perspective has been closely aligned to metabolism-first theories, while most researchers who study life’s origins tend to assign an essential role to RNA. An outstanding challenge is to show that a behavior-based perspective can also address open questions related to the genetic system. Accordingly, we have recently applied this perspective to one of science’s most fascinating mysteries: the origins of the standard genetic code. We modeled horizontal transfer of cellular components in a population of protocells using an iterated learning approach and found that it can account for the emergence of several key properties of the standard code. Here we further investigated the diachronic emergence of artificial codes and discovered that the model’s most frequent sequence of amino acid assignments overlaps significantly with the predictions in the literature. Our explorations of the factors that favor early incorporation into an emerging artificial code revealed two aspects: an amino acid’s relative probability of horizontal transfer, and its relative ease of discriminability in chemical space.

Figure 2. Illustration of the architecture of the genetic system of one of our hypothetical protocells.