Molecular evolution

Molecular evolution is the process of change in the sequence composition of cellular molecules such as DNA, RNA, and proteins across generations. The field of molecular evolution uses principles of evolutionary biology and population genetics to explain patterns in these changes. Major topics in molecular evolution concern the rates and impacts of single nucleotide changes, neutral evolution vs. natural selection, origins of new genes, the genetic nature of complex traits, the genetic basis of speciation, evolution of development, and ways that evolutionary forces influence genomic and phenotypic changes.


The history of molecular evolution starts in the early 20th century with comparative biochemistry, and the use of "fingerprinting" methods such as immune assays, gel electrophoresis and paper chromatography in the 1950s to explore homologous proteins.[1][2] The field of molecular evolution came into its own in the 1960s and 1970s, following the rise of molecular biology. The advent of protein sequencing allowed molecular biologists to create phylogenies based on sequence comparison, and to use the differences between homologous sequences as a molecular clock to estimate the time since the last universal common ancestor.[1] In the late 1960s, the neutral theory of molecular evolution provided a theoretical basis for the molecular clock,[3] though both the clock and the neutral theory were controversial, since most evolutionary biologists held strongly to panselectionism, with natural selection as the only important cause of evolutionary change. After the 1970s, nucleic acid sequencing allowed molecular evolution to reach beyond proteins to highly conserved ribosomal RNA sequences, the foundation of a reconceptualization of the early history of life.[1]

Forces in molecular evolution

The content and structure of a genome is the product of the molecular and population genetic forces which act upon that genome. Novel genetic variants will arise through mutation and will spread and be maintained in populations due to genetic drift or natural selection.


Hedgehog with Albinism
This hedgehog has no pigmentation due to a mutation.

Mutations are permanent, transmissible changes to the genetic material (DNA or RNA) of a cell or virus. Mutations result from errors in DNA replication during cell division and by exposure to radiation, chemicals, and other environmental stressors, or viruses and transposable elements. Most mutations that occur are single nucleotide polymorphisms which modify single bases of the DNA sequence, resulting in point mutations. Other types of mutations modify larger segments of DNA and can cause duplications, insertions, deletions, inversions, and translocations.

Most organisms display a strong bias in the types of mutations that occur with strong influence in GC-content. Transitions (A ↔ G or C ↔ T) are more common than transversions (purine (adenine or guanine)) ↔ pyrimidine (cytosine or thymine, or in RNA, uracil))[4] and are less likely to alter amino acid sequences of proteins.

Mutations are stochastic and typically occur randomly across genes. Mutation rates for single nucleotide sites for most organisms are very low, roughly 10−9 to 10−8 per site per generation, though some viruses have higher mutation rates on the order of 10−6 per site per generation. Among these mutations, some will be neutral or beneficial and will remain in the genome unless lost via genetic drift, and others will be detrimental and will be eliminated from the genome by natural selection.

Because mutations are extremely rare, they accumulate very slowly across generations. While the number of mutations which appears in any single generation may vary, over very long time periods they will appear to accumulate at a regular pace. Using the mutation rate per generation and the number of nucleotide differences between two sequences, divergence times can be estimated effectively via the molecular clock.


Chromosomal Recombination
Recombination involves the breakage and rejoining of two chromosomes (M and F) to produce two re-arranged chromosomes (C1 and C2).

Recombination is a process that results in genetic exchange between chromosomes or chromosomal regions. Recombination counteracts physical linkage between adjacent genes, thereby reducing genetic hitchhiking. The resulting independent inheritance of genes results in more efficient selection, meaning that regions with higher recombination will harbor fewer detrimental mutations, more selectively favored variants, and fewer errors in replication and repair. Recombination can also generate particular types of mutations if chromosomes are misaligned.

Gene conversion

Gene conversion is a type of recombination that is the product of DNA repair where nucleotide damage is corrected using an homologous genomic region as a template. Damaged bases are first excised, the damaged strand is then aligned with an undamaged homolog, and DNA synthesis repairs the excised region using the undamaged strand as a guide. Gene conversion is often responsible for homogenizing sequences of duplicate genes over long time periods, reducing nucleotide divergence.

Genetic drift

Genetic drift is the change of allele frequencies from one generation to the next due to stochastic effects of random sampling in finite populations. Some existing variants have no effect on fitness and may increase or decrease in frequency simply due to chance. "Nearly neutral" variants whose selection coefficient is close to a threshold value of 1 / the effective population size will also be affected by chance as well as by selection and mutation. Many genomic features have been ascribed to accumulation of nearly neutral detrimental mutations as a result of small effective population sizes.[5] With a smaller effective population size, a larger variety of mutations will behave as if they are neutral due to inefficiency of selection.


Selection occurs when organisms with greater fitness, i.e. greater ability to survive or reproduce, are favored in subsequent generations, thereby increasing the instance of underlying genetic variants in a population. Selection can be the product of natural selection, artificial selection, or sexual selection. Natural selection is any selective process that occurs due to the fitness of an organism to its environment. In contrast sexual selection is a product of mate choice and can favor the spread of genetic variants which act counter to natural selection but increase desirability to the opposite sex or increase mating success. Artificial selection, also known as selective breeding, is imposed by an outside entity, typically humans, in order to increase the frequency of desired traits.

The principles of population genetics apply similarly to all types of selection, though in fact each may produce distinct effects due to clustering of genes with different functions in different parts of the genome, or due to different properties of genes in particular functional classes. For instance, sexual selection could be more likely to affect molecular evolution of the sex chromosomes due to clustering of sex specific genes on the X, Y, Z or W.

Selection can operate at the gene level at the expense of organismal fitness, resulting in a selective advantage for selfish genetic elements in spite of a host cost. Examples of such selfish elements include transposable elements, meiotic drivers, killer X chromosomes, selfish mitochondria, and self-propagating introns. (See Intragenomic conflict.)

Genome architecture

Genome size

Genome size is influenced by the amount of repetitive DNA as well as number of genes in an organism. The C-value paradox refers to the lack of correlation between organism 'complexity' and genome size. Explanations for the so-called paradox are two-fold. First, repetitive genetic elements can comprise large portions of the genome for many organisms, thereby inflating DNA content of the haploid genome. Secondly, the number of genes is not necessarily indicative of the number of developmental stages or tissue types in an organism. An organism with few developmental stages or tissue types may have large numbers of genes that influence non-developmental phenotypes, inflating gene content relative to developmental gene families.

Neutral explanations for genome size suggest that when population sizes are small, many mutations become nearly neutral. Hence, in small populations repetitive content and other 'junk' DNA can accumulate without placing the organism at a competitive disadvantage. There is little evidence to suggest that genome size is under strong widespread selection in multicellular eukaryotes. Genome size, independent of gene content, correlates poorly with most physiological traits and many eukaryotes, including mammals, harbor very large amounts of repetitive DNA.

However, birds likely have experienced strong selection for reduced genome size, in response to changing energetic needs for flight. Birds, unlike humans, produce nucleated red blood cells, and larger nuclei lead to lower levels of oxygen transport. Bird metabolism is far higher than that of mammals, due largely to flight, and oxygen needs are high. Hence, most birds have small, compact genomes with few repetitive elements. Indirect evidence suggests that non-avian theropod dinosaur ancestors of modern birds [6] also had reduced genome sizes, consistent with endothermy and high energetic needs for running speed. Many bacteria have also experienced selection for small genome size, as time of replication and energy consumption are so tightly correlated with fitness.

Repetitive elements

Transposable elements are self-replicating, selfish genetic elements which are capable of proliferating within host genomes. Many transposable elements are related to viruses, and share several proteins in common.

Chromosome number and organization

The number of chromosomes in an organism's genome also does not necessarily correlate with the amount of DNA in its genome. The ant Myrmecia pilosula has only a single pair of chromosomes[7] whereas the Adders-tongue fern Ophioglossum reticulatum has up to 1260 chromosomes.[8] Cilliate genomes house each gene in individual chromosomes, resulting in a genome which is not physically linked. Reduced linkage through creation of additional chromosomes should effectively increase the efficiency of selection.

Changes in chromosome number can play a key role in speciation, as differing chromosome numbers can serve as a barrier to reproduction in hybrids. Human chromosome 2 was created from a fusion of two chimpanzee chromosomes and still contains central telomeres as well as a vestigial second centromere. Polyploidy, especially allopolyploidy, which occurs often in plants, can also result in reproductive incompatibilities with parental species. Agrodiatus blue butterflies have diverse chromosome numbers ranging from n=10 to n=134 and additionally have one of the highest rates of speciation identified to date.[9]

Gene content and distribution

Different organisms house different numbers of genes within their genomes as well as different patterns in the distribution of genes throughout the genome. Some organisms, such as most bacteria, Drosophila, and Arabidopsis have particularly compact genomes with little repetitive content or non-coding DNA. Other organisms, like mammals or maize, have large amounts of repetitive DNA, long introns, and substantial spacing between different genes. The content and distribution of genes within the genome can influence the rate at which certain types of mutations occur and can influence the subsequent evolution of different species. Genes with longer introns are more likely to recombine due to increased physical distance over the coding sequence. As such, long introns may facilitate ectopic recombination, and result in higher rates of new gene formation.


In addition to the nuclear genome, endosymbiont organelles contain their own genetic material typically as circular plasmids. Mitochondrial and chloroplast DNA varies across taxa, but membrane-bound proteins, especially electron transport chain constituents are most often encoded in the organelle. Chloroplasts and mitochondria are maternally inherited in most species, as the organelles must pass through the egg. In a rare departure, some species of mussels are known to inherit mitochondria from father to son.

Origins of new genes

New genes arise from several different genetic mechanisms including gene duplication, de novo origination, retrotransposition, chimeric gene formation, recruitment of non-coding sequence, and gene truncation.

Gene duplication initially leads to redundancy. However, duplicated gene sequences can mutate to develop new functions or specialize so that the new gene performs a subset of the original ancestral functions. In addition to duplicating whole genes, sometimes only a domain or part of a protein is duplicated so that the resulting gene is an elongated version of the parental gene.

Retrotransposition creates new genes by copying mRNA to DNA and inserting it into the genome. Retrogenes often insert into new genomic locations, and often develop new expression patterns and functions.

Chimeric genes form when duplication, deletion, or incomplete retrotransposition combine portions of two different coding sequences to produce a novel gene sequence. Chimeras often cause regulatory changes and can shuffle protein domains to produce novel adaptive functions.

De novo gene birth can also give rise to new genes from previously non-coding DNA.[10] For instance, Levine and colleagues reported the origin of five new genes in the D. melanogaster genome from noncoding DNA.[11][12] Similar de novo origin of genes has been also shown in other organisms such as yeast,[13] rice[14] and humans.[15] De novo genes may evolve from transcripts that are already expressed at low levels.[16] Mutation of a stop codon to a regular codon or a frameshift may cause an extended protein that includes a previously non-coding sequence. The formation of novel genes from scratch typically can not occur within genomic regions of high gene density. The essential events for de novo formation of genes is recombination/mutation which includes insertions, deletions, and inversions. These events are tolerated if the consequence of these genetic events does not interfere in cellular activities. Most genomes comprise prophages wherein genetic modifications do not, in general, affect the host genome propagation. Hence, there is higher probability of genetic modifications, in regions such as prophages, which is proportional to the probability of de novo formation of genes.[17]

De novo evolution of genes can also be simulated in the laboratory. For example, semi-random gene sequences can be selected for specific functions.[18] More specifically, they selected sequences from a library that could complement a gene deletion in E. coli. The deleted gene encodes ferric enterobactin esterase (Fes), which releases iron from an iron chelator, enterobactin. While Fes is a 400 amino acid protein, the newly selected gene was only 100 amino acids in length and unrelated in sequence to Fes.[18]

In vitro molecular evolution experiments

Principles of molecular evolution have also been discovered, and others elucidated and tested using experimentation involving amplification, variation and selection of rapidly proliferating and genetically varying molecular species outside cells. Since the pioneering work of Sol Spiegelmann in 1967 [ref], involving RNA that replicates itself with the aid of an enzyme extracted from the Qß virus [ref], several groups (such as Kramers [ref] and Biebricher/Luce/Eigen [ref]) studied mini and micro variants of this RNA in the 1970s and 1980s that replicate on the timescale of seconds to a minute, allowing hundreds of generations with large population sizes (e.g. 10^14 sequences) to be followed in a single day of experimentation. The chemical kinetic elucidation of the detailed mechanism of replication [ref, ref] meant that this type of system was the first molecular evolution system that could be fully characterised on the basis of physical chemical kinetics, later allowing the first models of the genotype to phenotype map based on sequence dependent RNA folding and refolding to be produced [ref, ref]. Subject to maintaining the function of the multicomponent Qß enzyme, chemical conditions could be varied significantly, in order to study the influence of changing environments and selection pressures [ref]. Experiments with in vitro RNA quasi species included the characterisation of the error threshold for information in molecular evolution [ref], the discovery of de novo evolution [ref] leading to diverse replicating RNA species and the discovery of spatial travelling waves as ideal molecular evolution reactors [ref, ref]. Later experiments employed novel combinations of enzymes to elucidate novel aspects of interacting molecular evolution involving population dependent fitness, including work with artificially designed molecular predator prey and cooperative systems of multiple RNA and DNA [ref, ref]. Special evolution reactors were designed for these studies, starting with serial transfer machines, flow reactors such as cell-stat machines, capillary reactors, and microreactors including line flow reactors and gel slice reactors. These studies were accompanied by theoretical developments and simulations involving RNA folding and replication kinetics that elucidated the importance of the correlation structure between distance in sequence space and fitness changes [ref], including the role of neutral networks and structural ensembles in evolutionary optimisation.

Molecular phylogenetics

Molecular systematics is the product of the traditional fields of systematics and molecular genetics. It uses DNA, RNA, or protein sequences to resolve questions in systematics, i.e. about their correct scientific classification or taxonomy from the point of view of evolutionary biology.

Molecular systematics has been made possible by the availability of techniques for DNA sequencing, which allow the determination of the exact sequence of nucleotides or bases in either DNA or RNA. At present it is still a long and expensive process to sequence the entire genome of an organism, and this has been done for only a few species. However, it is quite feasible to determine the sequence of a defined area of a particular chromosome. Typical molecular systematic analyses require the sequencing of around 1000 base pairs.

The driving forces of evolution

Depending on the relative importance assigned to the various forces of evolution, three perspectives provide evolutionary explanations for molecular evolution.[19][20]

Selectionist hypotheses argue that selection is the driving force of molecular evolution. While acknowledging that many mutations are neutral, selectionists attribute changes in the frequencies of neutral alleles to linkage disequilibrium with other loci that are under selection, rather than to random genetic drift.[21] Biases in codon usage are usually explained with reference to the ability of even weak selection to shape molecular evolution.[22]

Neutralist hypotheses emphasize the importance of mutation, purifying selection, and random genetic drift.[23] The introduction of the neutral theory by Kimura,[24] quickly followed by King and Jukes' own findings,[25] led to a fierce debate about the relevance of neodarwinism at the molecular level. The Neutral theory of molecular evolution proposes that most mutations in DNA are at locations not important to function or fitness. These neutral changes drift towards fixation within a population. Positive changes will be very rare, and so will not greatly contribute to DNA polymorphisms.[26] Deleterious mutations do not contribute much to DNA diversity because they negatively affect fitness and so are removed from the gene pool before long.[27] This theory provides a framework for the molecular clock.[26] The fate of neutral mutations are governed by genetic drift, and contribute to both nucleotide polymorphism and fixed differences between species.[28][29]

In the strictest sense, the neutral theory is not accurate.[30] Subtle changes in DNA very often have effects, but sometimes these effects are too small for natural selection to act on.[30] Even synonymous mutations are not necessarily neutral [30] because there is not a uniform amount of each codon. The nearly neutral theory expanded the neutralist perspective, suggesting that several mutations are nearly neutral, which means both random drift and natural selection is relevant to their dynamics.[30] The main difference between the neutral theory and nearly neutral theory is that the latter focuses on weak selection, not strictly neutral.[27]

Mutationists hypotheses emphasize random drift and biases in mutation patterns.[31] Sueoka was the first to propose a modern mutationist view. He proposed that the variation in GC content was not the result of positive selection, but a consequence of the GC mutational pressure.[32]

Protein evolution

Lipase Sequence Homology
This chart compares the sequence identity of different lipase proteins throughout the human body. It demonstrates how proteins evolve, keeping some regions conserved while others change dramatically.

Evolution of proteins is studied by comparing the sequences and structures of proteins from many organisms representing distinct evolutionary clades. If the sequences/structures of two proteins are similar indicating that the proteins diverged from a common origin, these proteins are called as homologous proteins. More specifically, homologous proteins that exist in two distinct species are called as orthologs. Whereas, homologous proteins encoded by the genome of a single species are called paralogs.

The phylogenetic relationships of proteins are examined by multiple sequence comparisons. Phylogenetic trees of proteins can be established by the comparison of sequence identities among proteins. Such phylogenetic trees have established that the sequence similarities among proteins reflect closely the evolutionary relationships among organisms.[33][34]

Protein evolution describes the changes over time in protein shape, function, and composition. Through quantitative analysis and experimentation, scientists have strived to understand the rate and causes of protein evolution. Using the amino acid sequences of hemoglobin and cytochrome c from multiple species, scientists were able to derive estimations of protein evolution rates. What they found was that the rates were not the same among proteins.[27] Each protein has its own rate, and that rate is constant across phylogenies (i.e., hemoglobin does not evolve at the same rate as cytochrome c, but hemoglobins from humans, mice, etc. do have comparable rates of evolution.). Not all regions within a protein mutate at the same rate; functionally important areas mutate more slowly and amino acid substitutions involving similar amino acids occurs more often than dissimilar substitutions.[27] Overall, the level of polymorphisms in proteins seems to be fairly constant. Several species (including humans, fruit flies, and mice) have similar levels of protein polymorphism.[26]

Relation to nucleic acid evolution

Protein evolution is inescapably tied to changes and selection of DNA polymorphisms and mutations because protein sequences change in response to alterations in the DNA sequence. Amino acid sequences and nucleic acid sequences do not mutate at the same rate. Due to the degenerate nature of DNA, bases can change without affecting the amino acid sequence. For example, there are six codons that code for leucine. Thus, despite the difference in mutation rates, it is essential to incorporate nucleic acid evolution into the discussion of protein evolution. At the end of the 1960s, two groups of scientists—Kimura (1968) and King and Jukes (1969)—independently proposed that a majority of the evolutionary changes observed in proteins were neutral.[26][27] Since then, the neutral theory has been expanded upon and debated.[27]

Discordance with morphological evolution

There are sometimes discordances between molecular and morphological evolution, which are reflected in molecular and morphological systematic studies, especially of bacteria, archaea and eukaryotic microbes. These discordances can be categorized as two types: (i) one morphology, multiple lineages (e.g. morphological convergence, cryptic species) and (ii) one lineage, multiple morphologies (e.g. phenotypic plasticity, multiple life-cycle stages). Neutral evolution possibly could explain the incongruences in some cases.[35]

Journals and societies

The Society for Molecular Biology and Evolution publishes the journals "Molecular Biology and Evolution" and "Genome Biology and Evolution" and holds an annual international meeting. Other journals dedicated to molecular evolution include Journal of Molecular Evolution and Molecular Phylogenetics and Evolution. Research in molecular evolution is also published in journals of genetics, molecular biology, genomics, systematics, and evolutionary biology.

See also


  1. ^ a b c Dietrich, Michael R. (1998). "Paradox and Persuasion: Negotiating the Place of Molecular Evolution within Evolutionary Biology". Journal of the History of Biology. 31 (1): 85–111. doi:10.1023/A:1004257523100. PMID 11619919.
  2. ^ Hagen, Joel B. (1999). "Naturalists, Molecular Biologists, and the Challenge of Molecular Evolution". Journal of the History of Biology. 32 (2): 321–341. doi:10.1023/A:1004660202226. PMID 11624208.
  3. ^ King, Jack L.; Jukes, Thomas (1969). "Non-Darwinian Evolution". Science. 164 (3881): 788–798. Bibcode:1969Sci...164..788L. doi:10.1126/science.164.3881.788. PMID 5767777.
  4. ^
  5. ^ Lynch, M. (2007). The Origins of Genome Architecture. Sinauer. ISBN 0-87893-484-7.
  6. ^ Organ, C. L.; Shedlock, A. M.; Meade, A.; Pagel, M.; Edwards, S. V. (2007). "Origin of avian genome size and structure in nonavian dinosaurs". Nature. 446: 180–184. Bibcode:2007Natur.446..180O. doi:10.1038/nature05621. PMID 17344851.
  7. ^ Crosland MW, Crozier RH (1986). "Myrmecia pilosula, an ant with only one pair of chromosomes". Science. 231 (4743): 1278. Bibcode:1986Sci...231.1278C. doi:10.1126/science.231.4743.1278. PMID 17839565.
  8. ^ Gerardus J. H. Grubben (2004). Vegetables. PROTA. p. 404. ISBN 978-90-5782-147-9. Retrieved 10 March 2013.
  9. ^ Nikolai P. Kandul; Vladimir A. Lukhtanov; Naomi E. Pierce (2007), "KARYOTYPIC DIVERSITY AND SPECIATION IN AGRODIAETUS BUTTERFLIES", Evolution, 61 (3): 546–559, doi:10.1111/j.1558-5646.2007.00046.x, PMID 17348919
  10. ^ McLysaght, Aoife; Guerzoni, Daniele (31 August 2015). "New genes from non-coding sequence: the role of de novo protein-coding genes in eukaryotic evolutionary innovation". Philosophical Transactions of the Royal Society B: Biological Sciences. 370 (1678): 20140332. doi:10.1098/rstb.2014.0332. PMC 4571571. PMID 26323763.
  11. ^ Levine MT, Jones CD, Kern AD, et al. (2006). "Novel genes derived from noncoding DNA in Drosophila melanogaster are frequently X-linked and exhibit testis-biased expression". Proc Natl Acad Sci USA. 103 (26): 9935–9939. Bibcode:2006PNAS..103.9935L. doi:10.1073/pnas.0509809103. PMC 1502557. PMID 16777968.
  12. ^ Zhou Q, Zhang G, Zhang Y, et al. (2008). "On the origin of new genes in Drosophila". Genome Res. 18 (9): 1446–1455. doi:10.1101/gr.076588.108. PMC 2527705. PMID 18550802.
  13. ^ Cai J, Zhao R, Jiang H, et al. (2008). "De novo origination of a new protein-coding gene in Saccharomyces cerevisiae". Genetics. 179 (1): 487–496. doi:10.1534/genetics.107.084491. PMC 2390625. PMID 18493065.
  14. ^ Xiao W, Liu H, Li Y, et al. (2009). El-Shemy HA (ed.). "A rice gene of de novo origin negatively regulates pathogen- induced defense response". PLoS ONE. 4 (2): e4603. Bibcode:2009PLoSO...4.4603X. doi:10.1371/journal.pone.0004603. PMC 2643483. PMID 19240804.
  15. ^ Knowles DG, McLysaght A (2009). "Recent de novo origin of human protein-coding genes". Genome Res. 19 (10): 1752–1759. doi:10.1101/gr.095026.109. PMC 2765279. PMID 19726446.
  16. ^ Wilson, Ben A.; Joanna Masel (2011). "Putatively Noncoding Transcripts Show Extensive Association with Ribosomes". Genome Biology and Evolution. 3: 1245–1252. doi:10.1093/gbe/evr099. PMC 3209793.
  17. ^ Ramisetty, Bhaskar Chandra Mohan; Sudhakari, Pavithra Anantharaman (2019). "Bacterial 'Grounded' Prophages: Hotspots for Genetic Renovation and Innovation". Frontiers in Genetics. 10. doi:10.3389/fgene.2019.00065. ISSN 1664-8021.
  18. ^ a b Donnelly, Ann E.; Murphy, Grant S.; Digianantonio, Katherine M.; Hecht, Michael H. (March 2018). "A de novo enzyme catalyzes a life-sustaining reaction in Escherichia coli". Nature Chemical Biology. 14 (3): 253–255. doi:10.1038/nchembio.2550. ISSN 1552-4469. PMID 29334382.
  19. ^ Graur, D. & Li, W.-H. (2000). Fundamentals of molecular evolution. Sinauer. ISBN 0-87893-266-6.
  20. ^ Casillas, Sònia; Barbadilla, Antonio (2017). "Molecular Population Genetics". Genetics. 205 (3): 1003–1035. doi:10.1534/genetics.116.196493. PMC 5340319.
  21. ^ Hahn, Matthew W. (February 2008). "Toward A Selection Theory Of Molecular Evolution". Evolution. 62 (2): 255–265. doi:10.1111/j.1558-5646.2007.00308.x. PMID 18302709.
  22. ^ Hershberg, Ruth; Petrov, Dmitri A. (December 2008). "Selection on Codon Bias". Annual Review of Genetics. 42 (1): 287–299. doi:10.1146/annurev.genet.42.110807.091442. PMID 18983258.
  23. ^ Kimura, M. (1983). The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge. ISBN 0-521-23109-4.
  24. ^ Kimura, Motoo (1968). "Evolutionary rate at the molecular level" (PDF). Nature. 217 (5129): 624–626. Bibcode:1968Natur.217..624K. doi:10.1038/217624a0. PMID 5637732.
  25. ^ King, J.L. & Jukes, T.H. (1969). "Non-Darwinian Evolution" (PDF). Science. 164 (3881): 788–798. Bibcode:1969Sci...164..788L. doi:10.1126/science.164.3881.788. PMID 5767777.
  26. ^ a b c d Akashi, H. "Weak Selection and Protein Evolution". Genetics. 192 (1): 15–31. doi:10.1534/genetics.112.140178. PMC 3430532.
  27. ^ a b c d e f Fay, JC, Wu, CI (2003). "Sequence divergence, functional constraint, and selection in protein evolution". Annu. Rev. Genom. Hum. Genet. 4: 213–35. doi:10.1146/annurev.genom.4.020303.162528.
  28. ^ Nachman M. (2006). C.W. Fox; J.B. Wolf (eds.). ""Detecting selection at the molecular level" in: Evolutionary Genetics: concepts and case studies": 103–118.
  29. ^ The nearly neutral theory expanded the neutralist perspective, suggesting that several mutations are nearly neutral, which means both random drift and natural selection is relevant to their dynamics.
  30. ^ a b c d Ohta, T (1992). "The Nearly Neutral Theory of Molecular Evolution". Annual Review of Ecology and Systematics. 23 (1): 263–286. doi:10.1146/ ISSN 0066-4162.
  31. ^ Nei, M. (2005). "Selectionism and Neutralism in Molecular Evolution". Molecular Biology and Evolution. 22 (12): 2318–2342. doi:10.1093/molbev/msi242. PMC 1513187. PMID 16120807.
  32. ^ Sueoka, N. (1964). "On the evolution of informational macromolecules". In Bryson, V.; Vogel, H.J. (eds.). Evolving genes and proteins. Academic Press, New-York. pp. 479–496.
  33. ^ Hanukoglu I (2017). "ASIC and ENaC type sodium channels: Conformational states and the structures of the ion selectivity filters". FEBS Journal. 284 (4): 525–545. doi:10.1111/febs.13840. PMID 27580245.
  34. ^ Hanukoglu I, Hanukoglu A (Jan 2016). "Epithelial sodium channel (ENaC) family: Phylogeny, structure-function, tissue distribution, and associated inherited diseases". Gene. 579 (2): 95–132. doi:10.1016/j.gene.2015.12.061. PMC 4756657. PMID 26772908.
  35. ^ Lahr, D. J.; Laughinghouse, H. D.; Oliverio, A. M.; Gao, F.; Katz, L. A. (2014). "How discordant morphological and molecular evolution among microorganisms can revise our notions of biodiversity on Earth". BioEssays. 36 (10): 950–959. doi:10.1002/bies.201400056. PMC 4288574. PMID 25156897.

Further reading

  • Li, W.-H. (2006). Molecular Evolution. Sinauer. ISBN 0-87893-480-4.
  • Lynch, M. (2007). The Origins of Genome Architecture. Sinauer. ISBN 0-87893-484-7.
  • A. Meyer (Editor), Y. van de Peer, "Genome Evolution: Gene and Genome Duplications and the Origin of Novel Gene Functions", 2003, ISBN 978-1-4020-1021-7
  • T. Ryan Gregory, "The Evolution of the Genome", 2004, YSBN 978-0123014634
Directed evolution

Directed evolution (DE) is a method used in protein engineering that mimics the process of natural selection to steer proteins or nucleic acids toward a user-defined goal. It consists of subjecting a gene to iterative rounds of mutagenesis (creating a library of variants), selection (expressing those variants and isolating members with the desired function) and amplification (generating a template for the next round). It can be performed in vivo (in living organisms), or in vitro (in cells or free in solution). Directed evolution is used both for protein engineering as an alternative to rationally designing modified proteins, as well as studies of fundamental evolutionary principles in a controlled, laboratory environment.

Emile Zuckerkandl

Émile Zuckerkandl (July 4, 1922 – November 9, 2013) was an Austrian-born French biologist considered one of the founders of the field of molecular evolution. He is best known for introducing, with Linus Pauling, the concept of the "molecular clock", which enabled the neutral theory of molecular evolution.

Evolutionary biology

Evolutionary biology is the subfield of biology that studies the evolutionary processes that produced the diversity of life on Earth, starting from a single common ancestor. These processes include natural selection, common descent, and speciation.

The discipline emerged through what Julian Huxley called the modern synthesis (of the 1930s) of understanding from several previously unrelated fields of biological research, including genetics, ecology, systematics, and paleontology.

Current research has widened to cover the genetic architecture of adaptation, molecular evolution, and the different forces that contribute to evolution including sexual selection, genetic drift and biogeography. The newer field of evolutionary developmental biology ("evo-devo") investigates how embryonic development is controlled, thus creating a wider synthesis that integrates developmental biology with the fields covered by the earlier evolutionary synthesis.

Evolutionary physiology

Evolutionary physiology is the study of physiological evolution, which is to say, the manner in which the functional characteristics of individuals in a population of organisms have responded to selection across multiple generations during the history of the population.It is a subdiscipline of both physiology and evolutionary biology. Practitioners in this field come from a variety of backgrounds, including physiology, evolutionary biology, ecology and genetics.

Accordingly, the range of phenotypes studied by evolutionary physiologists is broad, including life history, behavior, whole-organism performance, functional morphology, biomechanics, anatomy, classical physiology, endocrinology, biochemistry, and molecular evolution. It is closely related to comparative physiology and environmental physiology, and its findings are a major concern of evolutionary medicine. One definition that has been offered is "the study of the physiological basis of fitness, namely, correlated evolution (including constraints and trade-offs) of physiological form and function associated with the environment, diet, homeostasis, energy management, longevity, and mortality and life history characteristics".

Gene duplication

Gene duplication (or chromosomal duplication or gene amplification) is a major mechanism through which new genetic material is generated during molecular evolution. It can be defined as any duplication of a region of DNA that contains a gene. Gene duplications can arise as products of several types of errors in DNA replication and repair machinery as well as through fortuitous capture by selfish genetic elements. Common sources of gene duplications include ectopic recombination, retrotransposition event, aneuploidy, polyploidy, and replication slippage.

History of molecular evolution

The history of molecular evolution starts in the early 20th century with "comparative biochemistry", but the field of molecular evolution came into its own in the 1960s and 1970s, following the rise of molecular biology. The advent of protein sequencing allowed molecular biologists to create phylogenies based on sequence comparison, and to use the differences between homologous sequences as a molecular clock to estimate the time since the last common ancestor. In the late 1960s, the neutral theory of molecular evolution provided a theoretical basis for the molecular clock, though both the clock and the neutral theory were controversial, since most evolutionary biologists held strongly to panselectionism, with natural selection as the only important cause of evolutionary change. After the 1970s, nucleic acid sequencing allowed molecular evolution to reach beyond proteins to highly conserved ribosomal RNA sequences, the foundation of a reconceptualization of the early history of life.

John H. Gillespie

John H. Gillespie is an evolutionary biologist interested in theoretical population genetics and molecular evolution. In molecular evolution, he emphasized the importance of advantageous mutations and balancing selection. For that reason, Gillespie is well known for his selectionist stance in the neutralist-selectionist debate. He is widely considered the main proponent of natural selection in molecular evolution. He had a well-known feud with the father of the neutral theory of molecular evolution, Motoo Kimura, initiated by a review in Science of Kimura's book in which Gillespie criticized Kimura for "using the book as a vehicle to establish for himself a niche in the history of science." Gillespie had only four PhD students during his career, Richard Hudson, James N. McNair, David Cutler, and Andrew Kern. Gillespie was a professor in the College of Biological Sciences at the University of California, Davis until his retirement in 2005.

Journal of Molecular Evolution

The Journal of Molecular Evolution is a monthly peer-reviewed scientific journal that covers molecular evolution. It is published by Springer Science+Business Media and was established in 1971. The founding editor was Emile Zuckerkandl, who remained editor-in-chief until the late 1990s. In 1994, the journal became associated with the then existent International Society of Molecular Evolution.


MEDLINE (Medical Literature Analysis and Retrieval System Online, or MEDLARS Online) is a bibliographic database of life sciences and biomedical information. It includes bibliographic information for articles from academic journals covering medicine, nursing, pharmacy, dentistry, veterinary medicine, and health care. MEDLINE also covers much of the literature in biology and biochemistry, as well as fields such as molecular evolution.

Compiled by the United States National Library of Medicine (NLM), MEDLINE is freely available on the Internet and searchable via PubMed and NLM's National Center for Biotechnology Information's Entrez system.

Molecular clock

The molecular clock is figurative term for a technique that uses the mutation rate of biomolecules to deduce the time in prehistory when two or more life forms diverged. The biomolecular data used for such calculations are usually nucleotide sequences for DNA or amino acid sequences for proteins. The benchmarks for determining the mutation rate are often fossil or archaeological dates. The molecular clock was first tested in 1962 on the hemoglobin protein variants of various animals, and is commonly used in molecular evolution to estimate times of speciation or radiation. It is sometimes called a gene clock or an evolutionary clock.

Molecular phylogenetics

Molecular phylogenetics () is the branch of phylogeny that analyzes genetic, hereditary molecular differences, predominately in DNA sequences, to gain information on an organism's evolutionary relationships. From these analyses, it is possible to determine the processes by which diversity among species has been achieved. The result of a molecular phylogenetic analysis is expressed in a phylogenetic tree. Molecular phylogenetics is one aspect of molecular systematics, a broader term that also includes the use of molecular data in taxonomy and biogeography.Molecular phylogenetics and molecular evolution correlate. Molecular evolution is the process of selective changes (mutations) at a molecular level (genes, proteins, etc.) throughout various branches in the tree of life (evolution). Molecular phylogenetics makes inferences of the evolutionary relationships that arise due to molecular evolution and results in the construction of a phylogenetic tree. The figure displayed on the right depicts the phylogenetic tree of life as one of the first detailed trees, according to information known in the 1870s by Haeckel.

Motoo Kimura

Motoo Kimura (木村 資生, Kimura Motō) (November 13, 1924 – November 13, 1994) was a Japanese biologist best known for introducing the neutral theory of molecular evolution in 1968. He became one of the most influential theoretical population geneticists. He is remembered in genetics for his innovative use of diffusion equations to calculate the probability of fixation of beneficial, deleterious, or neutral alleles. Combining theoretical population genetics with molecular evolution data, he also developed the neutral theory of molecular evolution in which genetic drift is the main force changing allele frequencies. James F. Crow, himself a renowned population geneticist, considered Kimura to be one of the two greatest evolutionary geneticists, along with Gustave Malécot, after the great trio of the modern synthesis, Ronald Fisher, J. B. S. Haldane and Sewall Wright.

Nearly neutral theory of molecular evolution

The nearly neutral theory of molecular evolution is a modification of the neutral theory of molecular evolution that accounts for the fact that not all mutations are either so deleterious such that they can be ignored, or else neutral. Slightly deleterious mutations are reliably purged only when their selection coefficient are greater than one divided by the effective population size. In larger populations, a higher proportion of mutations exceed this threshold for which genetic drift cannot overpower selection, leading to fewer fixation events and so slower molecular evolution.

The nearly neutral theory was proposed by Tomoko Ohta in 1973. The population-size-dependent threshold for purging mutations has been called the "drift barrier" by Michael Lynch, and used to explain differences in genomic architecture among species.

Neutral mutation

Neutral mutations are changes in DNA sequence that are neither beneficial nor detrimental to the ability of an organism to survive and reproduce. In population genetics, mutations in which natural selection does not affect the spread of the mutation in a species are termed neutral mutations. Neutral mutations that are inheritable and not linked to any genes under selection will either be lost or will replace all other alleles of the gene. This loss or fixation of the gene proceeds based on random sampling known as genetic drift. A neutral mutation that is in linkage disequilibrium with other alleles that are under selection may proceed to loss or fixation via genetic hitchhiking and/or background selection.

While many mutations in a genome may decrease an organism’s ability to survive and reproduce, also known as fitness, these mutations are selected against and not passed on to future generations. The most commonly observed mutations detectable as variation in the genetic makeup of organisms and populations appear to have no visible effect on the fitness of individuals and are therefore neutral. The identification and study of neutral mutations has led to the development of the neutral theory of molecular evolution. The neutral theory of molecular evolution is an important and often controversial theory proposing that most molecular variation within and among species is essentially neutral and not acted on by selection. Neutral mutations are also the basis for using molecular clocks to identify such evolutionary events as speciation and adaptive or evolutionary radiations.

Neutral theory of molecular evolution

The neutral theory of molecular evolution holds that most evolutionary changes at the molecular level, and most of the variation within and between species, are due to random genetic drift of mutant alleles that are selectively neutral. The theory applies only for evolution at the molecular level, and is compatible with phenotypic evolution being shaped by natural selection as postulated by Charles Darwin. The neutral theory allows for the possibility that most mutations are deleterious, but holds that because these are rapidly removed by natural selection, they do not make significant contributions to variation within and between species at the molecular level. A neutral mutation is one that does not affect an organism's ability to survive and reproduce. The neutral theory assumes that most mutations that are not deleterious are neutral rather than beneficial. Because only a fraction of gametes are sampled in each generation of a species, the neutral theory suggests that a mutant allele can arise within a population and reach fixation by chance, rather than by selective advantage.The theory was introduced by the Japanese biologist Motoo Kimura in 1968, and independently by two American biologists Jack Lester King and Thomas Hughes Jukes in 1969, and described in detail by Kimura in his 1983 monograph The Neutral Theory of Molecular Evolution. The proposal of the neutral theory was followed by an extensive "neutralist-selectionist" controversy over the interpretation of patterns of molecular divergence and polymorphism, peaking in the 1970s and 1980s.


Paleobiology (UK & Canadian English: palaeobiology) is a growing and comparatively new discipline which combines the methods and findings of the life science biology with the methods and findings of the earth science paleontology. It is occasionally referred to as "geobiology".

Paleobiological research uses biological field research of current biota and of fossils millions of years old to answer questions about the molecular evolution and the evolutionary history of life. In this scientific quest, macrofossils, microfossils and trace fossils are typically analyzed. However, the 21st-century biochemical analysis of DNA and RNA samples offers much promise, as does the biometric construction of phylogenetic trees.

An investigator in this field is known as a paleobiologist.

Protein superfamily

A protein superfamily is the largest grouping (clade) of proteins for which common ancestry can be inferred (see homology). Usually this common ancestry is inferred from structural alignment and mechanistic similarity, even if no sequence similarity is evident. Sequence homology can then be deduced even if not apparent (due to low sequence similarity). Superfamilies typically contain several protein families which show sequence similarity within each family. The term protein clan is commonly used for protease and glycosyl hydrolases superfamilies based on the MEROPS and CAZy classification systems.

Thomas H. Jukes

Thomas Hughes Jukes (August 26, 1906 – November 1, 1999) was a British-American biologist known for his work in nutrition, molecular evolution, and for his public engagement with controversial scientific issues, including DDT, vitamin C and creationism. He was the co-author, with Jack Lester King, of the 1969 Science article "Non-Darwinian Evolution" which, along with Motoo Kimura's earlier publication, was the origin of the neutral theory of molecular evolution.

Jukes was born in Hastings, England, but moved to Toronto in 1924. In 1933, he earned a Ph.D. in biochemistry from the University of Toronto. He spent the next decade in the University of California system, first as a postdoctoral fellow at UC Berkeley, then as Instructor and Assistant Professor at UC Davis. At Davis, he helped determine the relationships among the B complex vitamins through experiments on chickens. He then left academia to work for American Cyanamid's Lederle Laboratories, where he helped established that folic acid is a vitamin and discovered that feeding livestock a continual supply of antibiotics significantly enhances growth (a practice that has become widespread in the meat industry).Following the rise of molecular biology, Jukes returned to UC Berkeley, where he spent the rest of his career. Independently of Motoo Kimura, Jukes (with Jack King) proposed in 1969 that the evolution of proteins is primarily driven by genetic drift acting on mutations that are neither beneficial nor deleterious—the neutral theory of molecular evolution. Despite the provocative paper ("Non-Darwinian Evolution"), he was not a prominent participant in the ensuing "neutralist-selectionist debate"; defense of the neutral theory was primarily left to others, especially Kimura. In 1971, Jukes was one of the founders of the Journal of Molecular Evolution; his subsequent work with molecular evolution focused especially on the origin and evolution of the genetic code.After returning to Berkeley, he also became heavily involved in a number of public scientific controversies, and was a gifted polemicist. In the 1960s, he fought against the introduction of creationism into the California public schools. Following the rise of the environmental movement, he fought against DDT bans, citing lack of evidence for detrimental effects to ecosystems. Between 1975 and 1980 he was one of the only scientists ever to have a regular column in the journal Nature, which he used to denounce a variety of what he considered pseudoscience, expressing "his deep suspicion that categorical statements of scientific 'fact' are usually exaggerations." He was one of the most prominent critics of Linus Pauling's claims about the benefits of vitamin C megadosage, and a frequent critic of other nutrition-based health and treatment claims, such as for homeopathy and the supposed cancer cure Laetrile.Thomas Jukes died of pneumonia, leaving his wife Marguerite, two daughters, one daughter-in-law, and seven grandchildren.

Ziheng Yang

Ziheng Yang FRS (Chinese: 杨子恒; born 1 November 1964) is a Chinese biologist. He holds the R.A. Fisher Chair of Statistical Genetics at University College London, and is the Director of R.A. Fisher Centre for Computational Biology at UCL. He was elected a Fellow of the Royal Society in 2006.

Population genetics
Of taxa
Of organs
Of processes
Tempo and modes


This page is based on a Wikipedia article written by authors (here).
Text is available under the CC BY-SA 3.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.