Molecular phylogenetics

Molecular phylogenetics (/məˈlɛkjʊlər ˌfaɪloʊdʒəˈnɛtɪks, mɒ-, moʊ-/[1][2]) is the branch of phylogeny that analyzes genetic, hereditary molecular differences, predominantly in DNA sequences, to gain information on an organism's evolutionary relationships. From these analyses, it is possible to infer the processes by which diversity among species has arisen. The result of a molecular phylogenetic analysis is expressed in a phylogenetic tree. Molecular phylogenetics is one aspect of molecular systematics, a broader term that also includes the use of molecular data in taxonomy and biogeography.[3][4][5]

Molecular phylogenetics and molecular evolution are closely linked. Molecular evolution is the process of change (mutation) at the molecular level (genes, proteins, etc.) across the various branches of the tree of life. Molecular phylogenetics infers the evolutionary relationships that arise from molecular evolution and expresses them as a phylogenetic tree. The figure displayed on the right depicts one of the first detailed trees of life, drawn by Haeckel according to information known in the 1870s.[6]

Phylogenetic Tree of Life by Haeckel


The theoretical foundations of molecular systematics were laid in the 1960s in the works of Emile Zuckerkandl, Emanuel Margoliash, Linus Pauling, and Walter M. Fitch.[7] Applications of molecular systematics were pioneered by Charles G. Sibley (birds), Herbert C. Dessauer (herpetology), and Morris Goodman (primates), followed by Allan C. Wilson, Robert K. Selander, and John C. Avise (who studied various groups). Work with protein electrophoresis began around 1956. Although the results were not quantitative and did not initially improve on morphological classification, they provided tantalizing hints that long-held notions of the classifications of birds, for example, needed substantial revision. From 1974 to 1986, DNA-DNA hybridization was the dominant technique used to measure genetic difference.[8]

Theoretical background

Early attempts at molecular systematics were also termed chemotaxonomy and made use of proteins, enzymes, carbohydrates, and other molecules that were separated and characterized using techniques such as chromatography. These have largely been replaced in recent times by DNA sequencing, which determines the exact sequence of nucleotides (bases) in DNA or RNA segments extracted using various techniques. Sequence data are generally considered superior for evolutionary studies, since the actions of evolution are ultimately reflected in the genetic sequences. Sequencing the entire DNA of an organism (its genome) is still a long and expensive process, but it is quite feasible to determine the sequence of a defined region of a particular chromosome. Typical molecular systematic analyses require the sequencing of around 1000 base pairs. At any position within such a sequence, the base found may vary between organisms. The particular sequence found in a given organism is referred to as its haplotype. In principle, since there are four base types, 1000 base pairs could yield 4^1000 (roughly 10^602) distinct haplotypes. However, for organisms within a particular species or in a group of related species, it has been found empirically that only a minority of sites show any variation at all, and most of the variations that are found are correlated, so that the number of distinct haplotypes found is relatively small.[9]
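The counting argument above can be checked directly. The Python sketch below computes the size of 4^1000 and then, for a small hypothetical sample of aligned sequences (invented for illustration, not real data), lists the variable sites and counts the distinct haplotypes. It assumes all sequences cover the same aligned region with no gaps; the function names are illustrative only.

```python
from collections import Counter

# Number of possible sequences of length 1000 over {A, C, G, T}.
possible_haplotypes = 4 ** 1000
print(len(str(possible_haplotypes)), "decimal digits")  # 603 digits, roughly 10^602


def variable_sites(sequences):
    """Return the 0-based positions at which the aligned sequences disagree."""
    length = len(sequences[0])
    return [i for i in range(length) if len({s[i] for s in sequences}) > 1]


def haplotype_counts(sequences):
    """Map each distinct haplotype to the number of individuals carrying it."""
    return Counter(sequences)


# Hypothetical toy sample: five individuals sequenced over the same 12-bp region.
sample = [
    "ACGTTGCAACGT",
    "ACGTTGCAACGT",
    "ACGTCGCAACGA",
    "ACGTCGCAACGA",
    "ACGTTGCAACGT",
]

print(variable_sites(sample))  # [4, 11]
print(haplotype_counts(sample))
```

In this toy sample only 2 of the 12 sites vary, and they vary together, so just two haplotypes occur rather than the four combinations those two sites would allow.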

Clade in Phylogenetic Tree
In a phylogenetic tree, numerous groupings (clades) exist. A clade is a group consisting of a common ancestor and all of its descendants. This figure illustrates how a clade may be represented in a phylogenetic tree.

In a molecular systematic analysis, the haplotypes are determined for a defined region of genetic material. A substantial sample of individuals of the target species or other taxon is used, although many current studies are based on single individuals. Haplotypes of individuals of closely related, yet different, taxa are also determined. Finally, haplotypes from a smaller number of individuals from a clearly different taxon are determined; these are referred to as an outgroup. The base sequences of the haplotypes are then compared. In the simplest case, the difference between two haplotypes is assessed by counting the number of positions at which they have different bases; this is referred to as the number of substitutions. (Other kinds of differences between haplotypes can also occur, for example the insertion of a section of nucleic acid in one haplotype that is not present in another.) The difference between organisms is usually re-expressed as a percentage divergence, that is, the number of substitutions divided by the number of base pairs analysed, expressed as a percentage. The hope is that this measure will be independent of the location and length of the section of DNA that is sequenced.
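A minimal sketch of the simplest case described above: count substitutions between two already-aligned haplotypes and convert the count to a percentage divergence. The gap convention (any site with '-' in either sequence is skipped rather than counted) is an assumption made for this example, since insertions are a different kind of difference from substitutions. The sequences and function names are hypothetical.

```python
def count_substitutions(a, b):
    """Return (substitutions, sites_compared) for two aligned sequences.

    Sites with a gap ('-') in either sequence are skipped (assumed convention),
    so insertions/deletions are not counted as substitutions.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    substitutions = compared = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            substitutions += 1
    return substitutions, compared


def percent_divergence(a, b):
    """Substitutions divided by sites compared, expressed as a percentage."""
    substitutions, compared = count_substitutions(a, b)
    return 100.0 * substitutions / compared


print(count_substitutions("ACGTACGTAC", "ACGTTCGTAA"))  # (2, 10)
print(percent_divergence("ACGTACGTAC", "ACGTTCGTAA"))   # 20.0
```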

An older and superseded approach was to determine the divergences between the genotypes of individuals by DNA-DNA hybridization. The advantage claimed for using hybridization rather than gene sequencing was that it was based on the entire genotype, rather than on particular sections of DNA. Modern sequence comparison techniques overcome this objection by the use of multiple sequences.

Once the divergences between all pairs of samples have been determined, the resulting triangular matrix of pairwise differences is submitted to some form of statistical cluster analysis, and the resulting dendrogram is examined to see whether the samples cluster in the way that would be expected from current ideas about the taxonomy of the group. Any group of haplotypes that are all more similar to one another than any of them is to any other haplotype may be said to constitute a clade, as the figure displayed on the right illustrates. Statistical techniques such as bootstrapping and jackknifing help provide reliability estimates for the positions of haplotypes within the evolutionary trees.
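One simple choice of cluster analysis is average-linkage clustering (UPGMA, discussed further below). The sketch below builds pairwise p-distances from a hypothetical toy alignment and repeatedly merges the two closest clusters, using the size-weighted average distance to the merged cluster. It returns only the dendrogram's topology as nested tuples (branch lengths are omitted for brevity), and ties are broken by taking the first closest pair found; these simplifications are assumptions of this sketch.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma_topology(seqs):
    """Average-linkage (UPGMA) clustering of a {name: sequence} dict.

    Returns the dendrogram as nested tuples of names.
    """
    sizes = {name: 1 for name in seqs}  # cluster label -> number of leaves
    dist = {frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(seqs, 2)}
    while len(sizes) > 1:
        # Merge the closest pair of current clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: dist[frozenset(p)])
        merged = (a, b)
        n = sizes[a] + sizes[b]
        for c in sizes:
            if c not in (a, b):
                # Distance to the new cluster is the size-weighted average.
                dist[frozenset((merged, c))] = (
                    dist[frozenset((a, c))] * sizes[a]
                    + dist[frozenset((b, c))] * sizes[b]
                ) / n
        del sizes[a], sizes[b]
        sizes[merged] = n
    return next(iter(sizes))


# Hypothetical toy haplotypes.
seqs = {
    "A": "AAAAAAAAAA",
    "B": "AAAAAAAAAC",
    "C": "AAAAACCCCC",
    "D": "AAAAACCCCG",
}
print(upgma_topology(seqs))  # (('A', 'B'), ('C', 'D'))
```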

Techniques and applications

Every living organism contains deoxyribonucleic acid (DNA), ribonucleic acid (RNA), and proteins. In general, closely related organisms have a high degree of similarity in the molecular structure of these substances, while the molecules of distantly related organisms often show a pattern of dissimilarity. Sequences such as mitochondrial DNA are expected to accumulate mutations over time and, assuming a constant rate of mutation, can provide a molecular clock for dating divergences. Molecular phylogeny uses such data to build a "relationship tree" that shows the probable evolution of various organisms. With the invention of Sanger sequencing in 1977, it became possible to determine DNA sequences directly.[10][11] High-throughput sequencing may also be used to obtain the transcriptome of an organism, allowing phylogenetic relationships to be inferred from transcriptomic data.
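Under the constant-rate assumption, the arithmetic of a molecular clock is short: after two lineages split, both accumulate substitutions, so the expected distance between them is d = 2rT, where r is the substitution rate per site per unit time along one lineage and T is the time since divergence. The sketch below rearranges this to T = d / (2r). The distance and rate used are hypothetical values chosen for illustration, not calibrated estimates for any real sequence.

```python
def divergence_time(distance, rate_per_lineage):
    """Time since two lineages split, under a strict molecular clock.

    distance: substitutions per site between the two sequences.
    rate_per_lineage: substitutions per site per unit time along ONE lineage.
    Both lineages change after the split, so distance = 2 * rate * time.
    """
    if rate_per_lineage <= 0:
        raise ValueError("rate must be positive")
    return distance / (2.0 * rate_per_lineage)


# Hypothetical values: 0.02 substitutions/site, 1e-8 substitutions/site/year.
t = divergence_time(0.02, 1e-8)
print(f"{t:,.0f} years")  # roughly one million years
```

The factor of 2 is the easy step to forget: the observed distance accumulates along both branches leading back to the common ancestor.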

The most common approach is the comparison of homologous gene sequences, using sequence alignment techniques to identify similarity. Another application of molecular phylogeny is DNA barcoding, in which the species of an individual organism is identified using short sections of mitochondrial DNA or chloroplast DNA. The same underlying techniques are also used in human genetics, for example in the increasingly popular use of genetic testing to determine a child's paternity, and in a newer branch of criminal forensics focused on evidence known as genetic fingerprinting.
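Alignment is usually done with dedicated tools, but the core idea can be sketched with the classic Needleman-Wunsch dynamic-programming algorithm for global pairwise alignment. The scoring scheme below (match +1, mismatch -1, linear gap penalty -2) is an arbitrary assumption for illustration; real analyses typically use substitution matrices and affine gap penalties, and many multiple sequence alignment methods build on pairwise steps like this one.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align sequences a and b; return (score, aligned_a, aligned_b)."""
    def pair_score(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = needleman_wunsch("GATTACA", "GATACA")
print(best)  # 4
print(top)
print(bottom)
```

The two hypothetical sequences differ by one deleted base, so the best alignment has six matches and one gap (6 - 2 = 4); which of the two T positions receives the gap depends on the traceback order.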

Molecular phylogenetic analysis

Several methods are available for performing a molecular phylogenetic analysis. One comprehensive step-by-step protocol for constructing a phylogenetic tree, covering DNA/amino acid contiguous sequence assembly, multiple sequence alignment, model testing (selecting the best-fitting substitution model), and phylogeny reconstruction using maximum likelihood and Bayesian inference, is available at Nature's Protocol Exchange.[12]

Another molecular phylogenetic analysis technique, described by Pevsner (2015), is summarized here. A phylogenetic analysis typically consists of five major steps. The first stage is sequence acquisition. The second is performing a multiple sequence alignment, which is the fundamental basis for constructing a phylogenetic tree. The third stage involves models of DNA and amino acid substitution. Several substitution models exist; examples include the Hamming distance, the Jukes and Cantor one-parameter model, and the Kimura two-parameter model (see Models of DNA evolution). The normalized Hamming distance gives the observed proportion of sites that differ between two sequences, while the Jukes-Cantor correction converts that proportion into an estimate of the number of substitutions per site, accounting for multiple substitutions at the same site. The fourth stage consists of tree building, using either distance-based or character-based methods. Common tree-building methods include the unweighted pair group method with arithmetic mean (UPGMA) and neighbor joining, which are distance-based methods; maximum parsimony, which is a character-based method; and maximum likelihood estimation and Bayesian inference, which are character-based, model-based methods. UPGMA is a simple method, but it is less accurate than neighbor joining. Finally, the fifth step is evaluating the trees, assessing their accuracy in terms of consistency, efficiency, and robustness.[13]
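The three distance measures named in the third stage can each be written in a few lines. The sketch below uses the standard formulas: the normalized Hamming (p) distance; the Jukes-Cantor correction d = -(3/4) ln(1 - 4p/3); and the Kimura two-parameter distance d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of sites differing by a transition and a transversion, respectively. It assumes gap-free, aligned, upper-case DNA; the example sequences and function names are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def p_distance(a, b):
    """Normalized Hamming distance: proportion of aligned sites that differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def jukes_cantor(p):
    """Jukes-Cantor (JC69) estimate of substitutions per site from p."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: correction undefined (sequences saturated)")
    return -0.75 * math.log(1 - 4 * p / 3)


def kimura_2p(a, b):
    """Kimura two-parameter (K80) distance for aligned DNA sequences."""
    transitions = transversions = 0
    for x, y in zip(a, b):
        if x == y:
            continue
        # A<->G and C<->T are transitions; all other changes are transversions.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    P = transitions / len(a)
    Q = transversions / len(a)
    return -0.5 * math.log(1 - 2 * P - Q) - 0.25 * math.log(1 - 2 * Q)


# Hypothetical aligned pair: one transition (A/G) and one transversion (C/A).
s1, s2 = "ACGTACGTAC", "GCGTACGTAA"
p = p_distance(s1, s2)
print(p)                            # 0.2
print(round(jukes_cantor(p), 4))    # 0.2326
print(round(kimura_2p(s1, s2), 4))  # 0.2341
```

Both corrected distances exceed the raw p of 0.2, because the models assume some substitutions are hidden by repeated changes at the same site. For highly divergent sequences the logarithm's argument reaches zero or below and `math.log` raises `ValueError`, signalling saturation.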

Five Stages of Molecular Phylogenetic Analysis

MEGA (Molecular Evolutionary Genetics Analysis) is user-friendly analysis software that is free to download and use. It supports both distance-based and character-based tree-building methods and offers options such as heuristic searches and bootstrapping. Bootstrapping is commonly used to measure the robustness of a phylogenetic tree's topology: it reports the percentage of replicates in which each clade is supported. In general, a bootstrap value greater than 70% is considered significant. The flow chart displayed on the right shows the order of the five stages of Pevsner's molecular phylogenetic analysis technique described above.[13]
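The idea behind bootstrap support can be sketched without a full tree-building program. The example below resamples alignment columns with replacement and, instead of rebuilding a tree for each replicate, applies the clade criterion stated earlier in this article (every within-group distance smaller than every distance from a group member to a sequence outside the group). That shortcut, the toy alignment, and the function names are assumptions for illustration; standard bootstrapping, as offered by programs such as MEGA, rebuilds the tree from every replicate.

```python
import random


def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def is_clade(seqs, group):
    """True if all within-group distances are smaller than every distance
    from a group member to a sequence outside the group."""
    members = sorted(group)
    if len(members) < 2:
        raise ValueError("group needs at least two sequences")
    inside = [p_distance(seqs[x], seqs[y])
              for i, x in enumerate(members) for y in members[i + 1:]]
    outside = [p_distance(seqs[x], seqs[y])
               for x in members for y in seqs if y not in group]
    return max(inside) < min(outside)


def bootstrap_support(seqs, group, replicates=1000, seed=1):
    """Percentage of column-resampled replicates in which group is recovered."""
    rng = random.Random(seed)
    length = len(next(iter(seqs.values())))
    hits = 0
    for _ in range(replicates):
        # Sample alignment columns with replacement, keeping the original length.
        cols = [rng.randrange(length) for _ in range(length)]
        replicate = {name: "".join(s[c] for c in cols) for name, s in seqs.items()}
        if is_clade(replicate, group):
            hits += 1
    return 100.0 * hits / replicates


# Hypothetical toy alignment: A and B share one pattern, C and D another.
alignment = {
    "A": "ACGTACGTAC" + "AAAAAAAAAA",
    "B": "ACGTACGTAC" + "AAAAAAAAAA",
    "C": "ACGTACGTAC" + "CCCCCCCCCC",
    "D": "ACGTACGTAC" + "CCCCCCCCCC",
}
print(bootstrap_support(alignment, {"A", "B"}))  # close to 100
print(bootstrap_support(alignment, {"A", "C"}))  # 0.0
```

With this toy alignment the {A, B} grouping is recovered in nearly every replicate, far above the 70% rule of thumb, whereas {A, C} is never recovered because A is always at least as close to B as to C.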


Limitations

Molecular systematics is an essentially cladistic approach: it assumes that classification must correspond to phylogenetic descent, and that all valid taxa must be monophyletic. This can be a limitation when searching for the optimal tree(s), a process that often involves bisecting and reconnecting portions of candidate trees.

The recent discovery of extensive horizontal gene transfer among organisms poses a significant complication for molecular systematics, because it indicates that different genes within the same organism can have different phylogenies.

In addition, molecular phylogenies are sensitive to the assumptions and models that go into making them. They are prone to problems such as long-branch attraction, saturation, and inadequate taxon sampling, so strikingly different results can be obtained by applying different models to the same dataset.[14]

Moreover, as mentioned above, UPGMA is a simple approach that always produces a rooted tree. The algorithm assumes a constant molecular clock for all sequences in the tree, so if substitution rates differ among lineages, the resulting tree may be incorrect.[13]
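One way to see the clock assumption concretely: if every lineage evolves at the same rate and distances are estimated without error, the distance matrix is ultrametric, meaning that for every three sequences the two largest pairwise distances are equal (the three-point condition). UPGMA recovers the correct tree from such a matrix. The sketch below checks the condition on two hypothetical matrices; real distance estimates are noisy, so the tolerance parameter is an assumption, and a failed check is a warning sign rather than a formal test of rate constancy.

```python
from itertools import combinations


def symmetric(pairs):
    """Build a nested dict d[x][y] == d[y][x] from {(x, y): distance}."""
    d = {}
    for (x, y), value in pairs.items():
        d.setdefault(x, {})[y] = value
        d.setdefault(y, {})[x] = value
    return d


def is_ultrametric(names, dist, tol=1e-9):
    """Three-point condition: in every triple, the two largest distances match."""
    for x, y, z in combinations(names, 3):
        d = sorted([dist[x][y], dist[x][z], dist[y][z]])
        if abs(d[2] - d[1]) > tol:
            return False
    return True


names = ["A", "B", "C", "D"]
# Hypothetical clock-like distances: equal rates on all lineages.
clock_like = symmetric({("A", "B"): 0.1, ("A", "C"): 0.4, ("A", "D"): 0.4,
                        ("B", "C"): 0.4, ("B", "D"): 0.4, ("C", "D"): 0.1})
# Same tree, but lineage B has evolved faster than the others.
rate_varies = symmetric({("A", "B"): 0.1, ("A", "C"): 0.4, ("A", "D"): 0.4,
                         ("B", "C"): 0.6, ("B", "D"): 0.6, ("C", "D"): 0.1})

print(is_ultrametric(names, clock_like))   # True
print(is_ultrametric(names, rate_varies))  # False
```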

Notes and references

  1. ^ Jones, Daniel (2003) [1917], Peter Roach; James Hartmann; Jane Setter (eds.), English Pronouncing Dictionary, Cambridge: Cambridge University Press, ISBN 3-12-539683-2
  2. ^ "Phylogenetic". Merriam-Webster Dictionary.
  3. ^ Felsenstein, J. 2004. Inferring phylogenies. Sinauer Associates Incorporated. ISBN 0-87893-177-5.
  4. ^ Soltis, P.S., Soltis, D.E., and Doyle, J.J. (1992) Molecular systematics of plants. Chapman & Hall, New York. ISBN 0-41202-231-1.
  5. ^ Soltis, P.S., Soltis, D.E., and Doyle, J.J. (1998) Molecular Systematics of Plants II: DNA Sequencing. Kluwer Academic Publishers Boston, Dordrecht, London. ISBN 0-41211-131-4.
  6. ^ Hillis, D. M. & Moritz, C. 1996. Molecular systematics. 2nd ed. Sinauer Associates Incorporated. ISBN 0-87893-282-8.
  7. ^ Suárez-Díaz, Edna & Anaya-Muñoz, Victor H. (2008). "History, objectivity, and the construction of molecular phylogenies". Stud. Hist. Phil. Biol. & Biomed. Sci. 39 (4): 451–468. doi:10.1016/j.shpsc.2008.09.002. PMID 19026976.
  8. ^ Ahlquist, Jon E. (1999). "Charles G. Sibley: A commentary on 30 years of collaboration". The Auk. 116 (3): 856–860. doi:10.2307/4089352. JSTOR 4089352.
  9. ^ Page, Roderic D. M.; Holmes, Edward C. (1998). Molecular evolution : a phylogenetic approach. Oxford: Blackwell Science. ISBN 9780865428898. OCLC 47011609.
  10. ^ Sanger F, Coulson AR (May 1975). "A rapid method for determining sequences in DNA by primed synthesis with DNA polymerase". J. Mol. Biol. 94 (3): 441–8. doi:10.1016/0022-2836(75)90213-2. PMID 1100841.
  11. ^ Sanger F, Nicklen S, Coulson AR (December 1977). "DNA sequencing with chain-terminating inhibitors". Proc. Natl. Acad. Sci. U.S.A. 74 (12): 5463–7. Bibcode:1977PNAS...74.5463S. doi:10.1073/pnas.74.12.5463. PMC 431765. PMID 271968.
  12. ^ Bast, F. (2013). "Sequence Similarity Search, Multiple Sequence Alignment, Model Selection, Distance Matrix and Phylogeny Reconstruction". Protoc. Exch. doi:10.1038/protex.2013.065.
  13. ^ a b c Pevsner, J. (2015). "Chapter 7: Molecular Phylogeny and Evolution". Bioinformatics and Functional Genomics (3rd ed.). Wiley-Blackwell. pp. 245–295. ISBN 978-1-118-58178-0.
  14. ^ Philippe, H.; Brinkmann, H.; Lavrov, D. V.; Littlewood, D. T. J.; Manuel, M.; Wörheide, G.; Baurain, D. (2011). Penny, David (ed.). "Resolving Difficult Phylogenetic Questions: Why More Sequences Are Not Enough". PLoS Biology. 9 (3): e1000602. doi:10.1371/journal.pbio.1000602. PMC 3057953. PMID 21423652.

Further reading

  • San Mauro, D.; Agorreta, A. (2010). "Molecular systematics: a synthesis of the common methods and the state of knowledge". Cellular & Molecular Biology Letters. 15 (2): 311–341. doi:10.2478/s11658-010-0010-8.

Circumscription (taxonomy)

In biological taxonomy, circumscription is the definition of a taxon, that is, a group of organisms.

One goal of biological taxonomy is to achieve a stable circumscription for every taxon. Stability is not yet assured for most taxa, and many circumscriptions that had been regarded as stable for decades are in upheaval in the light of rapid developments in molecular phylogenetics. In essence, new discoveries may show that attributes used in established or obsolete circumscriptions are irrelevant, or may reveal new attributes useful in cladistic taxonomy.

An example of a taxonomic group with unstable circumscription is Anacardiaceae, a family of flowering plants. Some experts favor a circumscription in which this family includes the Blepharocaryaceae, Julianaceae, and Podoaceae, which are sometimes considered to be separate families.


The Cisticolidae family of small passerine birds is a group of about 160 warblers found mainly in warmer southern regions of the Old World. They were formerly included within the Old World warbler family Sylviidae.

This family probably originated in Africa, which has the majority of species, but there are representatives of the family across tropical Asia into Australasia, and one species, the zitting cisticola, even breeds in Europe.

These are generally very small birds of drab brown or grey appearance found in open country such as grassland or scrub. They are often difficult to see and many species are similar in appearance, so the song is often the best identification guide. These are insectivorous birds which nest low in vegetation.


Colubridae (commonly known as colubrids, from Latin coluber, snake) is a family of snakes. With 524 genera and approximately 1,760 species, it is the largest snake family, and includes just over 51% of all known living snake species. The earliest species of the family date back to the Oligocene epoch. Colubrid snakes are found on every continent except Antarctica.


Commelinoideae is a subfamily of monocotyledonous flowering plants in the dayflower family (Commelinaceae). The Commelinoideae is one of two subfamilies within the Commelinaceae and includes 39 genera (out of 41 in the family) and all but 12 of the family's several hundred known species. The subfamily is further broken down into two tribes, the Tradescantieae, which includes 26 genera and about 300 species, and the Commelineae, which contains 13 genera and about 350 species.

The Commelinoideae is separated morphologically from the other subfamily, Cartonematoideae, by having glandular microhairs, channels called raphide canals (containing needle-like calcium oxalate crystals) between the veins of the leaves, and flowers that are virtually never both yellow and actinomorphic. Molecular phylogenetics also supports the separation of the two subfamilies.


Conoidea is a superfamily of predatory sea snails, marine gastropod mollusks within the suborder Hypsogastropoda. This superfamily is a very large group of marine mollusks, estimated at about 340 recent valid genera and subgenera, and considered by one authority to contain 4,000 named living species. This superfamily includes the turrids, the terebras (also known as auger snails or auger shells), and the cones or cone snails. The phylogenetic relationships within this superfamily are poorly established. Several families (especially the Turridae), subfamilies, and genera are thought to be polyphyletic.

In contrast to Puillandre's estimate, Bandyopadhyay et al. (2008) estimated that the superfamily Conoidea contains about 10,000 species. Tucker (2004) even speaks of 11,350 species in the group of taxa commonly referred to as turrids. 3,000 recent taxa are potentially valid species. Little more than half of the known taxa are fossil species. Many species are little known and need more investigation to determine their exact systematic position.

Most species in this superfamily are small to medium-sized, with shell lengths between 3 mm and 50 mm. They occur in diverse marine habitats from tropical waters to the poles, in shallow or deep waters, and on hard to soft substrates.

The superfamily is known for its toxoglossan radula, which is used to inject powerful neurotoxins into prey. This makes these snails powerful carnivorous predators of annelid worms, other molluscs, and even fish.

Within the superfamily there are four somewhat different varieties of radula. The radula types are as follows:

Type 1 (Drilliidae type): five teeth in each row, with comb-like lateral teeth and flat-pointed marginal teeth.

Type 2 (Turridae s.l. type): two or three teeth in a row, with the marginal teeth of the duplex or wishbone form.

Type 3 (Pseudomelatomidae type): two or three teeth in a row, with curved and solid marginal teeth.

Type 4 (hypodermic type): two hollow, enrolled marginal teeth in each row, with an absent or reduced radular membrane.

In 2009, a proposed new classification of this superfamily was published by John K. Tucker and Manuel J. Tenorio. In 2011, a new classification of this superfamily was published by Bouchet et al. Both classifications were based on cladistic analyses and incorporated modern molecular phylogenetic studies.


A craniate is a member of the Craniata (sometimes called the Craniota), a proposed clade of chordate animals with a skull of hard bone or cartilage. Living representatives are the Myxini (hagfishes), Hyperoartia (including lampreys), and the much more numerous Gnathostomata (jawed vertebrates). Craniata was formerly distinguished from the vertebrates, which were defined to exclude the hagfishes; molecular and anatomical research in the 21st century has led to the reinclusion of hagfishes among the vertebrates, making living craniates synonymous with living vertebrates.

The clade was conceived largely on the basis that the Hyperoartia (lampreys and kin) are more closely related to the Gnathostomata (jawed vertebrates) than to the Myxini (hagfishes). This, combined with an apparent lack of vertebral elements within the Myxini, suggested that the Myxini were descended from a more ancient lineage than the vertebrates, and that the skull developed before the vertebral column. The clade was thus composed of the Myxini, the vertebrates, and any extinct chordates with skulls.

However, recent studies using molecular phylogenetics have contradicted this view, with evidence that the Cyclostomata (Hyperoartia and Myxini) is monophyletic. This suggests that the Myxini are degenerate vertebrates, and therefore that the vertebrates and craniates are cladistically equivalent, at least for their living representatives. The placement of the Myxini within the vertebrates has been further strengthened by recent anatomical analysis, which discovered vestiges of a vertebral column in the Myxini.


The Diplodactylidae are a family in the suborder Gekkota (geckos), with about 137 species in 25 genera.

These geckos occur in Australia, New Zealand, and New Caledonia. Three diplodactylid genera (Oedura, Rhacodactylus, and Hoplodactylus) have recently been split into multiple new genera. In previous classifications, the family Diplodactylidae was equivalent to the subfamily Diplodactylinae.


The Drosophilinae are the largest subfamily in the Drosophilidae. The other subfamily is the Steganinae.

Internal transcribed spacer

Internal transcribed spacer (ITS) is the spacer DNA situated between the small-subunit ribosomal RNA (rRNA) and large-subunit rRNA genes in the chromosome or the corresponding transcribed region in the polycistronic rRNA precursor transcript.

Jon E. Ahlquist

Jon Edward Ahlquist (born 1944) is an American molecular biologist and ornithologist who has specialized in molecular phylogenetics. He has collaborated extensively with Charles Sibley, primarily at Yale University.

By 1987, both Ahlquist and Sibley had left Yale.

In 1988, Ahlquist and Sibley were awarded the Daniel Giraud Elliot Medal by the National Academy of Sciences. In January 1991 (often listed as 1990), Charles Sibley and Ahlquist published Phylogeny and Classification of Birds, which presented a new phylogeny for birds based on DNA-DNA hybridization techniques, known as the Sibley-Ahlquist taxonomy.

At that time, he was an associate professor of zoology at Ohio University. Ahlquist retired in 1999. He is now a Young Earth creationist.

Molecular Phylogenetics and Evolution

Molecular Phylogenetics and Evolution is a peer-reviewed scientific journal of evolutionary biology and phylogenetics. The journal is edited by D.E. Wildman.


The Muroidea are a large superfamily of rodents, including mice, rats, voles, hamsters, gerbils, and many other relatives. They occupy a vast variety of habitats on every continent except Antarctica. Some authorities have placed all members of this group into a single family, Muridae, because of difficulties in determining how the subfamilies are related to one another. Based on recent, well-supported molecular phylogenies, the muroids are classified in six families, 19 subfamilies, around 280 genera, and at least 1,750 species.


The Odontophrynidae are a family of frogs from southern and eastern South America. This family was first established in 1969 as the tribe Odontophrynini within the (then) very large family Leptodactylidae. Molecular phylogenetic analyses prompted the move of this group to the Cycloramphidae in 2006, before it was recognized as its own family, Odontophrynidae, in 2011.


Petrels are tube-nosed seabirds in the bird order Procellariiformes.


Ranunculales is an order of flowering plants. Of necessity it contains the family Ranunculaceae, the buttercup family, because the name of the order is based on the name of a genus in that family. Ranunculales belongs to a paraphyletic group known as the basal eudicots. It is the most basal clade in this group; in other words, it is sister to the remaining eudicots. Widely known members include poppies, barberries, and buttercups.


Tiphioidea is a suggested superfamily of stinging wasps in the order Hymenoptera. There are two families in Tiphioidea, Tiphiidae and Sierolomorphidae.

Recent research in molecular phylogenetics has resulted in the reorganization of the infraorder Aculeata, which now contains eight superfamilies: Apoidea, Chrysidoidea, Formicoidea, Pompiloidea, Scolioidea, Tiphioidea, Thynnoidea, and Vespoidea.


Vampyriscus is a genus of bats in the family Phyllostomidae, the leaf-nosed bats.

Vampyriscus contains three species that were previously included in the genus Vampyressa. The two genera are differentiated by the morphology of their bones and teeth and the pattern of their pelage, and phylogenetic analyses support their separation. Older sources recognize Vampyriscus as a subgenus of Vampyressa.

Species:

Vampyriscus bidens – bidentate yellow-eared bat

Vampyriscus brocki – Brock's yellow-eared bat

Vampyriscus nymphaea – striped yellow-eared bat



This page is based on a Wikipedia article written by its contributors.
Text is available under the CC BY-SA 3.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.