Overview

Genome comparison is a powerful way to interpret the evolutionary relationships between organisms. Its basic principle is that if two species share a common feature, that feature is likely encoded by a DNA sequence conserved in both species. The advent of genome sequencing technologies in the late 20th century allowed scientists to study the conservation of sequences and domains between species and to deduce evolutionary relationships across diverse organisms.

Genome comparison can reveal three levels of evolutionary relationships. The first level provides insight into sequences and protein domains that are conserved across diverse groups of organisms, such as humans and fishes. The second level increases the resolution to identify the unique DNA elements present in closely related species, such as humans and chimpanzees. The third level, with even higher resolution, distinguishes the genetic differences within a species, such as different variants and subtypes of an organism. This level of resolution may identify mutations particular to individual microbial strains or clusters of infected cases, helping to track disease outbreaks.

DNA sequencing tools

Several methods can be used to obtain the DNA sequence data required to deduce evolutionary relationships. Among them, whole-genome sequencing (WGS) is a widely used technique. It provides high-resolution data that are well suited to analyzing mutations and conserved sequences among several organisms. It can also help identify the cause of genetic disorders by comparing the DNA sequences of affected individuals with those of unaffected subjects.

Data analysis tools

The data obtained by WGS or similar sequencing methods are analyzed with appropriate software tools to deduce evolutionary relationships. Molecular Evolutionary Genetics Analysis (MEGA) is one of the most widely used software packages. Its functions, such as sequence alignment, construction of evolutionary trees, estimation of genetic distances, and computation of evolutionary timetrees, allow users to curate and interpret the raw data obtained from sequencing techniques.
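To make "estimating genetic distances" concrete, here is a minimal Python sketch of two common distance measures computed from a pair of pre-aligned sequences: the uncorrected proportion of differing sites (p-distance) and the Jukes-Cantor correction for multiple substitutions at the same site. This is an illustration only, not MEGA's implementation; the function names and the toy sequences are assumptions made for this example.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    mismatches = sum(1 for a, b in pairs if a != b)
    return mismatches / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor distance: d = -(3/4) * ln(1 - (4/3) * p).

    The formula is undefined once p reaches 0.75 (saturation).
    """
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


# Toy sequences, invented for illustration: 1 mismatch in 8 sites.
p = p_distance("ACGTACGT", "ACGTACGA")  # 0.125
d = jukes_cantor(p)                     # slightly larger than p
```

The corrected distance is always at least as large as the p-distance, because it accounts for substitutions that are hidden by later changes at the same site.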

Procedure

Linnaeus's traditional classification of organisms, known as Linnaean taxonomy, is based on similarities and differences in organisms' physical characteristics. Scientists have commonly constructed trees, called dendrograms, to give visual representations of these splits and groups.

However, with the advent of modern technology, comparing DNA has become a common way to build such trees. If sequence data are examined within a single species, such as humans, the DNA sequences show a very high degree of similarity, around 99.9%, because an organism's DNA is passed from parent to offspring.

Humans also share much of this DNA sequence with other species, like chimpanzees and mice, but the degree of overall similarity differs from species to species: human DNA is more similar to chimpanzee DNA than to mouse DNA. This means that trees can be created for groups of species based on the similarities or differences between their DNA sequences. This kind of analysis, which combines statistics, mathematical modeling, and computer science, is part of the field known as bioinformatics.
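As a rough illustration of how a tree can be built from pairwise differences, the sketch below uses UPGMA, a simple average-linkage clustering method chosen here for brevity; the text does not prescribe a particular tree-building algorithm. The function name and the distance values are assumptions invented for this example, not measured data.

```python
from itertools import combinations


def upgma(names, distances):
    """Cluster taxa by UPGMA and return the tree as nested tuples.

    names:     list of taxon labels
    distances: dict mapping frozenset({a, b}) to the distance between a and b
    """
    # Each active cluster id maps to (subtree, number of leaves).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Distances between active clusters, keyed by a sorted pair of ids.
    d = {
        (i, j): distances[frozenset((names[i], names[j]))]
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the two closest clusters.
        i, j = min(d, key=d.get)
        tree_i, n_i = clusters.pop(i)
        tree_j, n_j = clusters.pop(j)
        # Distance from the merged cluster to each remaining cluster is the
        # size-weighted average of the two old distances.
        new_d = {}
        for k in clusters:
            d_ik = d[tuple(sorted((i, k)))]
            d_jk = d[tuple(sorted((j, k)))]
            new_d[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        # Drop every entry that involves one of the merged clusters.
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        d.update(new_d)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree


# Hypothetical distances, chosen only so that human and chimpanzee are closest.
toy = {
    frozenset(("human", "chimp")): 0.01,
    frozenset(("human", "mouse")): 0.15,
    frozenset(("chimp", "mouse")): 0.15,
}
tree = upgma(["human", "chimp", "mouse"], toy)
# Human and chimp are grouped first; mouse joins that pair last.
```

UPGMA assumes that all lineages evolve at the same rate, which is often not true for real data, so dedicated phylogenetics software typically offers other methods as well.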

The genetic data used to create these trees can take many forms. For example, in molecular phylogeny one or two key genetic loci are sequenced and then compared across the species of interest.

However, as individual genes or genetic regions may evolve at vastly different rates in different species or even be exchanged between different species through horizontal gene transfer, these small-scale genetic surveys may not always provide accurate phylogenies.

In bacterial phylogenies, a technique called multi-locus sequence typing, or MLST, is often used. This method generates sequences across multiple genetic regions, typically housekeeping genes, which are essential to cellular function and so are conserved across species.

However, because housekeeping genes tend to evolve slowly, it is difficult to obtain strain-level resolution with MLST.
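The idea behind MLST can be sketched in a few lines of Python: each distinct sequence at a locus is given an allele number, and a strain is summarized by its tuple of allele numbers (its allelic profile). In practice, allele numbers come from curated reference databases; the local numbering, the function names, and the short toy sequences below are assumptions made purely for illustration.

```python
def allelic_profiles(strains):
    """Assign allele numbers per locus and return each strain's profile.

    strains: dict mapping strain name -> dict mapping locus -> sequence
    Returns (loci, profiles), where profiles maps strain -> tuple of allele
    numbers in the order given by loci.
    """
    loci = sorted({locus for seqs in strains.values() for locus in seqs})
    allele_ids = {locus: {} for locus in loci}
    profiles = {}
    for strain, seqs in strains.items():
        profile = []
        for locus in loci:
            ids = allele_ids[locus]
            # The first time a sequence is seen it gets the next free number.
            profile.append(ids.setdefault(seqs[locus], len(ids) + 1))
        profiles[strain] = tuple(profile)
    return loci, profiles


def allele_differences(profile_a, profile_b):
    """Number of loci at which two allelic profiles differ."""
    return sum(1 for a, b in zip(profile_a, profile_b) if a != b)


# Toy data: three strains typed at three housekeeping loci.
strains = {
    "strain_A": {"adk": "ATGC", "gyrB": "GGCA", "recA": "TTAC"},
    "strain_B": {"adk": "ATGC", "gyrB": "GGCA", "recA": "TTAC"},
    "strain_C": {"adk": "ATGC", "gyrB": "GGTA", "recA": "TTAC"},
}
loci, profiles = allelic_profiles(strains)
```

In this toy data set, strain_A and strain_B receive identical profiles, so MLST alone cannot tell them apart even if they differ elsewhere in the genome, which is the resolution limit described above.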

Finally, whole-genome sequencing, or WGS, can be used to elucidate evolutionary relationships. This method involves the sequencing of the complete genome of an organism, including mitochondrial DNA in eukaryotes, and even chloroplast DNA in plants.

WGS allows whole genomes to be aligned and compared at fine-scale resolution and can identify mutations or species-specific markers, branching points, and even strains or populations within a single species. Such fine details may be missed in more targeted sequencing.
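Once genomes have been aligned, finding the positions that distinguish strains is conceptually simple. The following minimal sketch scans a set of pre-aligned sequences and reports the columns where the samples disagree. It assumes the alignment has already been produced by other software; the function name, sample names, and toy sequences are hypothetical.

```python
def variable_sites(alignment):
    """Return the alignment columns at which the samples differ.

    alignment: dict mapping sample name -> aligned sequence (equal lengths)
    Returns a list of (position, column), where position is 1-based and
    column maps each sample name to its base at that position.
    Gaps ('-') and ambiguous bases ('N') are not counted as differences.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("all sequences must have the same aligned length")
    ignored = {"-", "N"}
    sites = []
    for pos in range(length):
        column = {name: alignment[name][pos].upper() for name in names}
        bases = {base for base in column.values() if base not in ignored}
        if len(bases) > 1:
            sites.append((pos + 1, column))
    return sites


# Toy alignment: isolate_3 carries a single substitution at position 3.
alignment = {
    "isolate_1": "ACGTAC",
    "isolate_2": "ACGTAC",
    "isolate_3": "ACCTAC",
}
sites = variable_sites(alignment)
```

Real analyses also have to deal with sequencing errors, coverage, and alignment quality, which this sketch ignores.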