Overview

Ohno's seminal work in 1970 popularized the idea of gene duplication and divergence. DNA sequence comparisons reveal that a large portion of the genes in bacteria, archaea, and eukaryotes was generated by gene duplication and divergence, indicating the critical role of this process in evolution.

The duplicated copies of a gene are called paralogs. Paralogs with similar sequences and functions form a gene family, and many gene families have been characterized across species. For example, the trypsin gene family in D. melanogaster has over 111 members, and the olfactory receptor gene family in mammals has around 1000 member genes.

Generation of Duplicate Genes

Gene duplication can arise through the following four mechanisms. The first is unequal crossing over: misaligned homologous chromosomes exchange segments during meiosis, which can give rise to duplicated DNA segments containing part of a gene or several genes.
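As a toy illustration of unequal crossing over, the sketch below treats each chromatid as a list of gene labels and swaps the segments to the right of two break points. The function name `unequal_crossover`, the gene labels, and the break positions are hypothetical choices made for illustration; real recombination depends on sequence homology, which this model ignores.

```python
def unequal_crossover(chrom_a, chrom_b, break_a, break_b):
    """Exchange the segments to the right of each break point.

    chrom_a, chrom_b: lists of gene labels for two homologous chromatids.
    break_a, break_b: index at which each chromatid is cut.
    Equal crossing over has break_a == break_b; unequal crossing over does not.
    """
    recombinant_1 = chrom_a[:break_a] + chrom_b[break_b:]
    recombinant_2 = chrom_b[:break_b] + chrom_a[break_a:]
    return recombinant_1, recombinant_2


# Two homologs carrying the same three genes (hypothetical labels).
homolog_1 = ["A", "B", "C"]
homolog_2 = ["A", "B", "C"]

# Misalignment: homolog_1 is cut after gene B, homolog_2 after gene A.
duplicated, deleted = unequal_crossover(homolog_1, homolog_2, break_a=2, break_b=1)
print(duplicated)  # ['A', 'B', 'B', 'C']
print(deleted)     # ['A', 'C']
```

In this model one recombinant gains a second copy of gene B while the other loses it, so the duplication on one chromatid is paired with a deletion on the other.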

The second is replication slippage. In rare instances during DNA replication, the polymerase dissociates from the DNA, realigns at an incorrect position, and copies an already replicated sequence again. This process can create duplicate copies of DNA spanning several hundred bases.
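A minimal sketch of replication slippage follows, assuming a simplified model in which the newly made strand is written in the same orientation as the template and base pairing is ignored. The function `replicate_with_slippage`, the example sequence, and the slip positions are all hypothetical.

```python
def replicate_with_slippage(template, slip_at, backtrack):
    """Copy `template`, but after `slip_at` bases the polymerase realigns
    `backtrack` bases upstream and copies that stretch a second time."""
    if not 0 <= backtrack <= slip_at <= len(template):
        raise ValueError("need 0 <= backtrack <= slip_at <= len(template)")
    copied_before_slip = template[:slip_at]
    copied_after_realignment = template[slip_at - backtrack:]
    return copied_before_slip + copied_after_realignment


template = "ATGCAGCAGTTC"  # hypothetical sequence with two CAG repeats
product = replicate_with_slippage(template, slip_at=9, backtrack=3)
print(product)  # ATGCAGCAGCAGTTC
```

Here the three bases copied just before the slip are copied again, so the product carries one extra CAG unit compared with the template.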

The third is retrotransposition. Here, cellular mRNA is reverse transcribed into DNA copies called retrogenes, which can then insert back into the genome, resulting in gene duplication. Because the inserted copy lacks the promoter and other regulatory elements needed for transcription, most of these duplicates lose their function and become pseudogenes.

The fourth is large-scale duplication of whole chromosomes or whole genomes. Chromosomes may fail to segregate properly during meiosis, producing gametes with an abnormal number of chromosomes. For example, people with Down syndrome have an additional copy of chromosome 21. In plants such as bread wheat, the cells carry six sets of chromosomes, making the plant a hexaploid.

Procedure

Gene duplication is a process in which a DNA region coding for a gene is duplicated, producing additional copies within the same genome. These duplicated copies of the gene, called paralogs, can later mutate and diverge in one of the following ways.

The first is pseudogene formation. Here, one of the paralogs acquires deleterious mutations and turns into a nonfunctional copy called a pseudogene.
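One common kind of deleterious mutation is a point mutation that creates a premature stop codon. The sketch below illustrates this on a made-up 12-base coding sequence; the helper names `first_stop_codon` and `point_mutation` and the sequence itself are hypothetical, and the model covers only this one route to loss of function.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(cds):
    """Return the 0-based codon index of the first in-frame stop codon, or None."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def point_mutation(seq, position, base):
    """Return a copy of `seq` with the base at `position` replaced by `base`."""
    return seq[:position] + base + seq[position + 1:]


functional = "ATGAAATGGTAA"                   # codons: ATG AAA TGG TAA
mutated = point_mutation(functional, 8, "A")  # TGG becomes the stop codon TGA

print(first_stop_codon(functional))  # 3
print(first_stop_codon(mutated))     # 2
```

The mutated paralog now terminates one codon early, which in a real gene would typically truncate the protein.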

The second is subfunctionalization, where both paralogs acquire mutations in different protein-coding domains or exons, thus partitioning the original gene function between them. Together, the protein products of the two paralogous genes complement each other and carry out the original gene function.

For example, in primitive fish and marine animals, a single-chain globin protein served as the oxygen-carrying molecule in the blood.

During the course of evolution, the globin gene duplicated and subfunctionalized into two slightly different genes coding for α- and β-globin proteins, which associate to form the four-subunit hemoglobin molecule found in most present-day vertebrates.

The third is neofunctionalization. Here, one paralog acquires novel, advantageous mutations that can lead to the evolution of a new gene, while the other paralog retains the original function.
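The three fates can be compared in a small sketch that represents each gene by the set of functions it performs. This is a deliberately coarse abstraction: the function `classify_fate`, the set-based model, and the function labels are hypothetical and are not a method used in genome annotation.

```python
def classify_fate(ancestral, paralog_1, paralog_2):
    """Classify the fate of a duplicated gene from the functions each copy retains.

    Each argument is a collection of function labels (hypothetical strings).
    """
    ancestral = frozenset(ancestral)
    paralog_1 = frozenset(paralog_1)
    paralog_2 = frozenset(paralog_2)
    combined = paralog_1 | paralog_2

    if not paralog_1 or not paralog_2:
        # One copy performs no function at all.
        return "pseudogenization"
    if combined - ancestral:
        # At least one copy performs a function the ancestor lacked.
        return "neofunctionalization"
    if combined == ancestral and paralog_1 != ancestral and paralog_2 != ancestral:
        # Neither copy is complete, but together they cover the original role.
        return "subfunctionalization"
    return "unclassified"


print(classify_fate({"f1", "f2"}, {"f1", "f2"}, set()))  # pseudogenization
print(classify_fate({"f1", "f2"}, {"f1"}, {"f2"}))       # subfunctionalization
print(classify_fate({"f1"}, {"f1"}, {"f3"}))             # neofunctionalization
```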

For example, an ancestral β-globin gene duplicated and acquired mutations to produce a new gene, the fetal β-like globin (γ-globin), which is expressed mainly in the human fetus. Soon after birth, the β-globin gene takes over and produces the β-globin chains of adult hemoglobin.

The evolution of trichromatic (three-color) vision in humans is another interesting example of neofunctionalization. Long before the evolution of modern apes, early primates had dichromatic vision based on blue and green opsin genes.

Later, the green opsin gene duplicated, and one copy neofunctionalized into a novel red opsin gene.

Therefore, species that evolved after the duplication event, such as Old World monkeys, apes, and humans, have three opsin genes, which impart trichromatic vision.