Human Genome Project
The Human Genome Project (HGP) was an international scientific effort to sequence and map the
DNA of humans and several model organisms. It was a landmark in the fields of genomics and
bioinformatics.
The US National Institutes of Health (NIH) and the US Department of Energy (DOE) jointly
sponsored the project, which was launched on October 1, 1990, and completed in April 2003.
Its objective was to locate every human gene on the chromosomes and to determine the exact
chemical makeup (the DNA sequence) of each gene, in order to understand how genes affect
health and disease. Another objective of the HGP was to create comprehensive maps of the
human genome to help researchers locate genes. Two kinds of maps were produced: genetic
linkage maps and physical maps. Genetic linkage maps give the relative order of genes and
markers on the chromosomes and the approximate distances between them, estimated from how
often they are inherited together. Physical maps give the actual physical locations of genes or
DNA fragments, including fragments of unknown function, and the distances between them
(measured in base pairs) within specific regions of a chromosome.
The Human Genome Project also gave importance to sequencing the genomes of other model
organisms. The genome of the yeast Saccharomyces cerevisiae was sequenced in 1996; the
bacterium Escherichia coli in 1997; the nematode worm Caenorhabditis elegans in 1998; the fruit
fly Drosophila melanogaster and the plant Arabidopsis thaliana in 2000; the bacterium
Staphylococcus aureus in 2001; and the laboratory mouse Mus musculus, in draft form, in 2002.
The rationale for sequencing these organisms is that many genes with similar functions in
different organisms have been conserved during evolution and show striking similarities.
Furthermore, after the HGP was completed in 2003, further sequencing was carried out to close
gaps and reduce ambiguities, producing a high-quality reference sequence.
Quote about the Data:
“Mutation changes life either positively or negatively”
Explanation of the data:
The data contains SNP (single-nucleotide polymorphism) details for all chromosomes. Humans
have 22 pairs of autosomes and one pair of sex chromosomes (the mother always contributes an
X, while the father contributes either an X or a Y). Besides these 23 chromosome pairs, there is
mitochondrial DNA, a circular chromosome found inside the mitochondria. The data contains a
total of 705,195 SNPs mapped to human genome build 37 (GRCh37, also known as Annotation
Release 104). Within protein-coding regions, some SNPs are synonymous (the nucleotide change
does not alter the encoded amino acid) and others are non-synonymous (the nucleotide change
alters the encoded amino acid). The data has the rsID in the first column, the chromosome in the
second column, the position of the SNP on the chromosome in the third column, and the
genotype in the fourth column, as shown in figure 1. rsIDs are reference SNP identifiers assigned
by the dbSNP database to specific SNP positions rather than to individual alleles. The genotype
gives the pair of alleles the person carries at that position. The human genome is present in two
copies, one inherited from the mother and one from the father. As shown in figure 1, every SNP
other than rs4970383 is homozygous, meaning the person inherited the same allele from both
parents. rs4970383, on the other hand, is heterozygous, meaning the person inherited two
different alleles from their parents.
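The four-column layout described above can be read with a few lines of Python. This is only a minimal sketch under stated assumptions: the file is assumed to be tab-separated with optional comment lines starting with "#", the function names read_snps and zygosity are hypothetical, and the sample rows (including the position and genotype given for rs4970383) are invented for illustration rather than copied from figure 1.

```python
import io
from collections import Counter

# Invented rows for illustration only (not copied from figure 1).
# Assumed layout: tab-separated rsid, chromosome, position, genotype,
# with optional comment lines starting with "#".
SAMPLE = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rsEXAMPLE1\t1\t1000\tAA\n"
    "rs4970383\t1\t2000\tAC\n"
    "rsEXAMPLE2\t2\t3000\tGG\n"
    "rsEXAMPLE3\tMT\t4000\tT\n"
)


def read_snps(lines):
    """Yield (rsid, chromosome, position, genotype) tuples from text lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        rsid, chromosome, position, genotype = line.split("\t")
        yield rsid, chromosome, int(position), genotype


def zygosity(genotype):
    """Classify a two-letter genotype such as 'AA' or 'AC'."""
    if len(genotype) == 2 and all(base in "ACGT" for base in genotype):
        return "homozygous" if genotype[0] == genotype[1] else "heterozygous"
    return "other"  # e.g. a single allele or a missing call


snps = list(read_snps(io.StringIO(SAMPLE)))
counts = Counter(zygosity(genotype) for _, _, _, genotype in snps)

for rsid, chromosome, position, genotype in snps:
    print(rsid, chromosome, position, genotype, zygosity(genotype))
print(dict(counts))
```

To run the same sketch on a real file, io.StringIO(SAMPLE) could be replaced by an opened file handle, provided the file really follows the assumed layout.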
In his marvelous book, Genome, Matt Ridley wrote:
“Imagine that the genome is a book. There are 23 chapters, called chromosomes. Each chapter
contains several thousand stories, called genes. Each story is made up of paragraphs called
exons, which are interrupted by advertisements called introns. Each paragraph is made up of
words called codons. Each word is written in letters called bases, which are cytosine, guanine,
adenine and thymine, or C, G, A and T for short.”
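Ridley's "words" (codons) also make the difference between synonymous and non-synonymous SNPs concrete: a single-base change is synonymous when the changed codon still encodes the same amino acid, and non-synonymous when it encodes a different one. The following is a small sketch using the standard genetic code; the function name classify_substitution is hypothetical, and the example codons are chosen for illustration, not taken from the data set.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order
# (first base changes slowest, third base fastest). "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Compare the amino acids encoded by a reference and an altered codon."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return "synonymous" if ref_aa == alt_aa else "non-synonymous"


# GAG and GAA both encode glutamate (E); GTG encodes valine (V).
print(classify_substitution("GAG", "GAA"))
print(classify_substitution("GAG", "GTG"))
```

The first call compares two codons for the same amino acid, the second compares codons for two different amino acids, so they fall into the synonymous and non-synonymous classes respectively.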
References:
1) http://docplayer.net/41630972-Human-genome-project-and-its-ethical-issues.html
2) https://en.wikipedia.org/wiki/Single-nucleotide_polymorphism