Human genome project
HUMAN GENOME PROJECT. Ms. Ruchi Yadav, Lecturer, Amity Institute of Biotechnology, Amity University, Lucknow (UP)
HUMAN GENOME PROJECT: Genome Sequencing, Genome Assembly, Genome Annotation
Human Genome Project: Background. The idea of sequencing the entire human genome was first proposed in discussions at scientific meetings organized by the US Department of Energy and others from 1984 to 1986. A subsequent US National Research Council committee report (1988) recommended a broader programme, to include: the creation of genetic, physical and sequence maps of the human genome; parallel efforts in key model organisms such as bacteria, yeast, worms, flies and mice; development of technology in support of these objectives; and research into the ethical, legal and social issues raised by human genome research.
HGP BACKGROUND (continued). The Human Genome Organization (HUGO) and the International Human Genome Sequencing Consortium (IHGSC) provided a forum for international coordination of genomic research. At the NIH, the HGP was organized under the National Human Genome Research Institute (NHGRI). The collaboration was coordinated through periodic international meetings (referred to as 'Bermuda meetings'). Work was shared flexibly among the centres, with some groups focusing on particular chromosomes and others contributing in a genome-wide fashion. A key principle was rapid and unrestricted data release: the centres adopted a policy that all genomic sequence data should be made publicly available without restriction within 24 hours of assembly (the Bermuda Principle).
Human Genome Project. Begun formally in 1990, the U.S. Human Genome Project was a 13-year effort coordinated by the U.S. Department of Energy and the National Institutes of Health. The project was originally planned to last 15 years, but rapid technological advances brought the completion date forward to 2003. Project goals were to:
  Identify all of the approximately 20,000-25,000 genes in human DNA
  Determine the sequences of the 3 billion chemical base pairs that make up human DNA
  Store this information in databases
  Improve tools for data analysis
  Transfer related technologies to the private sector
  Address the ethical, legal, and social issues (ELSI) that may arise from the project
 
 
Milestones:
  June 2000: Completion of a working draft of the entire human genome
  February 2001: Analyses of the working draft are published
  April 2003: HGP sequencing is completed and the project is declared finished, two years ahead of schedule
Timeline  of large-scale genomic analyses.
HUMAN GENOME. The human genome contains about 3 billion nucleotide bases (A, C, T, and G). The average gene consists of about 3,000 bases, but sizes vary greatly; the largest known human gene, dystrophin, spans 2.4 million bases. The total number of genes was initially estimated at around 30,000 (later revised to 20,000-25,000), much lower than previous estimates of 80,000 to 140,000. Almost all (99.9%) nucleotide bases are exactly the same in all people. The functions are unknown for over 50% of discovered genes.
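The round numbers on this slide can be turned into two quick back-of-the-envelope figures: how many bases differ between two people, and what share of the genome the genes would span. This is only a sketch using the slide's own approximate values; the function names are illustrative and not part of any genomics library.

```python
# Round figures taken from the slide; all of them are approximations.
GENOME_BASES = 3_000_000_000   # ~3 billion nucleotide bases
IDENTICAL_FRACTION = 0.999     # 99.9% of bases shared between any two people
GENE_COUNT = 30_000            # the slide's early estimate of the gene count
AVERAGE_GENE_BASES = 3_000     # the slide's average gene size


def differing_bases(genome_bases: int, identical_fraction: float) -> int:
    """Approximate number of positions at which two genomes differ."""
    return round(genome_bases * (1 - identical_fraction))


def genic_fraction(gene_count: int, average_gene_bases: int, genome_bases: int) -> float:
    """Share of the genome spanned by genes of average size."""
    return gene_count * average_gene_bases / genome_bases


if __name__ == "__main__":
    print(differing_bases(GENOME_BASES, IDENTICAL_FRACTION))
    print(genic_fraction(GENE_COUNT, AVERAGE_GENE_BASES, GENOME_BASES))
```

With these inputs, roughly 3 million positions differ between two individuals, and genes of average size would span about 3% of the genome.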
HUMAN GENOME PROJECT PUBLIC AND PRIVATE SECTOR
Two Different Groups Worked to Obtain the DNA Sequence of the Human Genome. The publicly funded HGP was an international consortium established by government research agencies; its US effort was led by Francis Collins of the NHGRI. Celera Genomics is a private company whose former CEO, J. Craig Venter, ran an independent sequencing project. Differences arose regarding who should receive the credit for this scientific milestone. On June 26, 2000, the HGP and Celera Genomics held a joint press conference to announce that TOGETHER they had completed ~97% of the human genome.
PUBLISHED
  The International Human Genome Sequencing Consortium published their results in Nature 409 (6822): 860-921, 2001. "Initial Sequencing and Analysis of the Human Genome"
  Celera Genomics published their results in Science 291 (5507): 1304-1351, 2001. "The Sequence of the Human Genome"
HGP SEQUENCING STRATEGIES LARGE SCALE SEQUENCING TECHNOLOGY
Genome Glossary
HGP SEQUENCING STRATEGIES. The HGP had three stages: genetic (or linkage) mapping, physical mapping, and DNA sequencing.
Three-Stage Approach to Genome Sequencing
Strategic Issues. There are two approaches for sequencing large repeat-rich genomes. The first is whole-genome shotgun sequencing, as has been used for the repeat-poor genomes of viruses, bacteria and flies, using linking information and computational analysis. The second is 'hierarchical shotgun sequencing', also referred to as 'map-based', 'BAC-based' or 'clone-by-clone' sequencing.
'HIERARCHICAL SHOTGUN SEQUENCING' ('MAP-BASED', 'BAC-BASED' OR 'CLONE-BY-CLONE'): technology for large-scale sequencing used by the US HGP
Hierarchical shotgun sequencing
Clone-by-clone or hierarchical sequencing strategy
  Advantages: ability to fill gaps and re-sequence uncertain regions; ability to distribute the clones to other labs; ability to check the produced sequence with restriction enzymes.
  Disadvantages: construction of the physical map is expensive and time-consuming; experienced personnel are required.
HIERARCHICAL ASSEMBLY OF SEQUENCE: CONTIG, SCAFFOLD
Assembly of the draft genome sequence The key steps  in assembling individual sequenced clones into the draft genome sequence.
Levels of clone and sequence coverage.
WHOLE-GENOME SHOTGUN Developed by J. Craig Venter
Whole-Genome Shotgun Approach to Genome Sequencing The whole-genome shotgun approach was developed by  J. Craig Venter in 1992 . This approach  skips genetic and physical mapping  and sequences random DNA fragments directly. Powerful computer programs are used to order fragments into a continuous sequence.
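The idea of ordering random fragments by computer can be illustrated with a toy greedy overlap assembler. This is a minimal sketch under strong assumptions (error-free reads, a single strand, no repeats). It is not the algorithm used by Celera or the HGP, and the function names and example sequence are made up for illustration.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Return the length of the longest suffix of `a` that is a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def drop_contained(fragments: list[str]) -> list[str]:
    """Remove duplicates and fragments that lie entirely inside another one."""
    unique = list(dict.fromkeys(fragments))
    return [f for f in unique if not any(f != g and f in g for g in unique)]


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the longest overlap."""
    contigs = drop_contained(fragments)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # no overlaps left: these are the final contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        rest = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs = drop_contained(rest + [merged])


if __name__ == "__main__":
    # Hypothetical 21-base "genome" and three overlapping reads, given out of order.
    genome = "ATGGCGTACGTTAGCCTAGGA"
    reads = ["TTAGCCTAGGA", "ATGGCGTACG", "GTACGTTAGCC"]
    print(greedy_assemble(reads))
```

Greedy merging is easily misled when the same sequence occurs in several places, which is one reason repeat-rich genomes are harder for this approach.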
Whole-Genome Shotgun Sequencing
Shotgun Sequencing Strategy
  Advantages: no physical map construction; less risk of recombinant clones; cost-effective and fast; ideal for small genomes.
  Disadvantages: difficult to fill gaps and to re-track all the sequenced plasmids; data less useful for positional cloning.
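Why gaps remain in a shotgun assembly can be seen from the standard Lander-Waterman approximation, in which reads land at random: with coverage c = N·L/G, the fraction of the genome left unsequenced is about e^(-c). The sketch below applies that formula. The read length and read counts are illustrative assumptions, not figures from the slides.

```python
import math


def coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of times each base is sequenced (c = N * L / G)."""
    return num_reads * read_length / genome_size


def uncovered_fraction(c: float) -> float:
    """Expected fraction of the genome hit by no read, assuming random placement."""
    return math.exp(-c)


def expected_gaps(num_reads: int, c: float) -> float:
    """Expected number of gaps between contigs (minimum overlap ignored)."""
    return num_reads * math.exp(-c)


if __name__ == "__main__":
    genome_size = 3_000_000_000  # ~3 billion bases, as on the slides
    read_length = 500            # assumed read length, for illustration only
    for num_reads in (6_000_000, 30_000_000, 48_000_000):
        c = coverage(num_reads, read_length, genome_size)
        print(f"c = {c:.0f}x  uncovered = {uncovered_fraction(c):.4%}  "
              f"gaps = {expected_gaps(num_reads, c):,.0f}")
```

Even at high coverage a random sampling scheme leaves some bases unread, so closing the last gaps needs targeted work rather than simply more random reads.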
Whole-Genome Assembly
Hierarchical vs. Shotgun Sequencing
Assembly of a mapped scaffold
Generating the draft genome sequence. Generating a draft sequence of the human genome involved three steps: selecting the BAC clones to be sequenced, sequencing them, and assembling the individual sequenced clones into an overall draft genome sequence.
Assembly of the draft genome sequence This process involved three steps: Filtering, Layout  and  Merging . The entire data set was filtered uniformly to  eliminate contamination from nonhuman sequences and other artefacts that had not already been removed by the individual centres.
Assembly of the draft genome sequence. The sequenced clones were then associated with specific clones on the physical map to produce a 'layout'. The fingerprint clone contigs were then mapped to chromosomal locations using sequence matches to mapped STSs from four human radiation hybrid maps, one YAC map and two genetic maps, together with data from FISH.
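The layout step can be pictured as anchoring contigs to a map through the STS markers they contain. The sketch below is a deliberately simplified illustration: it uses exact substring matching, whereas the real pipeline relied on error-tolerant sequence alignment, and all marker names, positions and sequences are invented for the example.

```python
from collections import Counter


def place_contigs(contigs, sts_markers):
    """Assign each contig a chromosome and map position from the STSs it contains.

    contigs:     dict of contig name -> sequence
    sts_markers: list of (marker name, chromosome, map position, marker sequence)
    Returns (contig, chromosome, position) tuples sorted along each chromosome.
    Contigs containing no marker are left unplaced and omitted.
    """
    placements = []
    for name, sequence in contigs.items():
        hits = [(chrom, pos) for _, chrom, pos, sts in sts_markers if sts in sequence]
        if not hits:
            continue
        # Use the chromosome supported by the most markers, then average positions.
        chrom = Counter(c for c, _ in hits).most_common(1)[0][0]
        positions = [pos for c, pos in hits if c == chrom]
        placements.append((name, chrom, sum(positions) / len(positions)))
    return sorted(placements, key=lambda p: (p[1], p[2]))


if __name__ == "__main__":
    # Hypothetical markers and contigs.
    markers = [
        ("STS_A", "chr1", 10.0, "ACGTTGCA"),
        ("STS_B", "chr1", 25.0, "GGATCCAT"),
        ("STS_C", "chr2", 5.0, "TTGACCGA"),
    ]
    contigs = {
        "contig_2": "CCCGGATCCATTT",
        "contig_1": "AAACGTTGCAGG",
        "contig_3": "GTTGACCGAC",
        "contig_4": "AAAAAAAA",
    }
    for placement in place_contigs(contigs, markers):
        print(placement)
```

Contigs without any marker, such as contig_4 here, stay unplaced until further mapping data become available.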
The human genome assembly and annotation process BUILD CYCLE DATA FREEZE RELEASE
The human genome assembly and annotation process :  INPUTS
 
 
Genome Annotation: Feature Annotation
  Clone features
  STS features
  SNP features
  Gene, mRNA (transcript) and misc_RNA (pseudogenes and non-coding transcripts) features
  Protein features
  Repeat features
Genome Annotation Products
  Sequence data
  Resource support: dbSNP, Entrez Gene, Map Viewer, UniSTS
  Data access: BLAST; Entrez retrieval (by accession number, gene symbol, or protein name); FTP (genomes FTP site)
Links from Map Viewer objects to other NCBI resources
UCSC put the human genome sequence on the web on July 7, 2000. UCSC put the human genome sequence on CD in October 2000, with varying results.
HGP ON WEB. Genome browsers were developed and are maintained by the University of California at Santa Cruz (UCSC) and by the EnsEMBL project of the European Bioinformatics Institute and the Sanger Centre. Additional browsers have been created; URLs are listed at www.nhgri.nih.gov/genome_hub. These web-based tools allow users to view an annotated display of the draft genome sequence, with the ability to scroll along the chromosomes and zoom in or out to different scales. In addition to using the genome browsers, one can download from these sites the entire draft genome sequence together with the annotations in a computer-readable format.
UCSC GENOME BROWSER
 
 
Broad genomic landscape. The distribution of GC content, CpG islands, recombination rates, repeat content and gene content of the human genome.
Long-range variation in GC content. GC-rich and GC-poor regions may have different biological properties: gene density, composition of repeat sequences, correspondence with cytogenetic bands, and recombination rate. CpG islands are of particular interest because they are associated with the 5' ends of genes.
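GC content and CpG enrichment are simple to compute from sequence. The sketch below measures the GC fraction and the observed/expected CpG ratio, then applies one commonly used rule of thumb for calling a CpG island (length of at least 200 bases, GC content of at least 50%, ratio of at least 0.6). Those thresholds are an assumption brought in for illustration; the slides do not state which criteria were applied.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def cpg_obs_exp(seq: str) -> float:
    """Observed CpG count divided by the count expected from C and G frequencies."""
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the exact dinucleotide count.
    return seq.count("CG") * len(seq) / (c_count * g_count)


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Apply the assumed length, GC-content and obs/exp thresholds to one region."""
    return (len(seq) >= min_len
            and gc_content(seq) >= min_gc
            and cpg_obs_exp(seq) >= min_ratio)


if __name__ == "__main__":
    cpg_rich = "CG" * 100  # artificial 200-base CpG-rich region
    at_rich = "AT" * 100   # artificial 200-base AT-rich region
    print(gc_content(cpg_rich), cpg_obs_exp(cpg_rich), looks_like_cpg_island(cpg_rich))
    print(gc_content(at_rich), cpg_obs_exp(at_rich), looks_like_cpg_island(at_rich))
```

In practice the same measures are computed in sliding windows along a chromosome, which is how long-range GC variation is usually plotted.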
Repeat content of the human genome
INTERSPERSED REPEATS
Gene content of the human genome RNA genes and protein-coding genes in the human genome. Noncoding RNAs
There are several major classes of ncRNA: transfer RNAs (tRNAs); ribosomal RNAs (rRNAs); small nucleolar RNAs (snoRNAs), which are required for rRNA processing and base modification in the nucleolus; and small nuclear RNAs (snRNAs), which are critical components of spliceosomes, the large ribonucleoprotein (RNP) complexes that splice introns out of pre-mRNAs in the nucleus. ncRNAs do not have translated ORFs, are often small and are not polyadenylated.
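Because ncRNAs lack translated ORFs, the simplest protein-coding signal, an open reading frame, does not find them. The sketch below scans the three forward reading frames for ATG-to-stop ORFs. It is a toy illustration only (no reverse strand, no introns, no statistical model) and does not represent any particular gene prediction program; the example sequences are invented.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) indices of forward-strand ORFs, stop codon included.

    An ORF runs from the first ATG in a frame to the next in-frame stop codon.
    ORFs shorter than `min_codons` codons (counting the stop) are discarded.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


if __name__ == "__main__":
    coding_like = "CCATGAAATTTGGGTAACC"  # hypothetical sequence with one short ORF
    noncoding_like = "GGGCCCGGGCCC"      # hypothetical sequence with no ATG
    for start, end in find_orfs(coding_like):
        print(start, end, coding_like[start:end])
    print(find_orfs(noncoding_like))
```

A sequence with no qualifying ORF is not thereby shown to be an ncRNA; the point is only that ORF-based searches cannot detect this class of gene.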
   Software tools for  ab initio  gene prediction
Distribution of the homologues of the predicted human proteins.
Conserved segments in the human and mouse genome.   *  Each colour corresponds to a particular mouse chromosome.
DISEASE GENES
DRUG TARGETS
Research challenges in genetics: what we still don't know, even with the full human DNA sequence in hand.
  Gene number, exact locations, and functions
  Gene regulation
  DNA sequence organization
  Chromosomal structure and organization
  Noncoding DNA types, amount, distribution, information content, and functions
  Coordination of gene expression, protein synthesis, and post-translational events
  Interaction of proteins in complex molecular machines
  Predicted vs. experimentally determined gene function
  Evolutionary conservation among organisms
  Protein conservation (structure and function)
  Proteomes in organisms
  Correlation of SNPs with health and disease
  Disease-susceptibility prediction based on gene sequence variation
  Genes involved in complex traits and multigene diseases
  Complex systems biology, including microbial consortia useful for environmental restoration
  Developmental genetics, genomics
"The more we learn about the human genome, the more there is to explore."
"We shall not cease from exploration. And the end of all our exploring will be to arrive where we started, and know the place for the first time." T. S. Eliot
