Some facts about the human genome
1. "Chromosome walking"
2. "Genome shotgun sequencing"
- hierarchical method allows targeting of additional sequencing to under-represented regions
Potential risks:
- outbred organism – at least 2 copies of each chromosome -> large-scale structural heterozygosity, Single Nucleotide Polymorphisms
Sequencing
- fully automated from library transformation to reading
- DNA from 5 subjects was selected for sequencing
Random Shotgun Data Set
8 Sep 1999 – 17 Jun 2000 -> 27 million reads of average length 540 bp (175 000 reads per day) -> ~5x coverage
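The ~5x figure is the average sequence coverage c = N·L/G (number of reads times mean read length, divided by genome size). A minimal Python check, assuming a genome size of about 2.9 Gbp (that value is an assumption, not given in these notes):

```python
def coverage(num_reads: int, mean_read_len: float, genome_size: float) -> float:
    """Average sequence coverage c = N * L / G."""
    return num_reads * mean_read_len / genome_size


# 27 million reads of ~540 bp; the genome size of ~2.9 Gbp is an assumption.
c = coverage(27_000_000, 540, 2.9e9)
print(f"~{c:.1f}x coverage")  # prints ~5.0x coverage
```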
Mate pairs (a key feature of the sequencing)
from 2-, 10- and 50-kbp inserts (clone coverage 3.42x, 16.40x and 18.84x, respectively)
Two different approaches to assembly:
1. Whole-genome assembly (WGA)
Two sets of data:
- random shotgun trimmed sequences produced at Celera (5x coverage)
- publicly funded HGP data derived from BAC clones (downloaded from GenBank on 2 Sep 2000 and shredded into reads; the locations of the BACs were not used in this process) (2.96x coverage)
2. Compartmentalized shotgun assembly (CSA)
- first partition the data into sets localized to large chromosomal segments (using HGP information), then run a shotgun assembly on each set (similar to a hierarchical approach)
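Two data-preparation steps above can be sketched in Python: shredding an assembled BAC sequence into overlapping pseudo-reads (WGA) and partitioning reads into compartments (CSA). This is a minimal sketch under stated assumptions: the pseudo-read length, the target coverage, the k-mer size and the k-mer voting rule used to assign reads to segments are all hypothetical choices, not the procedure Celera actually used.

```python
from collections import Counter


def shred(seq: str, read_len: int, target_cov: float) -> list[str]:
    """Cut seq into overlapping pseudo-reads of read_len bases.

    Reads start every read_len / target_cov bases, so each base is covered
    roughly target_cov times (read_len and target_cov are hypothetical).
    """
    step = max(1, round(read_len / target_cov))
    starts = list(range(0, max(1, len(seq) - read_len + 1), step))
    last = max(0, len(seq) - read_len)
    if starts[-1] != last:
        starts.append(last)  # make sure the end of the sequence is covered too
    return [seq[s:s + read_len] for s in starts]


def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def partition(reads: dict[str, str], segments: dict[str, str],
              k: int = 12) -> dict[str, list[str]]:
    """Assign each read to the segment sharing the most k-mers with it.

    segments stands in for the large chromosomal segments known from the
    HGP data; reads sharing no k-mer end up in the "unassigned" bin.
    """
    index: dict[str, set[str]] = {}
    for name, seq in segments.items():
        for km in kmers(seq, k):
            index.setdefault(km, set()).add(name)

    bins: dict[str, list[str]] = {name: [] for name in segments}
    bins["unassigned"] = []
    for read_id, seq in reads.items():
        votes = Counter(name for km in kmers(seq, k) for name in index.get(km, ()))
        target = votes.most_common(1)[0][0] if votes else "unassigned"
        bins[target].append(read_id)
    return bins
```

Each compartment's reads would then be assembled separately, which is what makes the approach resemble the hierarchical method.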
Whole-genome assembly
SCREENER – screened out all repeat elements but microsatellites
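Repeat screening can be illustrated by masking the parts of a read that match a library of known repeats. A minimal Python sketch, assuming exact k-mer matching against a hypothetical repeat_library from which microsatellites are simply left out; lowercase masking is one common convention and not necessarily what SCREENER did:

```python
def screen(read: str, repeat_library: list[str], k: int = 12) -> str:
    """Return read with bases covered by repeat k-mers in lowercase.

    Microsatellites are not part of repeat_library (an assumption of this
    sketch), so they stay unmasked.
    """
    repeat_kmers = {rep[i:i + k] for rep in repeat_library
                    for i in range(len(rep) - k + 1)}
    masked = [False] * len(read)
    for i in range(len(read) - k + 1):
        if read[i:i + k] in repeat_kmers:
            for j in range(i, i + k):
                masked[j] = True
    return "".join(b.lower() if m else b for b, m in zip(read, masked))


print(screen("CCCCCGATTACAGTTTTT", ["GATTACAGATTACA"], k=6))  # CCCCCgattacagTTTTT
```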
OVERLAPPER – compared every read against every other read in search of overlaps of at least 40 bp with <6% differences (4-5 days on 40 computers with 4 GB of RAM each, operating in parallel); the algorithm used by Celera was able to identify reads from repetitive elements and find the boundary where such elements start
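The overlap criterion (at least 40 bp, fewer than 6% differences) can be written as a test on a suffix of one read against a prefix of another. This Python sketch is ungapped: it counts mismatches only and ignores insertions and deletions, so it simplifies a real overlapper; the function name best_overlap is hypothetical.

```python
def best_overlap(a: str, b: str, min_len: int = 40, max_diff: float = 0.06) -> int:
    """Length of the longest suffix of a that matches a prefix of b.

    An overlap of length n is accepted if fewer than max_diff * n aligned
    positions differ (no gaps allowed). Returns 0 if none qualifies.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        suffix, prefix = a[-n:], b[:n]
        diffs = sum(x != y for x, y in zip(suffix, prefix))
        if diffs < max_diff * n:
            return n
    return 0
```

A naive all-against-all loop over such a function is quadratic in the number of reads, which hints at why this step needed a cluster working for days.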
SCAFFOLDER – used mate-pair information to link the contigs together into scaffolds: 2- and 10-kbp mate pairs -> intermediate-sized scaffolds, which were then linked together by confirming 50-kbp mate pairs
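The core of scaffolding, joining contigs connected by enough mate pairs, can be sketched with a union-find structure. Assumptions: contig names are hypothetical, a join needs at least min_links=2 supporting mate pairs, and orientation and gap size (which a real scaffolder must check against the insert size) are ignored:

```python
from collections import Counter


def scaffold(contigs: list[str], mate_links: list[tuple[str, str]],
             min_links: int = 2) -> list[set[str]]:
    """Group contigs into scaffolds using confirmed mate-pair links."""
    parent = {c: c for c in contigs}

    def find(c: str) -> str:
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving keeps trees shallow
            c = parent[c]
        return c

    # Count mate pairs per unordered contig pair; join only confirmed pairs.
    counts = Counter(tuple(sorted(link)) for link in mate_links)
    for (a, b), n in counts.items():
        if n >= min_links:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for c in contigs:
        groups.setdefault(find(c), set()).add(c)
    return list(groups.values())
```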
REPEAT RESOLVER – filled the gaps, accepting a certain level of error
Set of scaffolds -> 2.85 Gbp in span, 2.6 Gbp of sequence
Scaffolds >100 kbp long cover 84% of the genome
Scaffolds >10 Mbp long cover 25% of the genome
The average scaffold size was 1.5 Mbp
The average contig size was 24 kbp
The average gap size was 2.4 kbp
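Statistics like those above can be computed from a list of scaffold spans. A minimal Python sketch; the spans and genome size in the usage line are made-up toy numbers, not Celera's data:

```python
def scaffold_stats(spans: list[int], genome_size: int,
                   min_span: int = 100_000) -> dict[str, float]:
    """Summarize scaffold spans (sequence plus internal gaps, in bp)."""
    large = [s for s in spans if s > min_span]
    return {
        "total_span": sum(spans),
        "mean_span": sum(spans) / len(spans),
        "fraction_in_large": sum(large) / genome_size,
    }


# Hypothetical toy assembly of a 30 Mbp genome.
print(scaffold_stats([150_000, 50_000, 20_000_000], 30_000_000))
```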
HIERARCHICAL SHOTGUN METHOD
Genomic DNA from anonymous human donors was partially digested with restriction enzymes ->
clones from 8 large-insert libraries containing BAC or PAC (bacterial or P1-derived artificial chromosome) – together 65-fold coverage
BACs --HindIII--> agarose gels --------> fingerprints
Fingerprint clone contigs – anchoring to chromosomes by STS markers from existing genetic and physical maps and also by FISH.
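HindIII cuts at the site A^AGCTT, so a clone's fingerprint is essentially the list of its restriction-fragment sizes, and overlapping clones share bands. A minimal Python sketch, assuming complete digestion of a known sequence and a hypothetical sizing tolerance of 7 bp; real fingerprint comparison works on band sizes measured from gels and typically uses a probability-based score rather than this simple count:

```python
def hindiii_fragments(seq: str) -> list[int]:
    """Fragment lengths after complete HindIII digestion (site A^AGCTT)."""
    site = "AAGCTT"  # palindromic, so scanning one strand is enough
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + 1)  # HindIII cuts between the first A and AGCTT
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return sorted(end - start for start, end in zip(bounds, bounds[1:]))


def shared_bands(f1: list[int], f2: list[int], tol: int = 7) -> int:
    """Count bands of f1 that pair with an unused band of f2 within tol bp."""
    remaining = sorted(f2)
    shared = 0
    for size in sorted(f1):
        for i, other in enumerate(remaining):
            if abs(size - other) <= tol:
                shared += 1
                del remaining[i]  # each band can be matched only once
                break
    return shared
```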
SELECTION OF CLONES FOR SEQUENCING
Clones were selected to make up the draft genome sequence with minimal overlaps
(in addition, the overlaps between BAC clones provide a rich collection of SNPs (Single Nucleotide Polymorphisms))
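Selecting clones with minimal overlap can be approximated by choosing the fewest clones that still cover a fingerprint contig, which a greedy interval-cover algorithm does exactly. A Python sketch, assuming clone positions are already known as half-open intervals on the contig; clone names and coordinates are hypothetical, and this is not necessarily how the centres selected clones:

```python
def tiling_path(clones: dict[str, tuple[int, int]], lo: int, hi: int) -> list[str]:
    """Pick a minimum number of clones whose intervals cover [lo, hi).

    clones maps a clone name to its half-open (start, end) position.
    Raises ValueError if the clones leave a gap.
    """
    order = sorted(clones.items(), key=lambda kv: kv[1][0])
    path, covered, i = [], lo, 0
    while covered < hi:
        best_name, best_end = None, covered
        # Among clones starting at or before the covered edge, take the one reaching furthest.
        while i < len(order) and order[i][1][0] <= covered:
            name, (_, end) = order[i]
            if end > best_end:
                best_name, best_end = name, end
            i += 1
        if best_name is None:
            raise ValueError(f"gap at position {covered}")
        path.append(best_name)
        covered = best_end
    return path


clones = {"A": (0, 150), "B": (100, 200), "C": (120, 300), "D": (280, 400), "E": (250, 380)}
print(tiling_path(clones, 0, 400))  # ['A', 'C', 'D']
```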
Sequencing project shared among 20 centres from 6 countries
---> necessary to coordinate the selection of clones --->
most centres focused on particular chromosomes
SHOTGUN SEQUENCING OF SELECTED CLONES
The details of protocol and automation varied among the centres – the most aggressive automation ---> 100 000 reactions in 12 hours
Data integration by a common computational procedure; all assembled contigs >2 kb were deposited in public databases within 24 hours
Sequencing output rose sharply during production:
By June 2000 – the centres produced sequence equivalent to 1-fold coverage of the entire human genome in less than 6 weeks!
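To put that rate in perspective, 1-fold coverage in 6 weeks corresponds to a little over 800 bases per second around the clock. A minimal Python check, assuming a genome size of about 3 Gbp (an assumption, not a figure from these notes):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def bases_per_second(genome_size: float, fold: float, days: float) -> float:
    """Average raw-sequence rate needed to reach `fold` coverage in `days`."""
    return genome_size * fold / (days * SECONDS_PER_DAY)


rate = bases_per_second(3.0e9, 1.0, 6 * 7)  # genome size ~3 Gbp is an assumption
print(f"~{rate:.0f} bases per second")  # prints ~827 bases per second
```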
GIGASSEMBLER
Version of the draft sequence on 7 Oct 2000:
29 298 overlapping BACs ~ 4.26 Gbp
---> 23 Gbp of sequence ~ 7.5-fold coverage
---> 90% of the euchromatic part of the genome
additionally:
3 centres – WHOLE GENOME SHOTGUN
~0.75-fold coverage -> statistically includes ~50% of the nucleotides in the human genome
By comparing this raw data with the draft sequence ---> SNPs
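The ~50% figure follows from a Poisson model of random shotgun coverage (Lander-Waterman): at average coverage c, a given base is missed by every read with probability e^(-c), so the covered fraction is 1 - e^(-c). A minimal Python check, assuming reads are placed uniformly at random:

```python
import math


def fraction_covered(c: float) -> float:
    """Expected fraction of bases hit by at least one read at coverage c."""
    return 1.0 - math.exp(-c)


print(f"{fraction_covered(0.75):.0%}")  # prints 53%
```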
When is the human genome “finished”?
- Fewer than 1 base in 10 000 is incorrectly assigned
- More than 95% of the euchromatic region is sequenced
- Each gap is smaller than 150 kb
(Such standards represent realistic goals given current technology)
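The three criteria amount to a simple predicate over assembly statistics. A Python sketch; the AssemblyReport type and its field names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class AssemblyReport:
    """Hypothetical summary of one assembly; field names are illustrative."""
    error_rate: float            # fraction of incorrectly assigned bases
    euchromatic_fraction: float  # fraction of euchromatin sequenced
    gap_sizes_bp: list[int]      # size of every remaining gap, in bp


def is_finished(r: AssemblyReport) -> bool:
    """Apply the three 'finished' criteria: error rate, coverage, gap size."""
    return (r.error_rate < 1e-4
            and r.euchromatic_fraction > 0.95
            and all(g < 150_000 for g in r.gap_sizes_bp))
```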
7 Oct 2000: 25% of the human genome at the finished stage
(this includes the finished chromosomes 21 and 22)
FILLING THE GAPS