Lecture 2 - Sequencing
CMMB 461
University of Calgary
Gordon Chua
http://www.sciencedaily.com/
Suggested readings
•Green (2001) Nature Reviews Genetics 2: 573-582 (genome sequencing review)
http://www.ncbi.nlm.nih.gov/pubmed/11483982
•Kremkow and Lee (2015) Biotechnol. Lett. 37: 55-65 (next-generation sequencing review)
http://www.ncbi.nlm.nih.gov/pubmed/25214225
Sequencing a genome:
“Obtaining the parts list of the cell/organism”
ACGT
DNA synthesis/replication
Requires DNA polymerase, primer, template DNA and dNTPs
3’-ATGTTCCGCGATAAGCTTTAA-5’ (template)
5’-TACAA-3’-OH (primer)
+ dGTP → pyrophosphate (PPi) released
3’-ATGTTCCGCGATAAGCTTTAA-5’
5’-TACAAG-3’-OH
+ dNTPs → one pyrophosphate (PPi, two phosphates) released per nucleotide added
3’-ATGTTCCGCGATAAGCTTTAA-5’
5’-TACAAGGCGCTATTCGAAATT-3’
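As a rough illustration of the base-pairing logic in the example above, the Python sketch below extends the primer 5’-TACAA-3’ along the template one nucleotide at a time. The function name `extend_primer` and the `PAIR` table are hypothetical helpers for this sketch; it models only complementarity and 5’→3’ extension, not polymerase chemistry.

```python
# Complementary base-pairing rules (hypothetical helper table for this sketch)
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def extend_primer(template_3to5: str, primer_5to3: str):
    """Extend a primer annealed at the 3' end of a template.

    template_3to5: template strand written 3'->5' (as on the slide)
    primer_5to3:   primer written 5'->3'
    Returns the full product (5'->3') and the dNTPs added, in order.
    """
    # The primer must be complementary to the start of the template
    for i, base in enumerate(primer_5to3):
        if PAIR[template_3to5[i]] != base:
            raise ValueError(f"primer mismatch at position {i}")
    product = list(primer_5to3)
    added = []
    # One dNTP is added to the free 3'-OH per step;
    # each addition releases one pyrophosphate (PPi)
    for t in template_3to5[len(primer_5to3):]:
        base = PAIR[t]
        product.append(base)
        added.append("d" + base + "TP")
    return "".join(product), added

template = "ATGTTCCGCGATAAGCTTTAA"
product, added = extend_primer(template, "TACAA")
print(product)                                # TACAAGGCGCTATTCGAAATT
print(added[0])                               # dGTP
print(len(added), "pyrophosphates released")  # 16 pyrophosphates released
```

Each appended base stands for one dNTP incorporated at the 3’-OH and one pyrophosphate released.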
DNA sequencing by the chain termination method (Sanger Sequencing)
Different size fragments are generated during DNA synthesis
depending on the location of ddNTP incorporation/termination
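[Figure: deoxynucleotide (dNTP) vs dideoxynucleotide (ddNTP); both carry a base and a 5’-CH2 triphosphate (P-P-P), but the ddNTP has 3’-H instead of 3’-OH, so no further nucleotide can be added]
3’-GGCCAATGAACCGTCACAGTTA-5’ (template)
5’-CCGGTTACTTGGC-3’-H (product terminated by a ddNTP)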
Applied Biosystems
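A minimal sketch of why chain termination produces a ladder of fragment sizes, using the template shown above. It assumes, for illustration, that all four ddNTPs are present in one reaction (as with dye-terminator capillary sequencing) and that, across the population of molecules, a ddNTP is incorporated at every possible position. Primer length is ignored, and the function names are hypothetical.

```python
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def chain_termination_fragments(template_3to5: str):
    """Return (length, terminating ddNTP) for every possible terminated product.

    Assumption: synthesis starts at the first template base (primer ignored),
    and each ddNTP is incorporated, in some molecule, wherever its base is needed.
    """
    full = "".join(PAIR[b] for b in template_3to5)  # full-length new strand, 5'->3'
    fragments = []
    for pos, base in enumerate(full):
        # A ddNTP has no 3'-OH, so incorporation at pos ends synthesis there
        fragments.append((pos + 1, "dd" + base + "TP"))
    return fragments

def read_sequence(fragments):
    """Mimic electrophoresis: shortest fragments reach the detector first."""
    return "".join(dd[2] for _, dd in sorted(fragments))

template = "GGCCAATGAACCGTCACAGTTA"
frags = chain_termination_fragments(template)
print([length for length, dd in frags if dd == "ddCTP"])  # [1, 2, 8, 13, 19]
print(read_sequence(frags))                               # CCGGTTACTTGGCAGTGTCAAT
```

The 13-nt ddCTP fragment corresponds to the terminated product 5’-CCGGTTACTTGGC-3’-H shown on the slide.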
Cloning vectors
Sequencing requires many copies of the same gene/DNA sequence
[Figure: YFG (your favourite gene) fragment cloned into a vector]
Common Features:
•Promoter: constitutive/inducible
•Multicloning site: unique restriction sites for
inserting gene
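To make “unique restriction sites” concrete, the sketch below counts recognition sites in a hypothetical toy vector sequence; an enzyme that cuts exactly once is a candidate for inserting a gene. The enzyme table lists four well-known sites; the vector is invented and is treated as linear, even though a real plasmid is circular. All four sites are palindromic, so scanning one strand is sufficient.

```python
# Recognition sequences of a few common type II restriction enzymes
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "XhoI": "CTCGAG",
}

def count_sites(seq: str, site: str) -> int:
    """Count (possibly overlapping) occurrences of a recognition site."""
    k = len(site)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == site)

def unique_sites(seq: str):
    """Enzymes that cut the sequence exactly once (usable for an MCS)."""
    return sorted(name for name, site in ENZYMES.items() if count_sites(seq, site) == 1)

# Hypothetical toy vector: a short MCS flanked by invented sequence
vector = "TTGACA" + "GAATTCGGATCCAAGCTT" + "ATGCGAATTCTAG"
print(unique_sites(vector))  # ['BamHI', 'HindIII']
```

EcoRI is excluded here because the invented sequence contains a second GAATTC outside the MCS, so cutting with it would remove a piece of the vector.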
Alignment of BAC clones
1. BAC library screening by hybridization: colony DNA is first denatured with alkali (base) so that the single-stranded probe can bind
•Rapid identification of overlapping clones using a random
sequence/probe (single stranded DNA)
[Figure: panels 1 and 2, sequencing by hybridization]
http://www.roswellpark.org
The black spot marks a colony that hybridized to the probe; its position on the film matches the colony's position on the plate
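The logic of hybridization screening can be sketched with hypothetical BAC clones placed at known coordinates (which a real screen does not know; that is what it is trying to find). Any clone containing the probe sequence lights up, and because every positive clone contains the same probe region, the positives must overlap one another. Coordinates and function names are assumptions for illustration.

```python
from itertools import combinations

# Hypothetical BAC clones as (start, end) positions on a chromosome, in kb
clones = {
    "BAC1": (0, 150),
    "BAC2": (120, 280),
    "BAC3": (260, 400),
    "BAC4": (500, 640),
}

def positive_clones(clones, probe_start, probe_end):
    """Clones that contain the whole probe sequence hybridize (dark spot)."""
    return sorted(name for name, (s, e) in clones.items()
                  if s <= probe_start and probe_end <= e)

def overlaps(a, b):
    """True if two (start, end) intervals share any sequence."""
    return a[0] < b[1] and b[0] < a[1]

hits = positive_clones(clones, 130, 131)   # ~1 kb probe at 130 kb
print(hits)                                # ['BAC1', 'BAC2']
# Every pair of positive clones shares the probe region, so they overlap
assert all(overlaps(clones[x], clones[y]) for x, y in combinations(hits, 2))

print(positive_clones(clones, 270, 271))   # ['BAC2', 'BAC3']
```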
Alignment of BAC clones
2. Restriction fingerprinting BAC clones
•Complete restriction digest of BAC clones followed by gel electrophoresis
to determine restriction fragment profile for each BAC clone
•Identify BAC clones with common restriction fragments: indicative that the
clones are overlapping (usually done computationally)
[Figure: restriction fragment profiles of BAC clones 1-4]
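Since comparing fingerprints is usually done computationally, here is a minimal sketch with invented fragment sizes for clones 1-4. Fragments count as shared when their sizes agree within a gel tolerance (0.1 kb, an assumed value), and pairs are scored with a simple Dice-style similarity chosen for illustration, not the scoring used by any particular fingerprinting program.

```python
from itertools import combinations

# Hypothetical restriction fragment sizes (kb) read off a gel for each BAC clone
fingerprints = {
    1: [2.1, 3.4, 5.0, 7.2, 9.8],
    2: [3.4, 5.0, 6.1, 8.3, 11.5],
    3: [4.4, 6.1, 8.3, 12.0],
    4: [1.5, 2.7, 10.2, 13.1],
}

def shared_fragments(a, b, tol=0.1):
    """Count fragments in a that match an unused fragment in b within tol (kb)."""
    used = set()
    count = 0
    for x in a:
        for j, y in enumerate(b):
            if j not in used and abs(x - y) <= tol:
                used.add(j)
                count += 1
                break
    return count

def similarity(a, b, tol=0.1):
    """Dice-style score: 2 * shared / (total fragments in both clones)."""
    return 2 * shared_fragments(a, b, tol) / (len(a) + len(b))

# Report clone pairs with common fragments, i.e. likely overlapping clones
for i, j in combinations(sorted(fingerprints), 2):
    n = shared_fragments(fingerprints[i], fingerprints[j])
    if n > 0:
        s = similarity(fingerprints[i], fingerprints[j])
        print(f"clones {i} and {j}: {n} shared fragments, score {s:.2f}")
```

With these invented sizes, clones 1-2 and 2-3 share fragments (suggesting the order 1-2-3), while clone 4 shares none and would sit elsewhere in the genome.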
Genome coverage
How many times should the genome be sequenced
(coverage) to ensure a high degree of accuracy?
- The human genome is 3.3 billion bp long. It cannot be sequenced exactly once (1× coverage): reads land at random and overlap one another, so some regions are read several times and others not at all.
Assumption: sequencing reads will be randomly distributed in
the genome (i.e. ability to sequence a particular region of the
genome does not differ)
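$C = LN/G$ (coverage), where:
L = sequence read length in bp (how much sequence is obtained per sequencing reaction, ~500-800 bp)
N = number of reads sequenced (number of clones)
G = haploid genome length in bp
$P(0) = e^{-\lambda}$, where:
$\lambda$ = genome coverage (C)
P(0) = probability (Poisson) that a given base is not covered by any read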
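The two formulas can be checked numerically. The sketch below computes C = LN/G and P(0) = e^(-C) for the human genome size given above, assuming a 500 bp read length, and then compares the Poisson prediction with a small random-read simulation under the slide's assumption that reads fall uniformly across the genome. Function names and the simulated genome size are illustrative choices.

```python
import math
import random

def coverage(read_len: int, n_reads: int, genome_len: int) -> float:
    """C = LN/G."""
    return read_len * n_reads / genome_len

def p_unsequenced(c: float) -> float:
    """Poisson P(0) = e^(-C): chance that a given base is covered by no read."""
    return math.exp(-c)

# Human-scale example (read length of 500 bp is an assumption)
G = 3_300_000_000
L = 500
for c in (1, 3, 5, 8, 10):
    n = round(c * G / L)  # reads needed for this coverage
    missed = G * p_unsequenced(c)
    print(f"C={c:>2}: N={n:,} reads, P(0)={p_unsequenced(c):.5f}, ~{missed:,.0f} bp unsequenced")

def simulate_uncovered_fraction(genome_len, read_len, c, seed=1):
    """Place reads uniformly at random; return the fraction of bases never read."""
    rng = random.Random(seed)
    n_reads = round(c * genome_len / read_len)
    diff = [0] * (genome_len + 1)  # difference array for coverage depth
    for _ in range(n_reads):
        start = rng.randrange(genome_len - read_len + 1)
        diff[start] += 1
        diff[start + read_len] -= 1
    depth, uncovered = 0, 0
    for i in range(genome_len):
        depth += diff[i]
        if depth == 0:
            uncovered += 1
    return uncovered / genome_len

sim = simulate_uncovered_fraction(1_000_000, 500, 3)
print(f"simulated {sim:.4f} vs predicted {p_unsequenced(3):.4f}")
```

At 8× coverage, e^-8 ≈ 0.00034, so roughly 1.1 million bases of a 3.3 Gb genome would still be expected to go unsequenced; this is why high coverage is needed for accuracy.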
[Figure: shotgun assembly: primers flanking each clone insert yield sequencing reads (400-700 bp); overlapping reads form contigs; clones link contigs into a scaffold/supercontig; overlaps between reads (e.g. Read B, Read D) may be good or bad (misassembly?)]
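Assembly of overlapping reads into contigs can be sketched with a toy greedy algorithm: repeatedly merge the two sequences with the longest suffix-prefix overlap. This is one simple strategy chosen for illustration, not the method of any particular genome project; it ignores sequencing errors, reverse-complement reads and repeats (which is exactly where greedy merging can cause misassemblies). The reads are invented.

```python
def overlap_len(a: str, b: str, min_olap: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (>= min_olap)."""
    for k in range(min(len(a), len(b)), min_olap - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads, min_olap=3):
    """Repeatedly merge the pair with the largest overlap until none remain."""
    contigs = list(reads)
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap_len(a, b, min_olap)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            return contigs  # no overlaps left: each string is a separate contig
        merged = contigs[i] + contigs[j][k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (i, j)] + [merged]

# Invented reads: three overlap, the fourth comes from elsewhere (a gap)
reads = ["ATGCGTACG", "ACGTTAGCC", "GCCTAGGAT", "CCCAAAGGG"]
print(greedy_assemble(reads))  # ['CCCAAAGGG', 'ATGCGTACGTTAGCCTAGGAT']
```

The unlinked read stays a separate contig; in a real project, paired reads from the same clone are what allow such contigs to be ordered into a scaffold.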
Chromosomes and genes
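[Figure: double-stranded chromosome (top strand 5’→3’, bottom strand 3’→5’); reading the top strand 5’→3’, a gene on the top strand appears as ATG...TAA, while a gene on the bottom strand appears as its reverse complement, TTA...CAT]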
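Because the two strands are antiparallel, a gene on the bottom strand appears on the top strand as its reverse complement (ATG...TAA becomes TTA...CAT). A short sketch using Python's `str.translate`; the toy sequences are invented.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

top_gene = "ATGAAACCCTAA"                  # gene on the top strand, 5'->3'
print(reverse_complement(top_gene))        # TTAGGGTTTCAT
# A bottom-strand gene seen on the top strand is recovered the same way
print(reverse_complement("TTATTTCAT"))     # ATGAAATAA
```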
Mining genomic data (bioinformatics)
Gene-finding approaches
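The specific gene-finding approaches on the original slide are not recoverable here. As one illustrative example of an ab initio approach, the sketch below scans all six reading frames (three per strand) for open reading frames (ATG followed by an in-frame stop codon). Real gene finders use much more (codon bias, splice signals, similarity to known genes); `find_orfs` and the test sequence are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def find_orfs(seq: str, min_codons: int = 3):
    """Return (strand, start, end, orf) for ATG...stop ORFs in all six frames.

    Coordinates are 0-based, end-exclusive, relative to the strand scanned.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            i = frame
            while i + 3 <= len(s):
                if s[i:i + 3] == "ATG":
                    # Walk codon by codon to the first in-frame stop codon
                    for j in range(i + 3, len(s) - 2, 3):
                        if s[j:j + 3] in STOPS:
                            if (j + 3 - i) // 3 >= min_codons:
                                orfs.append((strand, i, j + 3, s[i:j + 3]))
                            i = j  # resume scanning after this stop codon
                            break
                    else:
                        break  # no stop before the end: no more ORFs in this frame
                i += 3
    return orfs

# Toy sequence: one gene on the top strand, one on the bottom strand
seq = "CCATGAAACCCTAAGGTTATTTCATC"
for orf in find_orfs(seq):
    print(orf)
# ('+', 2, 14, 'ATGAAACCCTAA')
# ('-', 1, 10, 'ATGAAATAA')
```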
[Figure: 1st generation, 2nd generation (SGS) and 3rd generation (TGS) sequencing technologies]
Schuster (2008) Nature Methods 5: 16-18
Sanger Science 343: 829-830 (2014)
Main differences between second-generation sequencers and capillary-based sequencers
•Library construction: fragment genomic DNA and amplify by PCR, bypassing vector cloning
PCR: gene amplification for analysis
Number of molecules increases exponentially with the number of cycles (x): $2^x$ copies per starting molecule
[Figure: forward and reverse primers; Taq polymerase synthesizes both DNA strands simultaneously]
users.ugent.be
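The 2^x relationship can be made concrete with a short calculation. The `efficiency` parameter is an added assumption (real reactions rarely double perfectly every cycle); with efficiency 1.0 the formula reduces to 2^x copies per starting molecule.

```python
def pcr_copies(start_molecules: int, cycles: int, efficiency: float = 1.0) -> float:
    """Copies after x cycles: N = N0 * (1 + E)^x; E = 1.0 means perfect doubling (2^x)."""
    return start_molecules * (1 + efficiency) ** cycles

# Perfect doubling: 2, 1024, 1048576, 1073741824 copies from one molecule
for x in (1, 10, 20, 30):
    print(x, int(pcr_copies(1, x)))

# For comparison, 90% efficiency per cycle (an assumed value)
print(f"{pcr_copies(1, 30, efficiency=0.9):.3g}")
```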
1. Roche 454 Sequencer (SGS, 2004)
https://www.roche-applied-science.com/sis/sequencing/flx/index.jsp
Step 1:
•Fragment DNA and ligate adaptors to ends
•Select fragments with two different adaptors (for PCR)
•Nick nonbiotinylated strand to get sstDNA library
Step 3:
•Add sequencing reaction components including adenosine 5’-phosphosulfate (APS), luciferin, luciferase
•Take an image of picotiter plate and repeat with next dNTP
http://www.youtube.com/watch?v=bFNjxKHP8Jc
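A simplified model of how a pyrosequencing flowgram encodes sequence: dNTPs are flowed one at a time in a fixed cyclic order, and the light signal for each flow is taken as proportional to the number of bases incorporated, so a homopolymer run gives a proportionally larger signal. The flow order `TACG`, the idealized integer signals and the function names are assumptions for this sketch.

```python
from itertools import cycle

FLOW_ORDER = "TACG"  # assumed cyclic order in which dNTPs are flowed

def flowgram(synthesized: str, n_flows: int):
    """Idealized light signal per flow = number of bases incorporated.

    `synthesized` is the strand being built (complementary to the template).
    A homopolymer run (e.g. CCC) is incorporated in a single flow.
    """
    signals = []
    pos = 0
    flows = cycle(FLOW_ORDER)
    for _ in range(n_flows):
        nt = next(flows)
        run = 0
        while pos < len(synthesized) and synthesized[pos] == nt:
            run += 1
            pos += 1
        signals.append((nt, run))
    return signals

def decode(signals):
    """Rebuild the sequence: each flow contributes its dNTP times the signal."""
    return "".join(nt * n for nt, n in signals)

seq = "TTGACCCA"
sig = flowgram(seq, 12)
print(sig)
print(decode(sig))  # TTGACCCA
```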
2. Illumina Solexa sequencing (SGS)
Flow cell http://www.illumina.com/
1. Sample preparation and amplification
•Fragment DNA and add linkers
(adaptors) at the ends.
Ion Torrent (semiconductor) sequencing:
•Sequencing reaction involves flooding the chip with dNTPs one at a time
•Nucleotide incorporation releases an H+, thereby changing the pH of the well solution (no fluorescence)
5. Pacific Biosciences Single Molecule Real Time
(SMRT) Sequencing (TGS)
•Sequencing reaction carried out in extremely small wells (50 nm) called zero-mode waveguides (ZMWs), which allow fluorescence to be measured with high sensitivity
https://www.youtube.com/watch?v=v8p4ph2MAvI
6. Oxford Nanopore Technologies (TGS)
•Nanopore is the bacterial α-hemolysin
protein embedded in a synthetic membrane
on an array chip
Genome sequenced (publication year) | HGP (2003) | Venter (2007) | Watson (2008)
Time taken (start to finish) | 13 years | 4 years | 4.5 months
Number of scientists listed as authors | > 2,800 | 31 | 27
Cost of sequencing (start to finish) | $2.7 billion | $100 million | < $1.5 million
Coverage | 8–10× | 7.5× | 7.4×
Number of institutes involved | 16 | 5 | 2
Number of countries involved | 6 | 3 | 1
Technology | Sanger | Sanger | Roche
Limitations of next-generation sequencing
Routine Human Genome
Sequencing/Personalized Medicine
Human genome projects
•Personal Genome Project (Harvard Medical School, with affiliated projects in other countries including Canada): volunteers can “share your genome information for the greater good”
•38 million common SNPs, 1.4 million short insertions and deletions, and
more than 14,000 larger deletions among humans.
•As more disease genes are discovered, we will gain a better molecular
understanding of the disorder and develop better diagnostics and
therapeutics.