Chapter 1: Intro
- Genome → made of DNA (exception: some viruses have RNA genomes)
- Genome expression = release of the information contained in the genome to the cell
- First product = transcriptome (RNAs from those genes that are active)
- Second product = proteome (proteins that specify biochemical reactions)
1.1 RNA
Coding RNA:
1. mRNAs = transcripts of protein-coding genes
Non-coding RNA:
1. rRNAs
2. tRNAs
3. sncRNAs (eukaryotes)
4. lncRNAs (eukaryotes)
Chapter 2: Studying DNA
2.1. Enzymes
1. DNA polymerase
- synthesize new polynucleotides complementary to an existing template
- Template-dependent DNA polymerases: e.g. Kornberg polymerase = DNA polymerase I (has both 3’ to 5’ and 5’ to 3’ exonuclease activity; used for DNA labeling); Taq polymerase → thermostable → used in PCR
- need primers (e.g. oligonucleotides for in vitro experiments) → initiate synthesis
DNA-dependent DNA polymerase | RNA-dependent DNA polymerase
Adds nucleotides to 3’ end   | Makes DNA copies from an RNA template
                             | e.g. reverse transcriptase → makes cDNAs
                             | in vivo, involved in immunodeficiency viruses (retroviruses)
2. Nucleases
- simple double-strand cut (both strands cut at the same position) → produces blunt ends
- strands cut at different positions → produces sticky/ cohesive ends
- different enzymes can sometimes give the same sticky ends
- Endonucleases: cut at internal sites (restriction endonucleases cut at specific recognition sequences)
- Exonucleases: remove nucleotides from the ends
- resulting fragments can be sized by gel electrophoresis (smallest fragments migrate fastest) → the same size separation is used in DNA sequencing
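The blunt and sticky cuts described above can be sketched in a few lines. This is a minimal illustration, not a full restriction-analysis tool: the function name `digest` and the test sequences are made up for the example, only the top strand of a linear molecule is modelled, and the two enzymes (EcoRI, SmaI) are chosen as familiar examples rather than taken from these notes.

```python
def digest(seq, site, cut_offset):
    """Cut `seq` at every occurrence of `site`.

    `cut_offset` is the position inside the recognition site where the
    top strand is cleaved (hypothetical helper for illustration only).
    """
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments

# EcoRI recognizes GAATTC and cuts after the G: staggered cut, sticky ends.
ecori = digest("AAGAATTCGGCC", "GAATTC", 1)
# SmaI recognizes CCCGGG and cuts in the middle: blunt ends.
smai = digest("TTCCCGGGAA", "CCCGGG", 3)
print(ecori)
print(smai)
```

Sorting the returned fragments by length gives the order in which they would run on a gel, shortest first.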
3. Ligases
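- form the phosphodiester bond between adjacent unlinked nucleotides
- ATP/NAD required
- e.g. T4 DNA ligase (from T4 bacteriophage)
- can link 2 DNA molecules together, or join the two ends of one molecule (self-ligation)
- sticky ends → base pairing holds the ends together → more opportunity for ligase to join them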
4. End-modification Enzymes
- make changes to the ends of DNA molecules
- e.g. Terminal deoxynucleotidyl transferase → adds nucleotides to the 3’ ends of dsDNA
2.2. PCR
- Denaturation (94 °C) → Annealing of primers (55 °C) → Synthesis/ extension (72 °C)
- ½ of the 2nd-cycle products are identical to the 1st-cycle products
- “short” products start to accumulate in the 3rd cycle
- e.g. Real-time PCR → follow rate of product formation → use reporter probe → gives fluorescent signal when it hybridizes to the product
Quenching by Förster Resonance Energy Transfer (FRET)
Adv: quantify amount of starting DNA
Disadv: can’t distinguish the Delta from the Omicron variant of SARS-CoV-2 (COVID-19)
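The build-up of products over the first few cycles can be checked with a small bookkeeping sketch. It assumes one double-stranded template at the start and that every strand is copied in every cycle (100% efficiency, which real reactions do not reach). The name `pcr_strands` and the labels "long" (one end defined by a primer) and "short" (both ends defined by primers) are illustrative.

```python
def pcr_strands(cycles):
    """Count single strands by type after each PCR cycle.

    Returns a list of (original, long, short) tuples, one per cycle.
    Assumes perfect doubling; illustrative only.
    """
    original, long_, short = 2, 0, 0
    history = []
    for _ in range(cycles):
        new_long = original        # copying an original strand gives a long strand
        new_short = long_ + short  # copying a long or short strand gives a short strand
        long_ += new_long
        short += new_short
        history.append((original, long_, short))
    return history

for cycle, (orig, lng, sht) in enumerate(pcr_strands(5), start=1):
    print(f"cycle {cycle}: original={orig} long={lng} short={sht}")
```

Long strands grow by two per cycle while short strands grow roughly exponentially. Under these assumptions the first duplexes with both strands short appear in cycle 3, because the first short strands are only available as templates after cycle 2.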
2.3. DNA Cloning
- for gene expression/ gene mapping/ genotyping
- e.g. pUC8 plasmid:
1. Ampicillin-resistance gene
2. lacZ’ gene (if ligation successful → insertional inactivation → loss of beta-galactosidase activity → check by histochemical test (X-gal) → white colony)
3. Ori (origin of replication)
- Vectors:
1. Bacterial artificial chromosomes (BACs) → higher capacity for inserts
- can carry inserts over 300 kb, stable insert, used in Human Genome Project
2. Fosmids (F plasmid + cos sites of bacteriophage lambda) → lower copy number in E. coli → less prone to instability problems
Chapter 3: Genetic Mapping
3.1 Importance of genetic mapping
- needed because of the limitations of DNA sequencing:
1. May not obtain sufficient short sequences to produce a contiguous DNA sequence → gaps
2. Shotgun method (search for overlaps & build up master sequence) makes errors if the genome contains repetitive DNA sequences → DNA between repeats is left out of the master sequence/ segments from different chromosomes are linked together
- a genetic map provides fixed reference points → more complex genomes can be sequenced and assembled
3.2 Markers for Genetic Mapping
1. Genes (1st markers to be used)
- must exist in >= 2 forms/ alleles → each specifying a different & usually visible phenotype
2. DNA markers (RFLPs and SSLPs)
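a. RFLPs (restriction fragment length polymorphisms): polymorphic restriction site → cut in some alleles, not cut in others
- can be typed by Southern hybridization → digest DNA with restriction enzyme → separate fragments by gel electrophoresis → transfer to nylon membrane → hybridize with labeled probe → autoradiography → check presence of site by number of bands
- can be typed by PCR → amplify with PCR primers on either side of the site → treat with restriction enzyme → gel electrophoresis → site absent (1 band)/ present (2 bands)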
b. SSLPs (simple sequence length polymorphisms: repeat sequences displaying length variations)
Minisatellites (VNTRs)                 | Microsatellites (STRs)
Not spread evenly around genome        | More conveniently spaced throughout the genome
Longer in length → hard to type by PCR | Shorter in length → typed by PCR → quicker and more accurate
- Typed by PCR: amplify sequence → gel electrophoresis/ capillary electrophoresis (displayed as electropherogram) → can calculate precise length of PCR product
c. SNPs (single nucleotide polymorphisms) = positions in the genome where some individuals have one nucleotide and others have a different nucleotide
- most useful DNA marker
- high density of SNPs in a genome → widely adopted in genome mapping
- high resolution
- Typing by oligonucleotide hybridization: oligonucleotide forms a completely base-paired structure with the 2nd molecule only if there is no mismatch → successful hybridization → discriminates between the 2 alleles of a SNP (adapted to DNA chip/ solution hybridization)
→ high stringency → incubation temp must be just below the melting temp (Tm) of the oligonucleotide (base-paired hybrid is unstable if temp is above Tm)
→ Tm = (4 x no. of G & C) + (2 x no. of A & T) °C
- DNA chip → labelled DNA → hybridization detected by laser scanning/ fluorescence confocal microscopy
- Solution hybridization → dye quenching technique = molecular beacons → similar to the real-time PCR probe in 2.2
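The Tm rule of thumb given under oligonucleotide hybridization above is easy to apply by hand, and a short sketch makes the arithmetic explicit. The function name `oligo_tm` and the example oligonucleotide are made up for illustration. The rule is only a rough estimate suited to short oligonucleotides.

```python
def oligo_tm(oligo):
    """Estimate melting temperature in degrees C using
    Tm = 4 x (number of G and C) + 2 x (number of A and T)."""
    oligo = oligo.upper()
    gc = oligo.count("G") + oligo.count("C")
    at = oligo.count("A") + oligo.count("T")
    return 4 * gc + 2 * at

probe = "ATGCATGCATGCATGCAT"  # hypothetical 18-mer: 8 G/C, 10 A/T
print(oligo_tm(probe))        # 4*8 + 2*10 = 52
```

For this probe the hybridization would be run just below 52 °C so that only the perfectly matched allele forms a stable hybrid.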
3.3 Basis to Genetic Mapping
- inheritance & linkage: homozygosity (from self-fertilization of pure-breeding plants)/ heterozygosity (from cross-fertilization of two different pure-breeding plants)
- Complications to Mendel’s laws:
1. Incomplete dominance → heterozygote displays a phenotype intermediate between the 2 homozygous forms
2. Co-dominance → heterozygote displays both phenotypes
- Mendel’s laws:
1. 1st law (segregation): Monohybrid cross → 3:1 phenotypes
2. 2nd law (independent assortment): Dihybrid cross → 9:3:3:1 phenotypes
- Partial linkage = unusual ratio in F1 cross (neither independent assortment nor complete linkage)
- Mitosis → no crossover
- Meiosis → crossover in prophase I → more genetic variation (more diversity)
- Crossover → all 4 possible genotypes among the gametes → can lead to partial linkage
- Frequency with which two genes are unlinked by crossovers is [directly proportional] to how far apart they are on the chromosome (close together: lower crossover frequency) → measure recombination frequencies for different pairs of genes → construct map of their relative positions on the chromosome (genetic map)
- One map unit = 1% recombination frequency = 1 cM
- Problems of genetic maps as a guide to genome sequencing:
1. Recombination hotspots are more likely to be involved in crossovers → genetic map distance does not necessarily indicate the physical distance between 2 markers
2. A single chromatid can participate in >1 crossover at the same time → more inaccuracies
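The map-unit definition above turns progeny counts directly into a map distance. The sketch below uses made-up counts and a hypothetical function name `map_distance`. It applies the simple rule 1% recombinants = 1 cM without any correction for multiple crossovers, so it is only reasonable for closely spaced markers.

```python
def map_distance(parental, recombinant):
    """Return the map distance in cM from counts of parental-type and
    recombinant-type progeny (recombination frequency x 100)."""
    total = parental + recombinant
    if total == 0:
        raise ValueError("no progeny counted")
    return 100 * recombinant / total

# Hypothetical cross: 820 parental-type and 180 recombinant progeny.
print(map_distance(parental=820, recombinant=180))  # about 18 cM
```

Repeating the calculation for several gene pairs gives the pairwise distances from which the relative order of genes on the chromosome is worked out.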
3.4 Linkage analysis
- carried out as planned breeding experiments with fruit flies/ mice (for humans, use family pedigrees)
- Test cross (breeding experiments): double heterozygote (AB/ab) × pure-breeding double homozygote (ab/ab) → calculate fraction of progeny that are recombinants
- Pedigree analysis = examine genotypes of successive generations of existing families
→ mapping resolution is limited → pedigrees are difficult to interpret/ individuals are dead or unwilling to cooperate
→ determine degree of linkage between disease gene and microsatellite M → genotype of the maternal grandmother, if it becomes available, resolves which interpretation is correct → a low recombination frequency indicates the disease gene is tightly linked to microsatellite M
- Limitations:
1. Not possible to obtain large numbers of progeny → few meioses can be studied → restricted resolving power
2. Limited accuracy → due to presence of recombination hotspots (crossovers more likely to occur at some points than others)
- Alternative → Physical mapping
- Marker positions can differ from those on the genetic map
- Use sequence-tagged sites (STS markers) → obtained from ESTs/ SSLPs/ random genomic sequences → collect overlapping DNA fragments that span the entire chromosome
- Preparing cDNA: poly(A) tail of mRNA + oligo(dT) primer → reverse transcriptase makes 1st strand → ribonuclease H degrades RNA → DNA polymerase I synthesizes 2nd strand → double-stranded cDNA
3.5 Physical mapping by assigning markers to DNA fragments
- mapping reagent → genomic DNA library (clone library)
Chapter 4: Genome Sequencing
4.1 1st generation sequencing
- Sanger sequencing (chain-termination sequencing): use ddNTPs to terminate elongation → each ddNTP carries a different fluorophore → detection → imaging system
- Requirements of the DNA polymerase:
1. High processivity → does not dissociate from template before incorporating a ddNTP
2. Negligible/ no 5’ to 3’ exonuclease activity → otherwise nucleotides are removed from the 5’ ends of the new strands → can’t determine correct sequence
3. Negligible/ no 3’ to 5’ exonuclease activity → does not remove the terminal ddNTP → sequence close to the primer remains readable
Strengths                    | Limitations
High accuracy (>99.9%)       | Low capacity/ yield/ throughput
Relatively long reads (1 kb) | At least 5x sequence depth is required
                             | Read depth/ coverage is needed to identify sequencing errors
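The logic of chain termination can be illustrated with a toy model: every fragment ends in a labelled ddNTP, and ordering the fragments by length reads out the sequence. The sketch below is a simplification under stated assumptions: the primer is ignored, one fragment is produced for every position, and the template is written 3’ to 5’ so that the new strand reads 5’ to 3’ from left to right. All names are illustrative.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def chain_terminated_fragments(template_3to5):
    """Return (length, terminal base) for a fragment ending at each
    position of the newly synthesized strand."""
    new_strand = "".join(COMPLEMENT[base] for base in template_3to5)
    return [(i + 1, base) for i, base in enumerate(new_strand)]

def read_sequence(fragments):
    """Order fragments by length, as electrophoresis does, and read the
    label on each terminal ddNTP."""
    return "".join(base for _, base in sorted(fragments))

fragments = chain_terminated_fragments("TACGGTCA")
random.shuffle(fragments)        # fragments start out as an unordered mixture
print(read_sequence(fragments))  # ATGCCAGT
```

The read is the complement of the template, which is why a polymerase that removed the terminal ddNTP or trimmed the 5’ end would corrupt the length-to-base correspondence.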
4.2 Next-generation sequencing (NGS)
- determine order of nucleotides in entire genome
- Preparing sequencing library: glass slide method → PCR products attach to adjacent oligonucleotides → clusters of identical immobilized fragments
- E.g. Illumina (reversible terminator sequencing) → remove the fluorescent label and 3’ blocking group after each nucleotide addition
Strengths | Limitations
Ultra-high throughput (enables thousands/ millions of DNA fragments to be sequenced in parallel in a single experiment)/ scalability/ speed | Short read length (<0.3 kb)
High accuracy (>99.5%) | Not portable
High yield |
4.3 3rd Generation seq
- E.g. PacBio (single-molecule real-time sequencing) → each nucleotide addition is detected with a zero-mode waveguide
Strengths                | Limitations
High throughput          | Low accuracy (90%)
Long read length (50 kb) | Not portable
                         | Complicated library preparation steps
4.4 3rd/4th Generation seq
- E.g. Nanopore sequencing → reads sequence of DNA directly without copying the molecule
- DNA passes through nanopore → perturbs flow of ions → change in current across membrane identifies the nucleotides
- can sequence RNAs directly
Strengths                        | Limitations
High throughput                  | Low accuracy (92%)
Long read length (1 Mb)          |
Simple library preparation steps |
Portable                         |
4.5 How to sequence a genome
1. Shotgun method (eukaryote/ prokaryote): build up master sequence from short sequences → examine overlaps
→ Problems: sequence gaps between contigs → close by probing a second clone library with oligonucleotides from the contig ends/ PCR with pairs of oligonucleotides
2. De novo sequencing (Illumina → assembly)
→ not random, sequence entire genome
3. Alternative: Reference genome approach (when studying a closely related organism → place reads on reference genome)
→ Problems:
1. If positions of genes differ between the reference genome and the genome being sequenced → the difference can’t be recognized
2. Shotgun sequencing of eukaryotic genomes requires sophisticated assembly programs
A. Amount of read data → eukaryotic genome is larger
B. Errors when genome contains repetitive DNA sequences
- De Bruijn graph → sequence assembly by Eulerian path → make master sequence
- Hierarchical shotgun sequencing (complex genome) → avoids problems with repeat sequences
→ based on chromosome walking (insert of first clone is used to find an overlapping second clone)
→ can sequence more complex genomes
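The de Bruijn graph idea mentioned above can be sketched on a toy example. This is a minimal illustration, not how production assemblers work: it assumes error-free reads from one strand, uses each distinct k-mer once (so copy number is lost), and assumes the graph has a single Eulerian path. The function name, the reads and the choice of k are all made up.

```python
from collections import defaultdict

def de_bruijn_assemble(reads, k):
    """Assemble reads by following an Eulerian path through a de Bruijn
    graph: nodes are (k-1)-mers, edges are the distinct k-mers."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    indegree = defaultdict(int)
    for kmer in sorted(kmers):
        graph[kmer[:-1]].append(kmer[1:])
        indegree[kmer[1:]] += 1
    # Start at the node with one more outgoing than incoming edge, if any.
    start = next((node for node in graph
                  if len(graph[node]) - indegree[node] == 1),
                 next(iter(graph)))
    # Hierholzer's algorithm: walk until stuck, then backtrack.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    # Each step along the path adds one nucleotide to the master sequence.
    return path[0] + "".join(node[-1] for node in path[1:])

reads = ["ATGGCG", "GGCGTG", "CGTGCA"]  # hypothetical overlapping reads
print(de_bruijn_assemble(reads, k=4))   # ATGGCGTGCA
```

When a repeat is longer than k-1 nucleotides the graph admits more than one Eulerian path, which is the repeat problem noted for shotgun assembly.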