
Teach Me

This document serves as an introductory course on genetics and molecular biology, detailing the structure and study of genomes, including techniques for analyzing DNA and RNA. It covers various methods such as PCR, RNA-seq, and sequencing technologies, as well as the organization of the human genome, highlighting the differences between mitochondrial and nuclear genomes. Additionally, it emphasizes the role of bioinformatics in managing and analyzing biological data.

Uploaded by teesriv
Copyright © All Rights Reserved

Welcome! It's great to dive into the fascinating world of genetics and molecular biology.

This is a big and exciting field, so we'll break it down into manageable pieces, explaining key concepts as we go. Think of this as a foundational course to get you started.

Let's begin!

1. Understanding and Working with Genomes

The genome is the complete set of genetic instructions in an organism. It's essentially the
blueprint for life. Studying it is complex because it's so large.

A. Basic Tools for Studying Genomes

To study a genome, we often need to:

 Break it into small fragments: The entire genome is too large to analyze at once.
 Amplify (make many copies of) specific fragments: This gives us enough material to study. Amplification can be done outside a cell (in vitro) using PCR, or inside a cell (in vivo) using cloning.

B. Techniques to Study RNA

RNA is a molecule that helps carry out the instructions encoded in DNA.

 RT-qPCR (reverse transcription quantitative PCR, usually run as real-time PCR):
o Purpose: To measure the amount of specific RNA transcripts (like mRNA or miRNA) in a sample. This helps us quantify gene expression, meaning how much a gene is "turned on" or active.
o How it works: First, RNA is converted into cDNA (a DNA copy of RNA) by reverse transcription. Then PCR is used to amplify the cDNA. As the product accumulates, a fluorescent dye (like SYBR Green) or a sequence-specific probe (like a TaqMan probe) emits light, which is measured every cycle. The more starting RNA there was, the earlier (in fewer cycles) the signal crosses a detection threshold; this cycle number is called the Ct (cycle threshold) value.
o Applications: Analyzing gene expression and validating results from other RNA
studies.
 RNA-seq (RNA sequencing):
o Purpose: To profile gene expression in an unbiased, comprehensive way across
the entire set of RNA molecules (the transcriptome).
o Key features: It can detect different transcript versions (isoforms produced by alternative splicing), find new transcripts not previously known, and is sensitive enough to analyze individual cells (single-cell analysis). Barcoding also allows many samples to be pooled in one sequencing run for cost efficiency.
o Analysis: Generates very large datasets, requiring specialized computer analysis
(bioinformatics).
 FISH (Fluorescence in situ hybridization):
o Purpose: To detect specific RNA or DNA molecules directly within cells or
tissues.
o How it works: Uses fluorescently labeled RNA or DNA probes that bind to the
target molecules.
o Key feature: Offers high spatial resolution, meaning you can see where a gene is expressed within individual cells and tissues. Probes labeled with different colors can detect multiple molecules at once (multiplexing).
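
RT-qPCR results are usually reported as Ct (cycle threshold) values: the cycle at which fluorescence crosses a detection threshold, so more starting RNA gives a lower Ct. A common way to turn Ct values into relative expression is the 2^-ΔΔCt (Livak) calculation. The sketch below is a minimal version that assumes roughly 100% amplification efficiency (product doubles every cycle) and one reference ("housekeeping") gene; the function name and the Ct values are hypothetical.

```python
def fold_change_ddct(ct_target_treated: float, ct_ref_treated: float,
                     ct_target_control: float, ct_ref_control: float) -> float:
    """Relative expression of a target gene (treated vs. control) by 2^-ddCt.

    Assumes ~100% PCR efficiency, i.e. the product doubles every cycle.
    """
    # Normalise each sample to its reference gene (corrects for input amount).
    d_ct_treated = ct_target_treated - ct_ref_treated
    d_ct_control = ct_target_control - ct_ref_control
    # Compare treated with control.
    dd_ct = d_ct_treated - d_ct_control
    # Crossing the threshold one cycle earlier means twice as much template.
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values: the target crosses the threshold 2 cycles earlier
# in treated cells, while the reference gene is unchanged.
print(fold_change_ddct(22.0, 18.0, 24.0, 18.0))  # 4.0
```

A two-cycle shift corresponds to a 2 x 2 = 4-fold difference; if real amplification efficiency differs much from 100%, efficiency-corrected models are used instead.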

C. Techniques to Study DNA

DNA is the genetic material that makes up our genes and chromosomes.

 PCR (Polymerase Chain Reaction):
o Purpose: To amplify (make many copies of) specific DNA sequences in a test
tube (in vitro).
o How it works: It's a cyclical process with three main steps repeated many times:
1. Denaturation: Heating the DNA to separate the two strands.
2. Annealing: Cooling to allow primers (short DNA sequences) to bind to
their target DNA regions.
3. Extension: A heat-stable DNA polymerase (typically Taq DNA polymerase) extends the primers, creating new DNA strands.
o Result: Under ideal conditions, the amount of target DNA doubles with each cycle.
 Cloning:
o Purpose: To create many identical copies of DNA fragments using host cells,
typically bacteria.
o How it works:
1. The DNA fragment is inserted into a cloning vector (a carrier DNA
molecule, usually a plasmid).
2. This recombinant plasmid is then introduced into a host cell.
3. Cells that successfully take up the plasmid are selected, often with an antibiotic-resistance gene carried on the plasmid, and colonies carrying an insert are identified by color screening (e.g., blue/white screening using the lacZ gene).
4. These selected colonies are grown to produce large amounts of the
plasmid DNA.
o Optional: Expression cloning can be used to produce proteins from the inserted
DNA.
 dPCR (digital PCR):
o Purpose: To quantify DNA more precisely than qPCR, giving an absolute count of target molecules.
o How it works: The sample is divided into thousands of tiny individual PCR reactions (partitions), so that each partition ideally contains 0, 1, or a few DNA molecules. Because some partitions receive more than one molecule, Poisson statistics are used to estimate the true number of molecules from the fraction of positive partitions, correcting the underestimate that a simple count would give.
o Key features: Provides a binary readout for each partition (positive for fluorescence, negative for no fluorescence), leading to better sensitivity. Because it counts molecules directly, it doesn't need a standard curve (a common requirement for qPCR). It is also more tolerant of inhibitors that can interfere with PCR.
 Methyl-seq (methylation sequencing):
o Purpose: To map DNA methylation patterns. Methylation is an epigenetic
mark (a chemical tag on DNA that doesn't change the DNA sequence itself) that
often silences genes.
o How it works: A key step is bisulfite conversion, which changes unmethylated cytosines into uracil (read as thymine after PCR), while methylated cytosines remain unchanged. After sequencing, the reads are compared with the untreated reference sequence: a C that is still read as C was methylated, whereas a C read as T was unmethylated.
o Applications: Studying epigenetics, developmental biology, and cancer research.
 Chromatin Configuration Techniques:
o Chromatin is the complex of DNA and proteins (histones) that forms
chromosomes. Its 3D structure affects gene activity.
o Hi-C-seq (chromosome conformation capture sequencing):

 Purpose: Maps 3D chromatin contacts within the nucleus, revealing which distant parts of the genome lie physically close to one another.
 Applications: Detecting TADs (topologically associating domains),
chromatin loops, and understanding how these structures influence gene
regulation and its disruption in diseases like cancer.

o ChIP-seq (chromatin immunoprecipitation sequencing):

 Purpose: To identify DNA-protein binding sites across the genome, such as where transcription factors (TFs) or histones bind.
 How it works: Proteins are crosslinked to DNA, DNA is fragmented,
specific DNA-protein complexes are isolated using antibodies
(immunoprecipitation), and then the bound DNA is sequenced.
 Applications: Mapping TF binding, histone modifications, and changes in
chromatin binding under different conditions (e.g., active euchromatin vs.
inactive heterochromatin).

o ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing):

 Purpose: To map open chromatin regions (euchromatin), which are areas of the genome that are accessible for gene expression.
 How it works: Uses a Tn5 transposase enzyme that cuts accessible DNA
and inserts sequencing adapters.
 Applications: Identifying regulatory elements and changes in
transcription factor binding motifs across different cell types or conditions.
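
The Poisson correction used in dPCR can be written down directly. If molecules are spread among partitions at random, a partition is negative only when it received zero copies, which happens with probability e^-λ, where λ is the mean number of copies per partition. So λ = -ln(1 - p), where p is the fraction of positive partitions. The sketch below assumes that random-distribution model; the function name, the partition counts, and the 0.85 nL partition volume are hypothetical example values.

```python
import math


def dpcr_estimate(positive: int, total: int, partition_volume_nl: float) -> tuple[float, float]:
    """Return (mean copies per partition, copies per microlitre) from dPCR counts.

    Assumes molecules are distributed among partitions at random (Poisson).
    """
    if not 0 <= positive < total:
        raise ValueError("need 0 <= positive < total; all-positive runs cannot be quantified")
    fraction_positive = positive / total
    # A partition is negative only if it received zero copies: P(0) = exp(-lam),
    # so lam = -ln(1 - fraction_positive).
    lam = -math.log(1.0 - fraction_positive)
    copies_per_ul = lam / (partition_volume_nl * 1e-3)  # 1 nL = 0.001 uL
    return lam, copies_per_ul


# Hypothetical run: 4,000 of 20,000 partitions fluoresce, each partition 0.85 nL.
lam, conc = dpcr_estimate(4000, 20000, 0.85)
print(f"{lam:.4f} copies/partition, ~{lam * 20000:.0f} molecules, {conc:.0f} copies/uL")
```

With these made-up numbers the Poisson estimate is about 4,463 molecules, slightly more than the 4,000 positive partitions, because some positive partitions held two or more copies.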

D. Single-Cell Genomics (SCG)

 Aim: To characterize gene expression, epigenetic marks, and mutations in individual cells.
 Utility: Helps create a complete catalog of human cell types and reveals the diversity
(heterogeneity) among cells.
 Technologies: Includes techniques like single-cell RNA-seq and single-cell ATAC-seq.
 Applications: Lineage tracing (tracking cell development), biomarker discovery,
therapeutic monitoring, identifying rare cell types (stem cells, cancer precursors),
studying disease progression, and constructing the Human Cell Atlas.

E. DNA Sequencing Techniques

These techniques determine the exact order of nucleotides (A, T, C, G) in a DNA molecule.

 Sanger Sequencing (First Generation):
o Principle: A chain termination method. It uses special modified nucleotides
(ddNTPs) that stop DNA synthesis when incorporated.
o Process: DNA is copied, and the ddNTPs, each labeled with a different
fluorescent color, randomly terminate copies at specific bases. The resulting DNA
fragments, varying in length by one base, are separated by size using capillary
electrophoresis, and a laser detects the fluorescent tags, creating
a chromatogram/electropherogram that shows the sequence.
o Strengths: Reads relatively long sequences (up to about 1,000 base pairs, bp) with a low error rate. It's robust and still used for specific tasks.
o Limitations: Low throughput, time-consuming, and expensive for large projects.
o Applications: Good for sequencing single genes, validating mutations found by
other methods (like NGS), and genotyping known variants.
 Next-Generation Sequencing (NGS, Second Generation):
o Principle: Massively parallel sequencing of millions of DNA fragments at once. It typically involves an amplification step.
o Process: DNA is fragmented, adapters are ligated to the ends, fragments are
amplified, and then sequenced using a method like sequencing by synthesis (e.g.,
Illumina, which uses reversible terminator chemistry and fluorescence). The
signals are detected, bases are "called," and the reads are aligned to a reference
genome to identify variants.
o Strengths: Produces massive amounts of data. Suitable for discovering new
genes and rare variants (like SNPs, indels, CNVs).
o Limitations: High initial equipment costs, requires strong bioinformatics support,
produces short reads, and generates a lot of data.
o Technologies: Illumina (known for short reads, high throughput) and Ion Torrent.
o Applications: Whole genome/exome/transcriptome sequencing, RNA-seq, ChIP-
seq, and identifying various types of genetic variations.
 Third Generation Sequencing (TGS):
o Principle: Single-molecule sequencing, often in real-time, without the need for
PCR amplification.
o Strengths: Produces very long reads, which are excellent for resolving complex
or repetitive regions of the genome. Minimal GC bias (a common issue in PCR-
based methods).
o Limitations: Generally has lower accuracy than NGS, though this is improving,
and requires intensive data analysis.
o Technologies:
1. PacBio (SMRT: Single Molecule Real-Time sequencing): Detects
fluorescence as DNA polymerase incorporates nucleotides in real-time.
Can directly detect base modifications like methylation.
2. Oxford Nanopore (MinION): A portable, low-cost technology that
measures changes in electric current as DNA passes through a protein
nanopore. Produces very long reads but has a higher error rate. No PCR is
needed.
o Applications: Genome assembly (putting fragmented sequences back together),
structural variation detection, full-length transcriptomics, and epigenetics.
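
The Sanger chain-termination principle can be illustrated with a small simulation. Each copy of the strand being synthesised stops wherever a dye-labelled ddNTP happens to be incorporated, and sorting the resulting fragments by length (as capillary electrophoresis does) recovers the sequence from the colour of each fragment's last base. This is a toy model under stated assumptions: a fixed 10% chance of termination per base (standing in for the ddNTP:dNTP ratio), no sequencing errors, and direct use of the newly synthesised strand rather than its template. The function names are hypothetical.

```python
import random


def sanger_fragments(new_strand: str, copies: int = 5000,
                     p_stop: float = 0.1, seed: int = 1) -> list[tuple[int, str]]:
    """Simulate dye-terminator fragments for the strand being synthesised.

    Each copy is extended one base at a time; at every position there is a
    chance p_stop (an assumed stand-in for the ddNTP:dNTP ratio) that a
    fluorescent ddNTP is added instead of a normal dNTP, ending synthesis.
    Returns (fragment length, colour = identity of the terminal ddNTP).
    """
    rng = random.Random(seed)
    fragments = []
    for _ in range(copies):
        for i, base in enumerate(new_strand):
            if rng.random() < p_stop:
                fragments.append((i + 1, base))
                break
    return fragments


def read_electropherogram(fragments: list[tuple[int, str]]) -> str:
    """Order fragments by size and read the colour of each peak.

    A length with no fragments would appear as an uncalled base, N.
    """
    peaks: dict[int, str] = {}
    for length, base in fragments:
        peaks[length] = base
    longest = max(peaks)
    return "".join(peaks.get(n, "N") for n in range(1, longest + 1))


strand = "ATGCGTACCTGA"  # made-up sequence
print(read_electropherogram(sanger_fragments(strand)))  # ATGCGTACCTGA
```

Lowering `p_stop` yields longer fragments but fewer of them at each length, which loosely mirrors why read length and signal strength trade off in real runs.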

F. SNP Genotyping

A SNP (Single Nucleotide Polymorphism) is a variation at a single position in a DNA sequence.

 SNP chips/microarrays: Use probes that bind to thousands to millions of known SNPs across the genome. High throughput, used in genetic mapping and association studies for disease screening, expression profiling, and breed mapping in animals.
 TaqMan Assays: A fluorescent probe-based qPCR method that targets specific SNP
alleles. Allele-specific probes have different reporter dyes. Highly specific.
 RFLP (Restriction Fragment Length Polymorphism): Relies on SNPs that affect
a restriction enzyme site (a specific DNA sequence where a restriction enzyme cuts).
Involves PCR, enzymatic digestion, and gel electrophoresis to see different fragment
sizes.
 Sequencing-based genotyping: Direct sequencing methods like Sanger or NGS can
detect SNPs.
 Microsatellite genotyping: Uses STRs (Short Tandem Repeats), which are very
polymorphic (variable). Involves PCR and polyacrylamide gel electrophoresis. Used in
forensics, paternity testing, and population genetics.
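
The RFLP logic can be made concrete by "digesting" a PCR product in silico. The sketch below assumes the enzyme is EcoRI (recognition site GAATTC, cut between G and A) and uses two made-up alleles that differ by one SNP inside the site. It scans only the top strand and ignores sticky-end overhangs, so it is an illustration rather than a digest simulator; the function name is hypothetical.

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Fragment lengths after cutting a linear DNA molecule at every `site`.

    Defaults model EcoRI (G^AATTC: cut after the first base of the site),
    looking at the top strand only.
    """
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


allele_1 = "ACGTACGAATTCGTACGTAC"  # intact EcoRI site
allele_2 = "ACGTACGAGTTCGTACGTAC"  # SNP (A>G) destroys the site
print(digest(allele_1))  # [7, 13]
print(digest(allele_2))  # [20]
```

On a gel, a homozygote for allele 1 would show two bands (13 and 7 bp), a homozygote for allele 2 one band (20 bp), and a heterozygote all three.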

G. Bioinformatics

Bioinformatics is a field that applies computational tools to organize, analyze, and understand
molecular biological data. It's often described as an "information system for molecular biology".

 Disciplines involved: Biology, computer science, medicine, physics, mathematics, and statistics.
 Core objectives:
o To model biological systems (e.g., protein folding, cellular pathways).
o To understand sequences.
o To organize vast amounts of sequence data.
o To manage the "data deluge" (the overwhelming amount of data) generated by
high-throughput technologies like Illumina and PacBio.
o Important for studying gene expression, regulatory networks, and evolutionary
relationships.
 Evolution: It has transformed biology into an information science, where computational
methods are essential for discovering candidate genes, linking phenotypes (observable
traits) to genotypes (genetic makeup), and visualizing cellular networks.
 Key Concepts:
o Comparing Biological Sequences: To identify homologous sequences (sequences sharing a common ancestor).
 Orthologs: Homologous genes in different species that diverged after a speciation event (the "same" gene in different species).
 Paralogs: Homologous genes that arose by gene duplication, typically within the same genome.
o Sequence Alignment: Measures similarity between DNA or protein sequences by
comparing nucleotides or amino acids. It assigns scores for matches, mismatches,
and gaps (insertions/deletions or indels).
 Global alignment: Aligns entire sequences end-to-end.
 Local alignment: Finds the best-matching subsections, useful for
identifying conserved domains.
o Scoring and Speed: Uses scoring matrices (like BLOSUM, PAM for proteins)
that weigh biologically likely substitutions. Penalizes gaps. Computational
efficiency is key, so heuristics (like BLAST) are used for faster searches.
 Tools:
o BLAST (Basic Local Alignment Search Tool): A widely used tool to quickly
compare a sequence against a database. It breaks sequences into short "words,"
matches them to a database, and then extends the alignments. Used to identify
unknown genes/proteins, annotate new genomes, and find conserved regions
across species.
o Gene Annotation: Assigns biological meaning to raw sequence data, often
through similarity searches, comparative genomics, or de novo (from scratch)
predictions.
o Genome Browsers: Software tools that allow visualization and exploration of
annotated genomic sequences. Examples include NCBI Genome Data Viewer,
UCSC Genome Browser, and Ensembl.
 Features: Interactive interface, track-based format for displaying different
data layers (genes, exons, variants, TF binding sites, RNA expression,
epigenetic markers). Links to functional data (e.g., Gene Ontology),
clinical significance (e.g., ClinVar), and population variants (e.g., 1000
Genomes Project).
 Practical Use: Allows researchers to avoid "running experiments blindly"
by providing context.
 Applications: Identifying gene structures, regulatory elements, comparing
genomes across species, designing primers, and viewing experimental
data.
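
Global and local alignment differ mainly in how the dynamic-programming table is initialised and whether scores may reset to zero. The sketch below computes only the optimal score (no traceback) and, for simplicity, uses a hypothetical match/mismatch/linear-gap scheme rather than a substitution matrix such as BLOSUM or PAM. Filling this full table against a large database is slow, which is why heuristic tools such as BLAST are used for searches.

```python
def alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap: int = -2, local: bool = False) -> int:
    """Best alignment score by dynamic programming.

    local=False: Needleman-Wunsch (global, end-to-end).
    local=True:  Smith-Waterman (best-matching subsections; cells floor at 0).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalised.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap    # gap in b
            left = score[i][j - 1] + gap  # gap in a
            cell = max(diag, up, left)
            if local:
                cell = max(cell, 0)  # start a fresh local alignment here
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


print(alignment_score("AAAGATTACAAA", "GATTACA"))              # -3 (global)
print(alignment_score("AAAGATTACAAA", "GATTACA", local=True))  # 7 (local)
```

The global score is dragged down by the five unmatched A's at the ends, while the local score reports only the perfectly conserved GATTACA segment.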

2. Organization of the Human Genome

The human genome is divided into two main parts: the mitochondrial genome and the nuclear
genome.
A. Mitochondrial vs. Nuclear Genomes

| Feature | Mitochondrial Genome | Nuclear Genome |
|---|---|---|
| Structure | Circular DNA | Linear DNA organized into 23 pairs of chromosomes |
| Inheritance | From mother only | From both parents |
| Number per cell | Many copies per mitochondrion, many mitochondria per cell | Diploid (two copies of each chromosome) per somatic cell |
| Size | 16,569 bp (very small) | 3.1 billion bp (very large) |
| Gene count | 37 (13 protein-coding, 22 tRNA, 2 rRNA) | Approx. 40,000 (21,000 protein-coding + non-coding RNA genes) |
| Introns | None | Yes (non-coding regions within genes) |
| Mutation rate | Higher (less error checking) | Lower (more robust repair systems) |
| Coding content | 93% coding | Only 2% coding (protein-coding genes) |
| Function | ATP production via cellular respiration | Diverse functions (structural, enzymatic, regulatory) |
| Use as a tool | D-loop used to track maternal lineages | Primary genome for traits and diseases |

B. Composition of the Mitochondrial Genome

 Genes: Contains 13 protein-coding genes (all for ATP production), 22 tRNA genes, and
2 rRNA genes.
 Features: It's very compact, with no introns and several overlapping genes. Its function also depends on about 1700 nuclear genes.
 Control region: Includes the D-loop, a highly variable region used in forensic tracking.

C. Composition of the Nuclear Genome

The nuclear genome contains chromosomes with varying gene density (some regions are "gene
deserts," others "gene clusters").

 Protein-coding genes: They contain exons (coding regions), introns (non-coding regions that are removed during RNA processing), and promoters (DNA sequences where transcription is initiated), and are often organized into gene families.
 Gene families: Groups of genes that arose from gene duplication. They can be clustered
or dispersed. Examples include globins and olfactory receptor genes. They are important
for evolution and providing functional redundancy (backup functions).
 Pseudogenes: Defective gene copies that are usually non-functional. They can arise from
gene duplication (non-processed) or from reverse-transcribed mRNA (processed, lacking
introns). The human genome has approximately 20,000 pseudogenes.

D. Non-coding RNA (ncRNA)


While only 2% of the nuclear genome codes for proteins, much more of it is transcribed
into non-coding RNA, which plays diverse regulatory roles.

 Main Classes:
o rRNAs (ribosomal RNAs): Essential for protein synthesis.
o tRNAs (transfer RNAs): Carry amino acids during protein synthesis.
 Short ncRNAs:
o snRNAs (small nuclear RNAs): Involved in RNA splicing.
o miRNAs (microRNAs):
 Definition: Small (18-25 nucleotides), non-coding RNAs that regulate
gene expression post-transcriptionally.
 Biogenesis:
1. Transcribed by RNA Pol II into a primary-miRNA (pri-miRNA).
2. Processed by the Drosha enzyme in the nucleus into a pre-
miRNA (a 70-nucleotide hairpin structure).
3. Exported to the cytoplasm by Exportin-5.
4. Further processed by the Dicer enzyme into a mature miRNA
duplex.
5. One strand (the guide strand) is loaded into the RNA-induced
silencing complex (RISC), while the other strand is degraded.
 Function: The miRNA-RISC complex binds to complementary
sequences, typically in the 3' untranslated region (3' UTR) of target
mRNAs. This binding can lead to mRNA degradation (if perfectly
complementary) or repression of translation (if partially complementary).
 Impact: One miRNA can target hundreds of mRNAs, and one mRNA can
be regulated by multiple miRNAs.
 Relevance: Dysregulated miRNAs are linked to diseases like cancer,
cardiovascular disease, and neurodegenerative disorders. They are also
used as biomarkers in blood or tissues and have therapeutic
potential (e.g., miRNA mimics or inhibitors).
 Long ncRNAs (lncRNAs):
o Definition: Non-coding RNAs longer than 200 nucleotides with low protein-
coding potential.
o Types: Include intergenic, intronic, and antisense lncRNAs.
o Function: Regulate transcription, chromatin structure, and epigenetic states.
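
The complementarity rule described for miRNAs can be sketched as a string search over a 3' UTR: a perfectly complementary site is the reverse complement of the whole miRNA. For partial pairing, the sketch adds an assumption not spelled out in the notes above, the commonly used "seed" rule (pairing of miRNA nucleotides 2-8). Both the miRNA and the UTR are made up, and real target prediction also weighs site context and conservation.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence (5'->3' in, 5'->3' out)."""
    return rna.translate(COMPLEMENT)[::-1]


def find_sites(mirna: str, utr: str) -> dict[str, list[int]]:
    """0-based positions in a 3' UTR (5'->3', RNA alphabet) that pair with a miRNA.

    'full': the whole miRNA is complementary (perfect pairing).
    'seed': miRNA nucleotides 2-8 are complementary (assumed seed rule);
            this also hits the seed part of any perfect site.
    """
    full_site = reverse_complement(mirna)
    seed_site = reverse_complement(mirna[1:8])  # nucleotides 2-8, 1-based

    def hits(site: str) -> list[int]:
        return [i for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site]

    return {"full": hits(full_site), "seed": hits(seed_site)}


mirna = "UCAGUGAAGCUAGCAUCGAUAC"  # made-up 22-nt miRNA
# Made-up UTR: one perfect site, then a separate seed-only site.
utr = "AAAA" + "GUAUCGAUGCUAGCUUCACUGA" + "CCCC" + "UUCACUG" + "AAAA"
print(find_sites(mirna, utr))  # {'full': [4], 'seed': [18, 30]}
```

The seed hit at position 18 lies inside the perfect site; the one at 30 is a seed-only (partially complementary) site.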

E. Highly Repetitive DNA Elements

About half of the human genome consists of repetitive DNA.

 Heterochromatin/Tandem Repeats:
o Concentrated in centromeres (constricted regions of chromosomes) and telomeres (chromosome ends), although microsatellites are also scattered throughout the genome.
o Generally not transcribed.
o Types include satellites, minisatellites, and microsatellites.
 Transposons (Interspersed Repeats): "Jumping genes" that can move around the
genome.
o Class I (Retrotransposons): "Copy and paste" mechanism (transcription to RNA,
then reverse transcription back to DNA, then integration).
 LINEs (Long Interspersed Nuclear Elements): Autonomous (can move
themselves).
 SINEs (Short Interspersed Nuclear Elements): Non-autonomous (rely
on LINEs for movement).
o Class II (DNA Transposons): "Cut and paste" mechanism.
o Impact: Important for structural integrity and genome evolution but can be
mutagenic.

F. Gene Definition and the ENCODE Project

The definition of a "gene" has evolved over time.

 Post-ENCODE: A gene is defined as "a union of genomic sequences encoding a coherent set of potentially overlapping functional products".
 ENCODE Project (Encyclopedia of DNA Elements): A large-scale research project
(2003-2012) that significantly redefined our understanding of genome functionality.
o Key Findings:
 About 80% of the human genome was assigned a biochemical function (for example, being transcribed or bound by proteins).
 75-80% of the genome is transcribed (copied into RNA) in at least one cell
type.
 Most transcripts are non-coding RNAs.
 Protein-coding genes often produce multiple transcripts via alternative
splicing.
 DNA and histone modifications vary by cell type and affect gene
regulation.
 Challenged the notion of "junk DNA", as many non-coding regions have regulatory roles.

3. Gene Regulation and the Epigenome

Gene regulation controls when, where, and how much a gene is expressed. This ensures that
genes are turned on or off at the right time (temporal control) and in the right place (spatial
control), leading to tissue-specific functions, developmental stages, and responses to
environmental signals.

A. Chromatin and TADs

 Chromatin: The complex of DNA, histones (proteins around which DNA is wrapped),
and other proteins.
o Euchromatin: "Open" and transcriptionally active (genes can be expressed).
o Heterochromatin: "Condensed" and transcriptionally inactive (genes are
silenced).
o Remodeling: Changes in chromatin structure can increase or decrease DNA
accessibility to promoters, thereby aiding or hindering RNA polymerase binding
and gene expression.
 TADs (Topologically Associating Domains): 3D chromatin domains that help bring
enhancers (DNA sequences that boost gene expression) close to promoters.
o Delineated by boundary proteins/elements: These act as borders between
domains and prevent the spread of heterochromatin or prevent enhancers from
activating unintended genes.
o Importance: Crucial for gene regulation and often conserved across species.
Their disruption is linked to rare diseases.

B. Levels of Gene Regulation

Gene expression is controlled at multiple levels, from changes in DNA structure to protein
processing.

 I. Transcriptional Control (in the nucleus):
1. Chromatin Remodeling:
 Changes in chromatin structure influence DNA accessibility.
 Modifications: Histones can be modified by acetylation (generally activates genes), methylation (can activate or repress, depending on location), and phosphorylation.
 Enzymes: "Writers" add modifications, "erasers" remove them, and
"readers" interpret them.
2. Promoter Accessibility: Promoters are DNA sequences just upstream of genes
where transcription begins.
 Core promoters: Contain key sequences like the TATA box.
 Proximal promoters: Contain regulatory elements like GC box, CCAAT-
box.
 Some promoters are bidirectional, allowing transcription in both
directions.
3. Transcription Factors (TFs): Proteins that bind to DNA and regulate gene
transcription.
 They have DNA binding domains (e.g., zinc fingers) and activation
domains.
 General TFs: Needed for all transcription.
 Specific TFs: Act as activators or repressors for particular genes, often in
a tissue-specific manner or in response to signals.
4. RNA Polymerases (RNA pol): Enzymes that synthesize RNA from a DNA
template.
 Pol I: Synthesizes rRNAs.
 Pol II: Synthesizes mRNAs, miRNAs, snRNAs, and tissue-specific
RNAs.
 Pol III: Synthesizes tRNAs and 5S rRNAs.
 There's also a mitochondrial RNA pol.
 II. Epigenetic Mechanisms: These are heritable changes in gene expression that do not
involve changes to the underlying DNA sequence.
1. DNA Methylation:
 Process: Addition of methyl groups to cytosines in CpG dinucleotides, especially within CpG islands (regions rich in cytosine and guanine) in promoters, by DNA methyltransferases.
 Effect: Silences gene expression by blocking transcription factor binding
and recruiting repressor proteins.
 Inheritance: Heritable through cell division (mitosis) but not necessarily
through generations (meiosis).
 Role: Important in development, cell lineage commitment, and genomic
imprinting.
2. Histone Modification:
 Process: Chemical modifications (e.g., acetylation, methylation,
phosphorylation) to the tails of histone proteins.
 Effect: These modifications are interpreted as a "histone code" that
defines different chromatin states. Acetylation generally activates genes,
while methylation can activate or repress depending on the specific amino
acid and degree of methylation.
 Inheritance: Also heritable through mitosis.
3. Imprinting:
 Definition: A phenomenon where gene expression depends on whether
the gene was inherited from the father or the mother. Only one copy
(allele) is expressed.
 Prevalence: Affects a small percentage of genes (1-2%) in mammals,
often found in clusters.
 Example: The IGF2 gene, where only the paternal allele is expressed.
4. X-inactivation:
 Definition: In female mammals (who have two X chromosomes), one X
chromosome is randomly inactivated in each cell to balance gene dosage
with males (who have one X).
 Control: Controlled by the XIST long ncRNA.
 Result: Leads to mosaic expression, meaning different cells in the same
organism can have different X chromosomes active, like in calico cats' fur
color.
 III. Post-Transcriptional Control (in the nucleus and cytoplasm):
1. Pre-mRNA Processing: After transcription, the initial RNA (pre-mRNA)
undergoes modifications to become mature mRNA.
 RNA Splicing: Introns are removed, and exons are joined to form
mature mRNA.
 Alternative Splicing: A single gene can produce different mRNA
molecules (and thus different proteins or protein isoforms) depending on
which exons are included or excluded. This significantly increases protein
diversity and is important for tissue-specific expression (e.g., in the CNS).
 5' Capping and Poly-A Tail Addition: Modifications at the ends of the
mRNA that protect it and aid in translation.
2. mRNA Stability: How long an mRNA molecule lasts in the cytoplasm affects
how much protein can be made from it.
 Degradation: mRNAs can be degraded by enzymes (exonucleases) from either end, typically after removal of the 5' cap or shortening of the poly-A tail.
 microRNAs (miRNAs): Can regulate mRNA degradation or inhibit
translation by binding to the 3' UTR of mRNA. One miRNA can target
many mRNAs.
3. Translation Control: Regulating the process of protein synthesis from mRNA.
 miRNAs: Can bind to the 3' UTR of mRNA (along with RNA-binding
proteins) to inhibit translation initiation at the ribosome or cause mRNA
degradation. This offers a faster cellular response than transcriptional
control.
4. Protein Processing: After a protein is made, it can undergo further modifications.
 Misfolded proteins are degraded.
 Covalent modifications: Like phosphorylation (addition of phosphate
groups).
 Proteolytic cleavage: Cutting the protein to generate its mature, active
form.
5. Protein Transport: Proteins need to be delivered to their correct locations within
the cell or outside of it.
 Signal sequences: Direct proteins to specific cellular destinations. This
transport is regulated and essential for proper cell function.
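
CpG islands come up above as the main targets of promoter DNA methylation. One widely used definition (Gardiner-Garden and Frommer, 1987) calls a stretch of DNA a CpG island if it is at least 200 bp long, at least 50% G+C, and has an observed/expected CpG ratio of at least 0.6. The sketch below applies those thresholds to a single sequence; genome-scale tools scan sliding windows and merge hits, which is omitted here, and the test sequences are artificial.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # 'CG' cannot overlap itself, so this count is exact
    gc_fraction = (c + g) / n
    # Expected CpG count if C and G were placed independently: c * g / n.
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply the classic thresholds: >= 200 bp, GC >= 50%, obs/exp CpG >= 0.6."""
    gc, oe = cpg_stats(seq)
    return len(seq) >= min_len and gc >= min_gc and oe >= min_obs_exp


print(cpg_stats("CCGG" * 50), looks_like_cpg_island("CCGG" * 50))  # (1.0, 1.0) True
print(cpg_stats("GCAT" * 50), looks_like_cpg_island("GCAT" * 50))  # (0.5, 0.0) False
```

The second example is 50% G+C but contains no CG dinucleotides at all, showing that GC-rich and CpG-rich are not the same thing.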

C. Techniques to Measure Gene Expression and Epigenetic Modifications

 At the RNA Level:
o RT-qPCR: Measures mRNA levels of specific genes; sensitive, specific, high-throughput.
o RNA-seq: Comprehensive transcriptome profiling; detects alternative splicing,
isoforms, novel transcripts.
o Northern Blot: Less common now; detects RNA size and abundance.
o Microarrays: Hybridization-based method to compare expression profiles by
converting RNA to labeled cDNA and hybridizing to a chip.
 At the Protein Level:
o Western Blot (immunoblotting): Detects specific proteins using antibodies;
semi-quantitative. Involves protein separation by SDS-PAGE, transfer to a
membrane, and detection with antibody probes.
o ELISA (Enzyme-Linked Immunosorbent Assay): Quantitative analysis of
protein concentration.
o Mass Spectrometry: High-resolution protein identification and mapping of post-
translational modifications (PTMs).
o Immunohistochemistry/Immunofluorescence: Spatial visualization of protein
expression directly in tissues or cells.
 For Epigenetic Modifications:
o Bisulfite Sequencing: Detects DNA methylation patterns by converting
unmethylated cytosines to uracil.
o ChIP-Seq (Chromatin Immunoprecipitation Sequencing): Measures histone
modifications and transcription factor binding sites across the genome.
o ATAC-Seq/DNase-Seq: Measures chromatin accessibility (open chromatin
regions).
o MeDIP-Seq (Methylated DNA Immunoprecipitation Sequencing): Captures
methylated DNA regions.
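
The bisulfite read-out logic can be sketched in a few lines: simulate conversion of one strand (unmethylated C becomes U, which is read as T after PCR), then compare the converted read with the reference, position by position. This ignores real-world complications such as incomplete conversion, the opposite strand, and read alignment; the function names, sequence, and methylated positions are hypothetical.

```python
def bisulfite_convert(seq: str, methylated: set[int]) -> str:
    """In-silico bisulfite treatment of one DNA strand, as seen after PCR.

    Unmethylated C -> T (via uracil); C at 0-based positions in `methylated`
    is protected and stays C.
    """
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference: str, read: str) -> dict[int, bool]:
    """For each C in the reference: True if the read still shows C
    (methylated), False if it shows T (unmethylated, converted)."""
    return {i: read[i] == "C" for i, base in enumerate(reference) if base == "C"}


ref = "ACGTTCGACCGA"
read = bisulfite_convert(ref, methylated={1, 9})  # assume the Cs at 1 and 9 are methylated
print(read)                         # ACGTTTGATCGA
print(call_methylation(ref, read))  # {1: True, 5: False, 8: False, 9: True}
```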

4. Genetic Variation and Human Molecular Pathology

Genetic variation refers to the differences in DNA sequences among individuals. These
variations are the raw material for evolution and can also cause disease.

A. Origin of Genetic Variation

Mutations, which are changes in the DNA sequence, are the ultimate source of genetic variation.

1. Replication Errors: DNA polymerase, the enzyme that copies DNA, can sometimes
mispair bases. While DNA mismatch repair systems fix many of these, some errors
persist.
2. Replication Slippage: Typically occurs in STRs (short tandem repeats), leading to insertions or deletions (indels) of repeated units.
3. Chromosome Segregation/Recombination Errors: Mistakes during cell division
(meiosis or mitosis) can cause large-scale chromosomal changes like inversions,
deletions, translocations. For example, nondisjunction (failure of chromosomes to
separate properly) causes aneuploidy (an abnormal number of chromosomes), as seen in
Down, Turner, and Klinefelter syndromes.
4. Endogenous DNA Damage: Spontaneous damage to DNA from within the cell, such as loss of bases, oxidation by reactive oxygen species (ROS), or deamination (e.g., cytosine deaminating to uracil, or 5-methylcytosine to thymine).
5. External Mutagens: Agents from the environment that damage DNA, such as ionizing
radiation (causes DNA breaks), UV radiation (forms thymine dimers), and hydrocarbons
from smoke or pollution.

B. Types of Mutations

 Substitutions (SNVs; Single Nucleotide Variants): A single base is changed.
o Transition: Change between purines (A to G, or G to A) or between pyrimidines
(C to T, or T to C).
o Transversion: Change between a purine and a pyrimidine (e.g., A to C, or T to
G).
 Indels (Insertions or Deletions): The addition or removal of one or more base pairs. If they occur within a coding region and their length is not a multiple of three, they cause frameshifts.
 Copy Number Variants (CNVs): A type of structural variant where large segments of
DNA are deleted or duplicated. Other structural variants include inversions and
translocations.
 Tandem Repeat Expansions: Increases in the number of copies of short repetitive DNA sequences, which can cause diseases like Huntington's disease.
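
The purine/pyrimidine rule for substitutions translates directly into code. A minimal sketch, assuming each variant is given as a pair of single DNA bases (upper or lower case); `classify_snv` is a hypothetical helper name.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_snv(ref: str, alt: str) -> str:
    """Label a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or {ref, alt} - (PURINES | PYRIMIDINES):
        raise ValueError(f"not a substitution between DNA bases: {ref}>{alt}")
    # Transition: both bases in the same chemical class.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


for ref, alt in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]:
    print(f"{ref}>{alt}", classify_snv(ref, alt))
# A>G transition, C>T transition, A>C transversion, G>T transversion
```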

Balanced mutations involve no net gain or loss of DNA but can still disrupt genes. Unbalanced
mutations result in a change in DNA copy number and often cause disease.

C. Consequences of Genetic Variation

Mutations can have various effects on gene function and protein production:

 Loss of Function:
1. Nonsense Mutation: Changes a codon (three-nucleotide sequence that codes for
an amino acid) into a premature stop codon, leading to a truncated (shorter) or
non-functional protein. If the premature stop codon lies well upstream of the last
exon-exon junction (roughly more than 50-55 nucleotides before it), it usually
triggers nonsense-mediated mRNA decay (a quality control mechanism that
degrades faulty mRNA), so little truncated protein is made.
2. Missense Mutation: Changes a single amino acid in the protein. Single-base
coding substitutions are often graded by their likely impact:
 Synonymous (silent; strictly not a missense change): No amino acid change
(due to redundancy in the genetic code), usually low risk.
 Tolerated: Chemically similar amino acid substitution, often mild or no
effect.
 Not Tolerated: Major disruption in protein function, high risk.
3. Splice Site Mutation: Occurs at the boundaries between introns and exons,
disrupting the normal splicing process. This can lead to exon skipping, inclusion
of new (cryptic) exons, or exons enlarged by the use of cryptic splice sites or by
intron retention.
4. Frameshift Mutations: Caused by indels (insertions or deletions not in multiples
of three) that shift the reading frame of the gene. This usually leads to an early
stop codon and a truncated or non-functional protein.
 Gain of Function:
1. Structural Rearrangements: Can lead to chimeric genes (combinations of parts
from two or more distinct genes) or ectopic expression (a gene being expressed
in a cell or tissue where it's not normally active). For example, enhancers
(regulatory DNA elements) might be placed next to oncogenes, leading to their
overexpression.
2. RNA Toxicity: Usually results from unstable repeat expansions that produce
RNA transcripts containing abnormally long repeat tracts. These RNAs can form stable
structures (hairpins) that trap RNA-binding proteins, disrupting the processing of
other genes (e.g., by causing mis-splicing). Seen in diseases like myotonic
dystrophy.
3. Unstable Repeat Expansions: Trinucleotide or multibase repeats expand beyond
a pathogenic threshold, producing proteins or RNAs with new, toxic properties (e.g.,
the expanded polyglutamine tract in huntingtin in Huntington's disease).
4. Gene Duplications: Lead to more transcripts and potentially more protein, as
seen with IGF2 duplication.
5. Missense Mutations: Can change the protein's function, sometimes creating
dominant oncogenes.
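Whether a single-base coding change is synonymous, missense or nonsense follows directly from the genetic code. The sketch below is a minimal illustration that builds the 64-codon table and classifies one substitution. It assumes the standard nuclear genetic code and DNA codons written 5' to 3'. The function name and the "stop-loss" label are additions for illustration. Predicting whether a missense change is tolerated requires dedicated tools and is not modelled here.

```python
BASES = "TCAG"
# Standard genetic code, amino acids in TCAG x TCAG x TCAG codon order ("*" = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def substitution_effect(codon: str, position: int, new_base: str) -> str:
    """Classify replacing codon[position] (0, 1 or 2) with new_base."""
    codon, new_base = codon.upper(), new_base.upper()
    if codon not in CODON_TABLE or position not in (0, 1, 2):
        raise ValueError("expected a DNA codon and a position 0-2")
    if len(new_base) != 1 or new_base not in BASES or codon[position] == new_base:
        raise ValueError("new_base must be a single different base A/C/G/T")
    mutant = codon[:position] + new_base + codon[position + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if before == after:
        return "synonymous"
    if after == "*":
        return "nonsense"   # premature stop codon
    if before == "*":
        return "stop-loss"  # stop codon now encodes an amino acid
    return "missense"


print(substitution_effect("TGG", 2, "A"))  # nonsense (Trp -> stop)
print(substitution_effect("GAG", 1, "T"))  # missense (Glu -> Val)
print(substitution_effect("CTG", 0, "T"))  # synonymous (Leu -> Leu)
```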

D. DNA Repair Mechanisms

Cells have sophisticated systems to fix DNA damage and replication errors.

 Base-Excision Repair: Fixes modified bases (e.g., deaminated C to U).
 Nucleotide-Excision Repair: Removes bulky lesions (e.g., UV damage).
 Mismatch Repair: Corrects replication errors where incorrect bases are incorporated.
 Nonsense-Mediated Decay: Degrades mRNA transcripts that contain premature stop
codons. Strictly, this is an mRNA surveillance pathway rather than DNA repair: it limits
the production of truncated proteins but does not correct the underlying mutation.

E. Population Genetic Variation

 Negative (Purifying) Selection: Removes harmful alleles (gene variants) from a
population, leading to conserved regions of the genome that are essential for survival.
A related population signal is Runs of Homozygosity (ROH), long stretches where both
chromosome copies are identical; these mainly indicate inbreeding or population
bottlenecks rather than selection itself.
 Positive (Adaptive) Selection: Favors beneficial mutations that provide an advantage,
leading to selective sweeps where a beneficial allele rapidly increases in frequency,
reducing heterozygosity in that region.
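Runs of homozygosity can be found by scanning genotype calls along a chromosome for unbroken stretches of homozygous SNPs. This minimal sketch assumes genotypes are coded 0 (homozygous reference), 1 (heterozygous) and 2 (homozygous alternative) in marker order. It also assumes a hypothetical minimum run length of five SNPs. Dedicated tools, such as PLINK's `--homozyg` mode, instead apply physical-length thresholds and tolerate occasional heterozygous or missing calls.

```python
def runs_of_homozygosity(genotypes, min_snps=5):
    """Return (first, last) marker indices of runs of >= min_snps homozygous calls.

    genotypes: 0 = hom ref, 1 = het, 2 = hom alt, listed in chromosome order.
    """
    runs = []
    start = None
    # A trailing heterozygous "sentinel" closes any run that reaches the last marker.
    for i, g in enumerate(list(genotypes) + [1]):
        if g in (0, 2):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    return runs


calls = [0, 2, 2, 0, 2, 1, 0, 1, 2, 2, 2, 2, 2, 2]
print(runs_of_homozygosity(calls))  # [(0, 4), (8, 13)]
```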

F. Consequences for Phenotype

 Neutral: No change in observable characteristics (phenotype); this applies to most SNVs.
 Deleterious: Cause diseases or loss of function; recessive deleterious alleles are often
unmasked by inbreeding, which makes them homozygous.
 Beneficial: Drive evolution and adaptation, can lead to selective sweeps.
 Structural: Alter chromatin structure or TADs, affecting enhancer-promoter interactions.

G. Chromosomal Abnormalities & Structural Variations

 Euploidy: Having a complete set of chromosomes (e.g., 46 in humans).
 Aneuploidy: Having a missing or extra individual chromosome (e.g., monosomy,
trisomy like Down syndrome).
 Structural Abnormalities:
o Balanced: No net gain or loss of genetic material (e.g., inversions, reciprocal
translocations, and Robertsonian translocations in carriers).
o Unbalanced: Gain or loss of genetic material (e.g., deletions, duplications, and
unbalanced translocations, such as those a Robertsonian carrier can pass on).
 Chromosomal Translocations: Transfer of material between non-homologous
chromosomes.
o Reciprocal: Exchange of segments between two chromosomes.
o Robertsonian: Fusion of the long arms of two acrocentric chromosomes (chromosomes
with centromeres near one end) at or near their centromeres, with loss of the short arms.
 TADs (Topologically Associating Domains): 3D genome units that largely confine
enhancer-promoter interactions to within the domain. Disruption of TAD boundaries can
lead to ectopic gene expression, cancer, and developmental disorders. They are studied
with Hi-C sequencing.

H. Techniques to Study Chromosomal Arrangements

1. Karyotyping: Visualizing the full set of chromosomes in a cell. Often uses G-banding to
detect large-scale changes. Useful for balanced abnormalities (e.g., translocations and
inversions), which copy-number methods cannot detect.
2. Chromosome Painting: Uses fluorescently labeled DNA probes that bind to specific
chromosomes, revealing translocations or fusions.
3. FISH (Fluorescence In Situ Hybridization): Uses fluorescently dyed DNA probes to
bind to specific DNA sequences on chromosomes. Detects translocations, inversions,
duplications, deletions, and CNVs.
4. Comparative Mapping: Aligning genes or markers across species to identify conserved
synteny (blocks of genes with the same order). Uses genome browsers, sequence
alignments (like BLAST), and sequencing data.
5. CGH Arrays (Comparative Genomic Hybridization arrays): Detect copy-number gains and
losses smaller than about 5 Mb (megabases), below the resolution of karyotyping, but
cannot detect balanced rearrangements.
6. Whole Genome Sequencing (WGS): Can detect SNVs, indels, and structural variations.
7. Hi-C-seq: Maps 3D chromatin interactions, useful for studying TADs.
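Array CGH results are usually read as a log2 ratio of test to reference signal for each probe. In a diploid genome, a one-copy loss gives an expected log2(1/2) = -1, and a one-copy gain gives log2(3/2) ≈ 0.58. The sketch below shows that arithmetic together with a naive per-probe call. The ±0.3 thresholds are an assumption, because observed ratios are usually compressed. Real pipelines also segment neighbouring probes first (for example with circular binary segmentation) rather than calling probes one at a time.

```python
import math


def expected_log2_ratio(copies: int, ploidy: int = 2) -> float:
    """Theoretical log2(test/reference) for a region with `copies` copies."""
    if copies == 0:
        return float("-inf")  # homozygous deletion: no test signal at all
    return math.log2(copies / ploidy)


def call_probes(log2_ratios, gain=0.3, loss=-0.3):
    """Naive per-probe calls; the thresholds are illustrative assumptions."""
    calls = []
    for ratio in log2_ratios:
        if ratio >= gain:
            calls.append("gain")
        elif ratio <= loss:
            calls.append("loss")
        else:
            calls.append("neutral")
    return calls


print(expected_log2_ratio(1))                # -1.0
print(round(expected_log2_ratio(3), 3))      # 0.585
print(call_probes([0.05, -0.9, 0.6, -0.1]))  # ['neutral', 'loss', 'gain', 'neutral']
```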

I. Comparative Genomics

This field studies the genetic similarities and differences across species.

 Purpose: To identify conserved genes and regulatory elements, understand genome
evolution and function, and select optimal model organisms.
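 Purifying/Negative Selection: Removes harmful mutations, preserving functionally
important sequences (both coding and non-coding). In protein-coding genes this is measured
by the Ka/Ks ratio (the rate of non-synonymous substitutions per non-synonymous site, Ka,
divided by the rate of synonymous substitutions per synonymous site, Ks); Ka/Ks < 1
indicates purifying selection and Ka/Ks ≈ 1 suggests neutral evolution.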
 Positive/Adaptive Selection: Favors beneficial mutations, leading to lineage-specific
adaptations (e.g., olfactory genes in dogs). Ka/Ks > 1 indicates positive selection.
 G-value Paradox: The observation that there's no direct link between the number of
genes and an organism's complexity. Complexity often comes from sophisticated gene
regulation, not just gene count (e.g., humans have ~20,000 genes, wheat ~100,000).
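 Gene Duplication: A major driver of genome evolution, generating paralogs (related
genes within a species); orthologs, by contrast, are genes in different species that
descend from a single ancestral gene through speciation. Duplicate copies can acquire
new functions (neofunctionalization), divide the ancestral functions between them
(subfunctionalization), or remain as redundant backups, increasing complexity.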
 Genome Evolution: Also includes exon duplication/shuffling (rearranging exons to
create new genes), expansion of noncoding regulatory regions (especially in vertebrates),
and lineage-specific changes.
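The Ka/Ks ratio can be estimated by counting synonymous and non-synonymous sites and differences between two aligned coding sequences. The sketch below is a simplified version in the spirit of the Nei-Gojobori method, and it makes several assumptions:

* the standard genetic code;
* gap-free alignments;
* at most one difference per codon;
* changes to stop codons counted as non-synonymous.

It applies a Jukes-Cantor correction to each proportion. Established tools such as PAML's codeml use more complete models, so treat this only as an illustration of the counting logic.

```python
import math

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def synonymous_sites(codon: str) -> float:
    """Synonymous sites in a codon: number of synonymous single-base changes / 3."""
    synonymous_changes = 0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    synonymous_changes += 1
    return synonymous_changes / 3


def jukes_cantor(p: float) -> float:
    """Correct a proportion of differing sites for multiple hits."""
    if p == 0:
        return 0.0
    if p >= 0.75:
        raise ValueError("too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(seq1: str, seq2: str) -> float:
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    syn_sites = nonsyn_sites = syn_diffs = nonsyn_diffs = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2  # average over both codons
        syn_sites += s
        nonsyn_sites += 3 - s
        differences = sum(x != y for x, y in zip(c1, c2))
        if differences > 1:
            raise ValueError(f"codon {i // 3} has several differences; not handled here")
        if differences == 1:
            if CODE[c1] == CODE[c2]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    if syn_sites == 0:
        raise ValueError("no synonymous sites in these sequences")
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0:
        return float("inf") if ka > 0 else float("nan")
    return ka / ks


# Both differences (AAA->AAG, GGG->GGA) are synonymous, so Ka/Ks is 0.0.
print(ka_ks("ATGAAACCCGGGCTGCTGCTGCTG", "ATGAAGCCCGGACTGCTGCTGCTG"))
```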

J. Ancient DNA (aDNA)


 Sources: Best preserved in cold (permafrost), dry (deserts, mummies), or stable pH
(caves) environments. Poorly preserved in hot, humid, acidic, or basic conditions.
 Degradation & Damage: aDNA is typically fragmented and suffers chemical changes
(like cytosine deamination, which shows up as excess C-to-T substitutions near the ends
of reads).
 Challenges: Contamination from environment, lab, or humans is a major concern.
 Good Practice: Sterile collection, dedicated clean labs, and strict anti-contamination
protocols are essential.
 Ancient Proteins/RNA: Proteins are more stable than DNA and can persist longer.
Ancient RNA, though historically thought too fragile, has been found in some specimens,
providing insight into ancient gene expression.
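Deaminated cytosines in ancient molecules are read as thymine. As a result, authentic aDNA typically shows an excess of C-to-T mismatches at the 5' ends of reads, and this pattern is widely used to distinguish ancient sequences from modern contamination. The sketch below tallies that signal by read position. It assumes gap-free (reference, read) pairs already aligned from the read's 5' end. The function name and window size are hypothetical. Dedicated tools such as mapDamage perform this analysis on real alignments.

```python
from collections import Counter


def c_to_t_by_position(pairs, window=10):
    """Fraction of reference Cs read as T at each of the first `window` read positions.

    pairs: iterable of (reference_segment, read) strings aligned without gaps,
    both written from the read's 5' end.
    """
    c_total = Counter()
    c_to_t = Counter()
    for ref, read in pairs:
        for pos, (r, q) in enumerate(zip(ref.upper(), read.upper())):
            if pos >= window:
                break
            if r == "C":
                c_total[pos] += 1
                if q == "T":
                    c_to_t[pos] += 1
    return [c_to_t[p] / c_total[p] if c_total[p] else 0.0 for p in range(window)]


pairs = [("CCAC", "TCAC"), ("CGTC", "TGTC"), ("CATC", "CATC")]
print(c_to_t_by_position(pairs, window=4))  # roughly [0.67, 0.0, 0.0, 0.0]
```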

K. Ethics in Genetics

The sources highlight ethical considerations in various areas:

 Dog breeding.
 Genetic testing of humans.
 Gene editing in humans and animals.

5. Mapping Mendelian Characters

Mendelian characters (or traits) are those whose inheritance patterns can be explained by the
laws of Mendelian genetics, typically involving a single gene.

A. Basic Concepts in Mendelian Genetics

 Heritability: The proportion of phenotypic variation (observable differences) in a
population that is due to genetic variation among individuals. It's population-specific and
doesn't predict individual outcomes.
 Mendelian Patterns of Inheritance: Include autosomal dominant/recessive, X-linked
dominant/recessive, Y-linked, and mitochondrial inheritance (from the mother).
 Mendel's Laws:
1. Law of Segregation: Each parent passes one of two alleles (different versions of
a gene) for a trait randomly to their offspring.
2. Law of Independent Assortment: Alleles for different genes segregate
(separate) independently during gamete formation.
3. Law of Dominance: Dominant alleles mask the effect of recessive ones in
heterozygotes (individuals with two different alleles).
 Forces that Change Allele Frequency: Mutation, gene flow (migration), genetic drift
(random changes), natural selection, and non-random mating.
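 Hardy-Weinberg Equations: Used to describe allele and genotype frequencies in a
large, randomly mating population with no mutation, migration, or selection (a population
in Hardy-Weinberg equilibrium):
o Allele frequency: p + q = 1 (p = frequency of dominant allele, q = frequency of
recessive allele).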
o Genotype frequency: p² + 2pq + q² = 1 (p² = homozygous dominant, 2pq =
heterozygous, q² = homozygous recessive).
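The Hardy-Weinberg equations convert observed genotype counts into expected ones, which is the basis of a simple goodness-of-fit check. This minimal sketch assumes a biallelic locus with observed counts for AA, Aa and aa, and the function name is hypothetical. With one degree of freedom, a chi-square value above about 3.84 suggests departure from equilibrium at the 5% level.

```python
def hardy_weinberg(n_AA: int, n_Aa: int, n_aa: int):
    """Return p, q, expected genotype counts and the chi-square statistic."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # each AA carries two A alleles, each Aa one
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)  # p^2, 2pq, q^2 scaled by n
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return p, q, expected, chi2


p, q, expected, chi2 = hardy_weinberg(50, 0, 50)
print(p, q)      # 0.5 0.5
print(expected)  # (25.0, 50.0, 25.0)
print(chi2)      # 100.0: no heterozygotes at all, far from equilibrium
```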

B. Genetic Heterogeneity

This refers to situations where different genetic causes lead to the same phenotype (observable
trait).

1. Allelic Heterogeneity: Different mutations within the same gene cause the same
phenotype (e.g., over 12 mutations in the CFTR gene cause cystic fibrosis).
2. Locus Heterogeneity: Mutations in different genes cause the same phenotype (e.g.,
retinitis pigmentosa from mutations in over 16 genes).
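3. Clinical Heterogeneity: Mutations in the same gene can lead to different phenotypes
(e.g., different dystrophin mutations cause Duchenne or Becker muscular dystrophy;
out-of-frame deletions typically cause the severe Duchenne form, in-frame deletions the
milder Becker form).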

C. Other Key Terms

 Haplotype: A combination of alleles at adjacent loci (locations) on a chromosome that
are inherited together. Useful for tracking inheritance and identifying disease regions.
 Genetic Distance vs. Physical Distance:
o Genetic distance: Measured in centimorgans (cM), reflecting
the recombination frequency between genes.
o Physical distance: Measured in base pairs (bp).
o Recombination is not evenly distributed across the genome (there are hotspots),
and sex can influence genetic maps.
 Penetrance: The percentage of individuals with a particular genotype (e.g., carrying a
disease allele) who express its associated phenotype.
 Codominance: Both alleles are fully expressed in a heterozygote.
 Incomplete Dominance: Heterozygotes show an intermediate phenotype.
 Variable Expressivity: The severity or intensity of a phenotype can vary among
individuals with the same genotype.
 Epistasis: One gene masks the effect of another gene.
 Pleiotropy: One gene affects many different traits.
 Recombination: The exchange of genetic material between homologous chromosomes
during meiosis. This is the basis for estimating distances between markers on
chromosomes via genetic maps.
 Linkage Phase: The arrangement of alleles on parental chromosomes (coupling: both
dominant alleles on the same chromosome; repulsion: each chromosome carries one
dominant and one recessive allele).
 Informative Meiosis: A meiosis in which it can be determined whether recombination
occurred (typically requires a parent heterozygous at both loci), crucial for linkage analysis.

D. Genetic Mapping

 Purpose: To locate genes responsible for traits (e.g., disease loci), determine the relative
positions of genes/markers on chromosomes, understand recombination patterns, and aid
breeding programs.
 Genetic Markers: Polymorphic (variable) DNA sequences used to trace inheritance.
Types include SNPs, microsatellites/STRs, and RFLPs. They should be polymorphic,
stable, and easy to genotype.
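 Genetic Maps: Show the relative positions of markers based on recombination rates. One
centimorgan (cM) corresponds to 1% recombination frequency over short distances; for
more distant markers, undetected double crossovers make recombination frequency
underestimate map distance.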
 Fine Mapping: High-resolution mapping to narrow down the location of a disease-
causing mutation using dense marker panels and recombination events. Examples include
GWAS (Genome-Wide Association Studies) and linkage analysis.
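Double crossovers between distant markers go undetected, so recombination frequency only approximates map distance when loci are close. A map function corrects for this. The sketch below uses Haldane's map function, d = -½ ln(1 - 2r) in Morgans, which assumes that crossovers occur independently (no interference). Other map functions, such as Kosambi's, allow for interference and give somewhat different distances. The function names are hypothetical.

```python
import math


def haldane_cM(r: float) -> float:
    """Map distance in centimorgans for a recombination fraction 0 <= r < 0.5."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50 * math.log(1 - 2 * r)  # -(1/2) ln(1 - 2r) Morgans, x 100


def haldane_r(d_cM: float) -> float:
    """Inverse: expected recombination fraction for a map distance in cM."""
    return 0.5 * (1 - math.exp(-2 * d_cM / 100))


print(round(haldane_cM(0.01), 2))  # 1.01: about 1 cM, as the 1% rule says
print(round(haldane_cM(0.30), 1))  # 45.8: well above 30 cM
```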

E. Linkage Mapping

 Main Concepts:
o Identifies disease loci by examining the inheritance of traits with nearby genetic
markers.
o Based on the principle that loci located close together on the same chromosome
tend to be inherited together because recombination between them is less likely.
o Unlinked genes: On different chromosomes or far apart on the same
chromosome; they assort independently.
o Linked genes: Close together on the same chromosome; usually inherited
together, with a lower recombination frequency.
o Recombination frequency: The percentage of offspring with recombinant
genotypes. A low percentage indicates loci are close together. The maximum
recombination frequency for unlinked loci is 50%.
 LOD Score (Logarithm of Odds): A statistical measure used to estimate the likelihood
of linkage versus no linkage.
o Formula: log₁₀ (likelihood of the data assuming linkage at recombination fraction θ /
likelihood assuming no linkage, i.e., θ = 0.5).
o Interpretation:
 LOD > 3: Significant evidence for linkage (odds of 1000:1 in favor of
linkage).
 LOD < -2: Evidence against linkage.
 Values in between are inconclusive.
o The peak of a LOD score curve indicates the most probable recombination
frequency.
 Limitations: Requires large, informative families. Less useful for complex traits. Needs
clear phenotype classification and has limited resolution.
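Take the simplest case: phase-known, fully informative meioses. The likelihood of observing R recombinants and NR non-recombinants at recombination fraction θ is θ^R (1 - θ)^NR, and no linkage corresponds to θ = 0.5. The sketch below evaluates the LOD score on a grid of θ values and reports the peak. Real pedigree analyses, with unknown phase, missing genotypes or reduced penetrance, need dedicated linkage software. The function names are hypothetical.

```python
import math


def lod_score(recombinants: int, nonrecombinants: int, theta: float) -> float:
    """log10[L(theta) / L(0.5)] for phase-known, fully informative meioses."""
    if not 0 <= theta < 0.5:
        raise ValueError("theta must be in [0, 0.5)")
    if recombinants > 0 and theta == 0:
        return float("-inf")  # any recombinant rules out complete linkage
    n = recombinants + nonrecombinants
    log_l_theta = nonrecombinants * math.log10(1 - theta)
    if recombinants:
        log_l_theta += recombinants * math.log10(theta)
    log_l_null = n * math.log10(0.5)
    return log_l_theta - log_l_null


def max_lod(recombinants: int, nonrecombinants: int):
    """Grid search over theta = 0.00, 0.01, ..., 0.49; returns (theta, LOD)."""
    grid = [i / 100 for i in range(50)]
    best = max(grid, key=lambda t: lod_score(recombinants, nonrecombinants, t))
    return best, lod_score(recombinants, nonrecombinants, best)


print(round(lod_score(0, 10, 0.0), 2))  # 3.01: ten non-recombinant meioses
theta, lod = max_lod(2, 18)
print(theta, round(lod, 2))             # 0.1 3.2
```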

F. Identification of Mutations for Mendelian Traits by Whole Genome Sequencing (WGS)

WGS is considered the "gold standard" for identifying causative mutations in monogenic
(single-gene) traits.

 Analysis of NGS Data: NGS technologies (like Illumina HiSeq) produce billions of short
sequence reads, resulting in massive datasets. This requires automated pipelines, high-
performance computing, and systematic data processing.
 Typical Workflow (Pipeline):
1. Quality Control: Low-quality bases are trimmed and adapter sequences removed from
raw reads to reduce sequencing errors and artifacts.
2. Alignment: Reads are aligned (mapped) to a reference genome to identify their
original location. This creates SAM/BAM files.
3. Variant Detection: Genetic variants (like SNPs and indels) are identified by
comparing the aligned reads to the reference genome.
4. Variant Annotation and Filtering: Variants are annotated with their potential
functional consequences, and irrelevant variants are filtered out to prioritize those
likely to affect gene function or cause disease.
 Data Handling: Large datasets are processed using multi-node computer clusters or
cloud platforms for parallel processing and distributed storage.
 Translating Data to Biological Insight: The ultimate goal is to identify a single
causative mutation or meaningful pattern. For example, in the case of PRA (Progressive
Retinal Atrophy) in dogs, filtering millions of variants led to identifying one causative
mutation, allowing for a genetic test to be developed.
 WGS Workflow (Detailed):
1. Start with genomic DNA.
2. Sonication: DNA is fragmented into smaller pieces using sound waves.
3. End Repair: Fragment ends are prepared (made blunt or compatible).
4. Adapter Ligation: Sequencing adapters are attached to the DNA fragments.
5. PCR Amplification (optional, depending on sequencing platform).
6. Size Selection: DNA fragments of appropriate size are selected.
7. Sequencing.
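The filtering step that narrows millions of variants down to a few candidates can be sketched for a recessive trait such as the PRA example. The idea is to keep well-supported, potentially damaging variants for which every affected animal is homozygous for the alternative allele and no unaffected animal is. Everything here is an assumption for illustration:

* the record layout;
* the impact labels, loosely modelled on the categories that variant annotators report;
* the quality cut-off;
* the sample names.

```python
from dataclasses import dataclass, field


@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    qual: float                                    # variant-calling quality score
    impact: str                                    # hypothetical annotation label, e.g. "HIGH"
    genotypes: dict = field(default_factory=dict)  # sample -> copies of ALT (0, 1, 2)


def recessive_candidates(variants, cases, controls, min_qual=30.0,
                         impacts=("HIGH", "MODERATE")):
    """Keep variants consistent with a fully penetrant recessive model."""
    kept = []
    for v in variants:
        if v.qual < min_qual or v.impact not in impacts:
            continue  # low-confidence call or unlikely to affect the protein
        cases_hom = all(v.genotypes.get(s) == 2 for s in cases)
        # Missing control genotypes are not treated as exclusions in this sketch.
        controls_not_hom = all(v.genotypes.get(s) != 2 for s in controls)
        if cases_hom and controls_not_hom:
            kept.append(v)
    return kept


variants = [
    Variant("chr1", 1000, "C", "T", 80.0, "HIGH", {"dog1": 2, "dog2": 2, "dog3": 1}),
    Variant("chr1", 2000, "G", "A", 90.0, "LOW", {"dog1": 2, "dog2": 2, "dog3": 0}),
    Variant("chr2", 3000, "A", "G", 75.0, "MODERATE", {"dog1": 2, "dog2": 1, "dog3": 0}),
    Variant("chr3", 4000, "T", "C", 12.0, "HIGH", {"dog1": 2, "dog2": 2, "dog3": 0}),
]
hits = recessive_candidates(variants, cases=["dog1", "dog2"], controls=["dog3"])
print([(v.chrom, v.pos) for v in hits])  # [('chr1', 1000)]
```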

6. Mapping Genes for Complex Phenotypes

Complex phenotypes/diseases are those influenced by multiple genetic and environmental
factors. Examples include obesity, type 2 diabetes, and heart disease.

A. Purpose and Importance

 Purpose: To identify genetic loci (regions) that contribute to phenotypic variation in
these multifactorial traits.
 Goals: Detect QTLs (quantitative trait loci) and common SNPs associated with
diseases. It complements Mendelian approaches by targeting non-Mendelian traits.
 Importance: Helps understand disease mechanisms, predict risks (for stratified
medicine), guide animal breeding strategies, and identify drug targets.

B. Key Concepts

 Multifactorial Inheritance: Traits influenced by many genes plus environmental factors,
leading to continuous variation (e.g., blood pressure, height).
 QTL (Quantitative Trait Loci): Regions of the genome associated with variation in a
quantitative trait (traits measured on a continuous scale). Such traits are often controlled
by many loci, each with small additive effects. Studied by linkage and association
mapping.
 Family Clustering: Complex traits often show family clustering, suggesting a genetic
component, but shared environment must also be considered (e.g., through twin or
adoption studies).
 Continuous Phenotypes: Unlike Mendelian traits that often fall into clear diagnostic
categories, complex phenotypes can be continuous, making classification harder.

C. Example: Obesity

 Definition: Abnormal or excessive fat accumulation that impairs health, often classified
by BMI (Body Mass Index).
 Causes (Multifactorial): Genetic predisposition interacts with environmental/lifestyle
factors (diet, activity), hormonal/metabolic dysregulation, altered gut-brain
axis/microbiome, and neuroendocrine regulation (e.g., leptin, ghrelin).
 Consequences (Comorbidities): Type 2 diabetes, non-alcoholic fatty liver disease
(NAFLD), cardiovascular disease, cancer, etc.
 Treatments:
1. Lifestyle Changes: Diet, physical activity, behavioral therapy (first line, but high
relapse).
2. Pharmacotherapy: Drugs like GLP-1 receptor agonists (e.g., semaglutide) to
increase satiety and delay gastric emptying.
3. Surgery: For severe cases (e.g., bariatric procedures like gastric bypass).
 Preclinical Drug Discovery:
o Involves identifying target genes, confirming that modifying them affects disease
outcomes, finding molecules that interact with the gene, optimizing potency, and
then assessing safety and efficacy in animal models before human clinical trials.
o Animal Models: Pigs (like the Göttingen Minipig) are good models for obesity
due to their similar metabolic profile to humans.
o Case Study (Göttingen Minipig Model): Demonstrated that semaglutide reduced
food intake and weight gain while preserving lean mass and maintaining energy
expenditure.

D. Linkage Disequilibrium (LD)

 Definition: The non-random association of alleles at different loci (gene locations) in a
population. If alleles occur together more or less often than expected by chance, they are
in LD.
 Importance in GWAS: GWAS (Genome-Wide Association Studies) detect SNPs that
are in LD with a causal variant, not necessarily the causal variant itself. LD patterns
vary by species, breed, and population history.
 Measures: D' and r² are used to quantify LD.
 Decay: LD decays over time due to recombination. The extent of decay affects the
resolution of association studies.
 Population Effects:
o Bottleneck populations (those that went through a severe reduction in size):
Have had few generations of recombination since the bottleneck, leading to long
haploblocks (large stretches of DNA inherited together) and requiring fewer markers
and smaller sample sizes for GWAS.
o Humans: Have shorter LD blocks, requiring many SNP markers for GWAS.
o Cross-breeding: Can be used to increase recombination and narrow down
candidate regions.
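D, D' and r² follow directly from haplotype and allele frequencies at two biallelic loci. The sketch below assumes those frequencies are already known; in practice they are estimated from phased genotypes. It also shows the textbook decay of D by a factor of (1 - r) per generation, where r is the recombination fraction between the loci. The function names are hypothetical.

```python
def ld_stats(p_AB: float, p_A: float, p_B: float):
    """Return (D, D', r^2) for alleles A and B at two biallelic loci."""
    if not (0 < p_A < 1 and 0 < p_B < 1):
        raise ValueError("both loci must be polymorphic")
    p_a, p_b = 1 - p_A, 1 - p_B
    D = p_AB - p_A * p_B  # excess of the AB haplotype over random association
    # D' scales |D| by its largest possible value given the allele frequencies.
    d_max = min(p_A * p_b, p_a * p_B) if D >= 0 else min(p_A * p_B, p_a * p_b)
    d_prime = abs(D) / d_max
    r2 = D ** 2 / (p_A * p_a * p_B * p_b)
    return D, d_prime, r2


def decay_D(D0: float, r: float, generations: int) -> float:
    """Expected D after t generations of random mating: D0 * (1 - r) ** t."""
    return D0 * (1 - r) ** generations


# Allele B only ever occurs on haplotypes carrying A: D' is 1, but r^2 is only 0.25.
D, d_prime, r2 = ld_stats(p_AB=0.2, p_A=0.5, p_B=0.2)
print(round(D, 3), round(d_prime, 3), round(r2, 3))  # 0.1 1.0 0.25
print(decay_D(0.25, 0.5, 1))                         # 0.125
```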

E. GWAS (Genome-Wide Association Studies)

 Principle: Scans the entire genome to identify genetic variants (typically SNPs) that
are associated (co-occur) with a particular phenotype or disease. It relies on LD to find
SNPs linked to disease-causing variants.
 Causes of Association: Direct causation, natural selection, epistatic effects (gene-gene
interaction), population stratification (differences in allele frequency due to ancestry), or
Type 1 error (false positives).
 Process: Uses SNP markers, assuming that a disease-causing allele is inherited along
with neighboring alleles in haploblocks.
 The "Hidden Heritability" Problem (also called "missing heritability"): GWAS often
explains only a small proportion of the heritability estimated from family studies.
o Possible causes: Small effect sizes missed due to lack of statistical power, gene-
gene or gene-environment interactions, structural or rare variants not captured, or
epigenetic/non-additive genetic architecture.
 Multiple Testing Problem: Testing millions of SNPs increases the chance of false
positives (Type 1 error).
o Correction: Statistical corrections like Bonferroni correction (very
conservative), permutation tests, or False Discovery Rate (FDR) control are
used. Manhattan plots show each SNP's -log₁₀(P-value) against its genomic position,
so significant associations stand out as peaks.
 Limitations:
o Doesn't directly identify the causal variant.
o Less power to detect rare variants, epistasis, or environmental interactions.
o Often identifies variants in non-coding regions, making biological interpretation
difficult.
o Can be confounded by population stratification and challenges in defining
phenotypes.
 Outcome: Identification of significant SNPs associated with a phenotype, visualized in
Manhattan plots.
 Follow-up Steps: Identify haplotype blocks, sequence candidate regions to find causal
variants, validate candidate variants by genotyping larger cohorts (e.g., with TaqMan
assays), and conduct functional studies to determine biological effects.
 Example (MMVD in Cavalier King Charles Spaniels): GWAS identified SNPs in LD
with a causal variant, and subsequent genotyping pinpointed a HYAL4 variant associated
with the disease.
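The Bonferroni and FDR corrections can be compared on a handful of P-values. This minimal sketch implements the Bonferroni threshold and the Benjamini-Hochberg step-up procedure for FDR control. The P-values are made up for illustration. With about one million independent tests, Bonferroni gives the 5 × 10⁻⁸ threshold conventionally used for genome-wide significance in human GWAS.

```python
import math


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that controls the family-wise error rate."""
    return alpha / n_tests


def benjamini_hochberg(p_values, q=0.05):
    """Indices of tests declared significant with false discovery rate q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    n_significant = 0
    # Step-up: find the largest rank k whose p-value is <= k * q / m.
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            n_significant = rank
    return sorted(order[:n_significant])


threshold = bonferroni_threshold(0.05, 1_000_000)
print(f"{threshold:.1e}", round(-math.log10(threshold), 2))  # 5.0e-08 7.3

p_values = [0.008, 0.045, 0.025, 0.001, 0.2]
print(benjamini_hochberg(p_values))  # [0, 2, 3]
print([i for i, p in enumerate(p_values) if p <= bonferroni_threshold(0.05, 5)])  # [0, 3]
```

On the same five tests, the FDR procedure keeps one more association than Bonferroni. This illustrates why FDR control is considered less conservative.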

7. Biomarkers
A biomarker is a measurable characteristic that indicates normal biological processes,
pathogenic (disease) processes, or a pharmacological response to a treatment.

A. Types of Biomarkers

1. Diagnostic: Defines the presence or type of a disease (e.g., PSA for prostate cancer).
2. Prognostic: Predicts disease outcome (e.g., HER2 in breast cancer).
3. Predictive: Indicates the likelihood of response to a specific treatment.
4. Mechanistic: Provides insight into the molecular mechanisms of a disease or drug.
5. Safety: Indicates adverse or toxic responses to a drug.

B. Examples of Biomarkers

 Physiological: Body temperature, blood pressure, heart rate.
 Biochemical: Proteins (e.g., PSA), hormones (e.g., insulin), carbohydrates (e.g.,
glucose), lipids (e.g., cholesterol), nucleic acids (e.g., DNA, mRNA, miRNA),
epigenetic markers (e.g., DNA methylation), metabolites (e.g., creatinine).

C. Personalized Medicine and Companion Diagnostics

 Personalized Medicine: An approach to tailor medical treatment to the individual
characteristics of patients.
 Challenges in Pharmacotherapy:
o Disease heterogeneity: Different molecular mechanisms in patients with the
same diagnosis.
o "Blockbuster drug" model: "One size fits all" drugs often fail in large
subgroups.
o High failure rates in drug development (only ~5% success from Phase 1 to
approval).
o Over- and mistreatment due to a lack of predictive biomarkers.
o Unknown disease mechanisms make targeted drug design difficult.
 Companion Diagnostic (CDx): A test developed alongside a therapeutic drug to identify
patients likely to benefit from the drug, minimize adverse effects, and guide dose
optimization.
o Characteristics: Often expensive, minimal efficacy in biomarker-negative
patients, high benefit in stratified subgroups, and crucial for avoiding treatment
failure.

D. MicroRNAs as Biomarkers

 Main Features: Small, non-coding RNAs (18-25 nt) that regulate gene expression by
binding to target mRNA, usually in the 3' UTR, causing degradation or translational
repression. Many are highly conserved across species.
 Biogenesis (as explained in Section 2.D): Transcription to pri-miRNA, Drosha
processing to pre-miRNA, export to cytoplasm, Dicer processing to mature miRNA
duplex, RISC loading, and then function by targeting mRNA.
 Functions/Circulating miRNAs:
o Regulate key processes in development, immune response, inflammation, and
cancer.
o Circulating miRNAs are highly stable extracellularly (e.g., in blood, urine,
saliva, CSF) because they are protected in vesicles (exosomes) or bound to
proteins. This stability makes them useful as non-invasive biomarkers (e.g.,
miRNA-21 in breast cancer).
 Therapeutic Use:
o miRNA mimics: Enhance the expression of downregulated miRNAs.
o AntagomiRs/anti-miRs: Inhibit overexpressed miRNAs.
o Clinical trials are ongoing for miRNA-targeted therapies in various diseases.
o Examples: miRNA-21 upregulation linked to trastuzumab resistance in breast
cancer; miRNA-27a associated with GI cancer progression in dogs.
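Target recognition is driven mainly by the miRNA "seed" (nucleotides 2-8 counted from the 5' end), which pairs with a complementary site in the target's 3' UTR. The sketch below finds perfect 7-nt seed matches. The miRNA and UTR sequences are made up for illustration. Prediction tools such as TargetScan also weigh site type, sequence context and conservation, none of which this sketch models.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def seed_match_sites(mirna: str, utr: str):
    """0-based start positions of 7-nt UTR sites pairing with miRNA nucleotides 2-8."""
    seed = mirna.upper()[1:8]            # nucleotides 2-8 (1-based) of the mature miRNA
    site = reverse_complement_rna(seed)  # the target site is antiparallel to the seed
    utr = utr.upper()
    return [i for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site]


mirna = "UCCAGUACGAUUAGCGCAUGGA"  # made-up mature miRNA sequence
utr = "AAAGUACUGGAUUUUGUACUGGCC"  # made-up 3' UTR fragment
print(reverse_complement_rna(mirna[1:8]))  # GUACUGG
print(seed_match_sites(mirna, utr))        # [3, 15]
```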

E. Techniques to Detect miRNAs

1. RT-qPCR: The "gold standard" for miRNA detection; sensitive, quantitative, but
challenging due to miRNA's short length and lack of a polyA tail.
2. Small RNA-seq: Provides global, unbiased profiling but is less sensitive than qPCR for
low-abundance miRNAs.
3. ISH (In situ hybridization): Used for spatial localization of miRNAs in tissues, often
with LNA (locked nucleic acid) probes.
4. Microarrays: High-throughput, but less common now due to RNA-seq.
5. Digital PCR: Very quantitative and useful for low RNA input scenarios.
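RT-qPCR results for miRNAs are commonly reported as relative expression using the comparative Ct method (2^-ΔΔCt, or the Livak method). The target's Ct is normalised to a stably expressed reference small RNA and then compared with a control sample. This sketch assumes roughly 100% amplification efficiency (a doubling per cycle) for both assays. The Ct values and function name are made up for illustration.

```python
def fold_change_ddct(ct_target_sample: float, ct_ref_sample: float,
                     ct_target_control: float, ct_ref_control: float,
                     efficiency: float = 2.0) -> float:
    """Relative expression of a target (sample vs control) by the 2^-ddCt method."""
    d_ct_sample = ct_target_sample - ct_ref_sample  # normalise to the reference RNA
    d_ct_control = ct_target_control - ct_ref_control
    dd_ct = d_ct_sample - d_ct_control
    # Each cycle earlier means `efficiency`-fold more starting template.
    return efficiency ** (-dd_ct)


# After normalisation, the target crosses threshold 2 cycles earlier in the sample.
print(fold_change_ddct(25.0, 20.0, 27.0, 20.0))  # 4.0
```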

8. Cancer Genetics and Genomics

Cancer is a collection of related diseases characterized by the accumulation of somatic
mutations (mutations acquired in body cells, not inherited). It represents a failure of normal
growth control, where cells gain the ability to bypass proliferation signals, evade cell death
(apoptosis), invade tissues, and metastasize (spread).

A. Evolution of Cancer

Cancer follows a Darwinian selection model at the cellular level: Random mutations arise, and
those that give cells a growth or survival advantage become dominant. It's a multistage,
multistep process, moving from normal cells to abnormal proliferation (dysplasia) and then to
invasive cancer.

B. Hallmarks of Cancer

These are capabilities that most, and perhaps all, cancers acquire in order to grow and spread:

1. Self-sufficiency in growth signals: Cancer cells don't need external signals to grow.
2. Insensitivity to anti-growth signals: They ignore signals that tell normal cells to stop
growing.
3. Evasion of apoptosis: They avoid programmed cell death, even when damaged.
4. Limitless replicative potential: They can divide indefinitely.
5. Sustained angiogenesis: They stimulate the formation of new blood vessels to supply
nutrients and oxygen.
6. Tissue invasion and metastasis: They can spread from their original location to other
parts of the body.

C. Causes of Mutation in Cancer

 Environmental mutagens (e.g., carcinogens).
 Errors in DNA replication.
 Familial predisposition (inherited mutations).
 Epigenetic dysregulation: Changes in gene expression without altering the DNA
sequence itself, often involving cancer cells silencing tumor suppressors or activating
oncogenes.

D. Therapeutic Targets Beyond Mutations

Cancer therapies are increasingly targeting non-mutational mechanisms:

 Epigenetic alterations: DNA methylation and histone modifications are reversible, and
drugs targeting them can reactivate silenced tumor suppressors or silence activated
oncogenes.
 Stress-induced adaptations: Cancer cells can adapt to hostile conditions (e.g., acidosis,
hypoxia) and therapy stress without new mutations, enhancing their survival.
 lncRNAs (long non-coding RNAs): Can regulate chromatin, transcription, and cell fate,
playing a role in cancer initiation (e.g., LINC00673 in pancreatic models).

E. Genes Involved in Cancer

 TP53 and the p53 protein:
o Function: p53 is a transcription factor known as the "guardian of the
genome". It preserves genome integrity by coordinating cellular responses to
DNA damage, oxidative stress, hypoxia, and oncogene activation.
o Mechanism: p53 prevents uncontrolled proliferation by:
1. Cell cycle arrest (especially at the G1/S checkpoint), giving cells time to
repair DNA.
2. Upregulating DNA repair genes (e.g., GADD45).
3. Initiating apoptosis if DNA damage cannot be repaired, preventing
malignant cells from forming.
o Loss of function: Mutations that inactivate TP53 (most often missense mutations)
allow cells to keep replicating damaged DNA. Inherited (germline) TP53 mutations cause
Li-Fraumeni syndrome, a condition causing multiple tumors early in life.
 Oncogenes:
o Origin: Derived from proto-oncogenes, which are normal genes involved in cell
growth and proliferation.
o Activation: Become oncogenes when mutated or dysregulated, resulting in a gain
of function (promoting uncontrolled cell growth).
o Mechanisms of activation:
 Point mutations: Single nucleotide changes that lead to constant "growth
signals" without external stimuli.
 Gene amplification: Multiple copies of a gene are produced, leading to
excessive protein production (e.g., MYCN in neuroblastoma).
 Chromosomal translocations: Rearrangements where parts of two
different chromosomes break and rejoin incorrectly, creating fused
genes with new functions (e.g., BCR-ABL fusion).
 Enhancer hijacking: An enhancer (a DNA sequence that boosts gene
expression) from one gene is misplaced next to a proto-oncogene, causing
its overexpression.

 Tumor Suppressor Genes:
o Normal function: Restrain cell proliferation, repair damaged DNA, and promote
apoptosis.
o Silencing mechanisms (resulting in loss of function):
 Mutations: Nonsense, missense, frameshift mutations (e.g., in TP53).
 Promoter hypermethylation: Excessive methylation of CpG islands in
the promoter regions of tumor suppressor genes, leading to their
epigenetic silencing.
 Loss of Heterozygosity (LOH): If one copy (allele) of a tumor suppressor
gene is already mutated, the second healthy copy can be lost through
deletion, recombination, or other mechanisms during mitosis.

o Knudson's "Two-Hit Hypothesis": Often, biallelic inactivation (both copies of
the gene being inactivated) is required for a tumor suppressor gene to fully lose its
function and contribute to cancer.

F. Genomic Instability

Cancer cells often exhibit genomic instability, meaning their genomes are unstable, leading to
high mutation rates, chromosomal rearrangements, and defects in DNA repair pathways.

G. Cell Plasticity in Cancer

Cell plasticity is the ability of differentiated cells to change their identity, dedifferentiate (go
back to a less specialized state), or transdifferentiate (change into another cell type) without
needing mutations, typically under stress or injury.

 Role in Cancer:
o Enables cancer cells to adapt to therapies and hostile environments.
o Promotes metastasis (spread of cancer) via EMT (Epithelial-Mesenchymal
Transition), a process where epithelial cells lose their cell-cell adhesion and gain
migratory properties.
o Facilitates drug resistance even without new mutations.
 Pancreatic Cancer Example: Acinar-to-ductal metaplasia (ADM), where enzyme-
producing acinar cells become duct-like cells, can be driven by injury and oncogenic
KRAS mutations. This ADM can be a route to Pancreatic Ductal Adenocarcinoma
(PDAC). Transcription factors like Sox9 and Sox4 are involved in plasticity and
neoplastic transformation.

H. Tumor Microenvironment (TME)

The TME is the complex environment surrounding a tumor, consisting of various cell types,
extracellular matrix, and signaling molecules. It profoundly shapes tumor behavior, progression
(angiogenesis, metastasis), and resistance to treatment.

 Components:
o CAFs (Cancer-Associated Fibroblasts): Secrete growth factors, cytokines, and
extracellular matrix components that support tumor growth and immune evasion.
o Immune cells: Certain immune cells (e.g., tumor-associated macrophages,
myeloid-derived suppressor cells) can suppress anti-tumor immune responses.
o Endothelial cells: Support angiogenesis (formation of new blood vessels) to feed
the tumor.
o Extracellular Matrix (ECM): Provides structural support and biochemical
signals.
 Pancreatic Cancer TME: Characterized by a dense stroma and an immunosuppressive
microenvironment, posing significant obstacles to treatment.

I. Aging and Environment

 Pancreas regeneration declines with age.
 Acidosis (an acidic tumor environment) can enhance metastatic potential by promoting
stem-like traits in cancer cells.

J. Cancer Therapies: Personalized Medicine

 Concept: Uses an individual patient's genetic and molecular tumor profiles to tailor
treatment.
 Inherited Mutations: Mutations in key tumor suppressor genes can lead to hereditary
cancer syndromes (e.g., BRCA1/2 for breast/ovarian cancer, TP53 for Li-Fraumeni syndrome,
APC for familial adenomatous polyposis).
 Examples of Targeted Therapies:
o BCR-ABL fusion (in chronic myeloid leukemia) treated with Imatinib (a
tyrosine kinase inhibitor).
o HER2 amplification (in breast cancer) treated with Trastuzumab.
 Requirements: Integrating data from whole-genome/exome sequencing,
transcriptomics (RNA-seq), and epigenetic profiling.
 Advantages: Higher efficacy and fewer side effects, because treatment targets the
"driver mutations" (those that cause cancer progression) rather than just symptoms.
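
To make the matching idea concrete, here is a toy Python sketch that uses only the two targeted-therapy examples listed above. The dictionary and the function name match_therapies are hypothetical, and real treatment decisions weigh far more evidence than a lookup table.

```python
# Toy illustration only: maps tumor alterations to the two examples from the text.
# Not clinical guidance; the table and function are hypothetical.
TARGETED_THERAPIES = {
    "BCR-ABL fusion": "Imatinib (tyrosine kinase inhibitor)",
    "HER2 amplification": "Trastuzumab",
}


def match_therapies(tumor_alterations):
    """Return {alteration: therapy} for each alteration that has a listed targeted therapy."""
    return {
        alt: TARGETED_THERAPIES[alt]
        for alt in tumor_alterations
        if alt in TARGETED_THERAPIES
    }


profile = ["TP53 mutation", "HER2 amplification"]
print(match_therapies(profile))  # {'HER2 amplification': 'Trastuzumab'}
```

Alterations with no matching entry (here the TP53 mutation) are simply left out, which mirrors the fact that not every driver mutation has a targeted drug.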

K. Techniques to Study Cancer at the Molecular Level

 PCR-based Methods: For mutation detection (e.g., TaqMan for SNPs) and in veterinary
genetic testing (e.g., NPHP4 mutation in dogs).
 Whole Genome Sequencing (WGS) / Whole Exome Sequencing (WES): Detects point
mutations, indels, and structural variants, helping identify driver vs. passenger mutations.
 RNA-seq: Measures gene expression and identifies gene fusions and splice variants.
 Epigenetic Assays:
o Methylation analysis: Bisulfite sequencing, MeDIP-seq.
o ChIP-seq: Maps histone modifications.
 Reverse Phase Protein Arrays (RPPA): Profiles protein and phosphorylation levels.
 Fluorescence-based genotyping (Microsatellites/STRs): Used in paternity testing and
population genetics.
 Multiplex PCR Panels: Co-amplify multiple microsatellites in one reaction, used in
paternity and forensics.
 Next-Generation Sequencing in Forensics: Uses STRs and SNPs for identity, ancestry,
and age estimation (e.g., with Ion Torrent and Illumina platforms).
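
Two of the assays above rest on simple comparison rules that can be sketched in Python. In bisulfite sequencing, an unmethylated C is read as T while a methylated C stays C. In STR paternity testing, a child must carry one allele from each parent at every locus. All sequences, locus names (STR_A, STR_B) and genotypes below are made up. The functions are hypothetical helpers rather than part of any real pipeline, and the bisulfite read is assumed to be already aligned base-for-base to the reference.

```python
# Toy models of two assays; all data and names are hypothetical.

def bisulfite_convert(seq, methylated_positions):
    """Simulate bisulfite treatment + PCR on one strand:
    unmethylated C reads as T, methylated C stays C."""
    return "".join(
        "T" if base == "C" and i not in methylated_positions else base
        for i, base in enumerate(seq)
    )


def call_methylation(reference, read):
    """At each reference C: read C = methylated, read T = unmethylated."""
    calls = {}
    for i, (ref_base, read_base) in enumerate(zip(reference, read)):
        if ref_base == "C" and read_base in ("C", "T"):
            calls[i] = "methylated" if read_base == "C" else "unmethylated"
    return calls


def paternity_exclusions(child, mother, alleged_father):
    """Return loci where the child's two STR alleles (repeat counts) cannot be
    split into one maternal and one paternal allele."""
    excluded = []
    for locus, (a, b) in child.items():
        m, f = mother[locus], alleged_father[locus]
        if not ((a in m and b in f) or (b in m and a in f)):
            excluded.append(locus)
    return excluded


ref = "ACGTCGA"
read = bisulfite_convert(ref, methylated_positions={1})
print(read, call_methylation(ref, read))
# ACGTTGA {1: 'methylated', 4: 'unmethylated'}

child = {"STR_A": (12, 15), "STR_B": (7, 9)}
mother = {"STR_A": (12, 14), "STR_B": (7, 8)}
father = {"STR_A": (15, 16), "STR_B": (10, 11)}
print(paternity_exclusions(child, mother, father))  # ['STR_B']
```

Real casework tests many loci, allows for occasional STR mutations, and reports a statistical likelihood rather than a single-locus yes/no. This sketch ignores all of that.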

9. Genetic Manipulation of Mammalian Cells

Genetic manipulation involves inserting, deleting, or altering genes in organisms.

A. Concepts and Principles

 Purpose: To understand gene function, model human diseases (e.g., Alzheimer's disease in
mice), study the roles of genes in development, and produce therapeutic proteins (e.g.,
insulin).
 Gene Therapy: Aims to correct genetic disorders by modifying gene expression in
affected cells.

B. Different Animal Models

Model Organism | Pros | Cons
Flies (Drosophila), C. elegans | Cheap, short life cycles, many offspring, good for genetics | Evolutionarily distant from humans, poor physiological relevance
Frogs, Zebrafish | Transparent embryos, useful for early developmental studies | Evolutionarily distant from humans
Mice | Mammalian, many genetic tools, short life cycles, established disease models | Differences in complex organs (e.g., brain), relatively short lifespan
Dogs, Pigs, Non-human Primates | Close physiological similarity to humans; similar behavior and immunity; useful for neurological studies | Expensive, ethical concerns, long generation times, few offspring

C. Transgenic Animals and Chimeras

 Transgenic Animals: Animals that have been genetically modified to carry foreign
DNA (a "transgene") integrated into their own genome, and this transgene is passed on to
their offspring.
 Chimeric Organism: An organism made from cells derived from two or more different
zygotes (fertilized eggs).

D. How to Generate Transgenic Animals

There are several methods to introduce foreign DNA into an animal:

1. Insertion into Germ Cells: This ensures the genetic modification is present in all cells of
the animal, including its reproductive cells, so it can be passed to the next generation.
o Pronuclear Injection: DNA is directly injected into the pronucleus (the haploid
nucleus of the sperm or egg) of a fertilized oocyte (egg cell) before the sperm and
egg nuclei fuse.
o Gene Transfer into Early Embryos or Gametes: DNA is introduced at very
early embryonic stages or into the gametes themselves.
o Somatic Cell Nuclear Transfer (SCNT): The nucleus from a modified adult cell
is transferred into an enucleated (nucleus-removed) egg cell. This was the
technique used to clone Dolly the sheep.
2. Gene Targeting (Precise Modification): Uses homologous recombination to introduce
a specific mutation or gene insertion at a desired location in the genome. This is done
using a plasmid (or other DNA template) with "homology arms" that match the target
gene sequence. This allows for knockout (inactivating a gene) or knockin (inserting a
gene or specific mutation).
o Backcrossing: After initial modification, animals are often backcrossed with
wild-type animals many times to ensure that the only genetic difference is the
targeted gene modification.
o Key Nuclease-based Techniques for Gene Editing: These all create a double-strand
break (DSB) at a specific DNA target site, which the cell then repairs.
 Zinc Finger Nucleases (ZFNs): Engineered proteins that combine DNA-binding
domains (zinc fingers, each recognizing about 3 bp of DNA) with a nuclease
domain (FokI) that cuts DNA. Two ZFNs bind on opposite sides of the target
site, and their FokI domains dimerize to make a DSB.
 TALENs (Transcription Activator-Like Effector Nucleases): Similar to
ZFNs, but the DNA-binding domains are derived from bacterial proteins
(TAL effectors). Two TALENs bind to sequences on either side of the target
site, and their FokI domains dimerize to create a DSB in the spacer region
between their binding sites. TALENs have been used, for example, to engineer
donor T cells for CAR T-cell therapy.
 CRISPR-Cas9 (Clustered Regularly Interspaced Short Palindromic
Repeats-CRISPR-associated protein 9): This system uses a guide RNA
(gRNA) that directs the Cas9 nuclease to a specific DNA sequence, which
must be immediately followed by a PAM (Protospacer Adjacent Motif)
sequence (NGG for the commonly used Streptococcus pyogenes Cas9). Cas9
then cuts the DNA, creating a DSB. CRISPR can also be used in breeding to
introduce or remove specific traits.
o DNA Repair Pathways after DSB:
 NHEJ (Non-Homologous End Joining): An error-prone repair pathway
that often results in small insertions or deletions (indels) at the repair site.
This is commonly used for gene knockout (inactivating a gene).
 HDR (Homology-Directed Repair): A precise repair pathway that
requires a donor DNA template with homology arms (sequences matching
the target region). This pathway allows for precise gene insertion
(knockin) or correction of specific mutations.
o Conditional Gene Editing Systems (Cre-lox Recombination): Allows for
controlling where (tissue-specific) and when (time-specific) a gene is modified.
This is useful for studying gene function without unwanted effects during
development.
 Cre recombinase: An enzyme that recognizes loxP sites (short 34-bp
DNA sequences placed on either side of the target gene) and recombines
between them. It excises the DNA between two loxP sites that point in the
same direction, and inverts it if the sites face opposite directions. If Cre is
expressed from a tissue-specific or inducible promoter, this happens only in
specific cells or at specific times, producing a conditional gene knockout.
3. Random Mutagenesis: DNA is inserted randomly into the genome. This method offers
less control and can unintentionally disrupt other genes.
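
Two of the tools above can be illustrated as string operations on a DNA sequence. The sketch below assumes SpCas9 (20-nt guide, NGG PAM, blunt cut 3 bp upstream of the PAM) and scans only the forward strand. find_cas9_targets and cre_excise are hypothetical helper names, and the Cre model covers only excision between two loxP sites in the same orientation.

```python
# Toy sequence models; assumes SpCas9 and forward-strand scanning only.
LOXP = "ATAACTTCGTATAGCATACATTATACGAAGTTAT"  # canonical 34-bp loxP site


def find_cas9_targets(dna, guide_len=20):
    """Return (protospacer, PAM, cut_site) for every guide_len-nt protospacer
    immediately followed by an NGG PAM. The blunt DSB falls 3 bp upstream
    of the PAM, i.e. just before index cut_site."""
    dna = dna.upper()
    targets = []
    for pam_start in range(guide_len, len(dna) - 2):
        pam = dna[pam_start:pam_start + 3]
        if pam[1:] == "GG":
            protospacer = dna[pam_start - guide_len:pam_start]
            targets.append((protospacer, pam, pam_start - 3))
    return targets


def cre_excise(dna):
    """Cre acting on two loxP sites in the same orientation: the DNA between
    them is removed and a single loxP site remains. Inversion is not modeled."""
    first = dna.find(LOXP)
    if first == -1:
        return dna
    second = dna.find(LOXP, first + len(LOXP))
    if second == -1:
        return dna  # fewer than two direct loxP sites: nothing to excise
    return dna[:first] + LOXP + dna[second + len(LOXP):]


floxed = "AAAA" + LOXP + "GATTACA" + LOXP + "TTTT"  # made-up "floxed" gene
print(cre_excise(floxed) == "AAAA" + LOXP + "TTTT")  # True
print(find_cas9_targets("A" * 20 + "TGG"))
```

A real guide-design tool would also scan the reverse strand and score possible off-target matches elsewhere in the genome. This sketch does neither.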

E. Applications of Transgenic Animals

 Disease modeling (e.g., Alzheimer's, Huntington's).
 Functional genomics (studying gene function and regulatory regions).
 Pharmaceutical production (e.g., producing monoclonal antibodies in genetically
engineered CHO cells).
 Xenotransplantation (using animal organs for human transplantation).
 Vaccine development.

F. Gene Knockout, Knockin, and Knockdown

 Knockout: A gene is completely inactivated, resulting in a loss of function. This is often
achieved via NHEJ-induced indels, or by inserting a reporter gene (e.g., LacZ) or a selection
marker to disrupt critical exons. Inactivating only one allele (a heterozygous knockout) may
cause little or no phenotype if the remaining allele is sufficient. If the gene is
haploinsufficient, however, one functional copy is not enough, and even a heterozygous
knockout produces a phenotype.
 Knockin: A gene or specific mutation is inserted at a specific genomic locus, which can
confer a new or altered function. It is achieved via HDR and is often used to insert a reporter
(like LacZ) to track gene expression under the control of the target gene's promoter.
 Knockdown: A partial reduction in gene expression, usually achieved by RNA
interference (RNAi), e.g., using siRNA or shRNA (short hairpin RNA).

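
The difference between an NHEJ knockout and an HDR knockin can be sketched on a short made-up coding sequence. In this hypothetical Python model the NHEJ indel is chosen by hand (in cells it is essentially random), and hdr_donor simply copies homology arms from either side of the cut. The 50-bp default arm length is an arbitrary assumption.

```python
# Toy model on a plain coding-sequence string; all names and sizes are hypothetical.

def nhej_outcome(cds, cut_site, deleted=0, inserted=""):
    """Apply one possible NHEJ repair outcome (a small indel) at a cut site."""
    return cds[:cut_site] + inserted + cds[cut_site + deleted:]


def is_frameshift(deleted=0, inserted=""):
    """An indel shifts the reading frame unless the net length change is a multiple of 3."""
    return (len(inserted) - deleted) % 3 != 0


def hdr_donor(reference, cut_site, insert, arm_len=50):
    """Knockin donor template: homology arms copied from the reference on
    each side of the cut, flanking the sequence to be inserted."""
    left_arm = reference[max(0, cut_site - arm_len):cut_site]
    right_arm = reference[cut_site:cut_site + arm_len]
    return left_arm + insert + right_arm


cds = "ATGAAACCCGGGTTTTAA"
print(nhej_outcome(cds, 6, deleted=1), is_frameshift(deleted=1))  # ATGAAACCGGGTTTTAA True
print(nhej_outcome(cds, 6, deleted=3), is_frameshift(deleted=3))  # ATGAAAGGGTTTTAA False
print(hdr_donor(cds, 6, "GGATCC", arm_len=6))  # ATGAAAGGATCCCCCGGG
```

A frameshift usually introduces a premature stop codon downstream, which is why NHEJ indels so often inactivate a gene. An in-frame deletion (a multiple of 3 bp) may leave a partly functional protein.
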
G. Genetic Approaches to Treat Disease

1. Production of Reagents:
o Therapeutic Proteins: Genetically engineered organisms (e.g., CHO cells) are
used to produce proteins like insulin or monoclonal antibodies. This requires
precise control of expression, purification, and post-translational modifications.
o Vaccines:
 Engineered viral vectors: Viruses (e.g., VSV for Ebola vaccine) are
genetically engineered to carry antigens.
 mRNA vaccines: Use synthetic mRNA that codes for a specific antigen
(e.g., spike protein in COVID-19 vaccines).
 DNA vaccines: Deliver plasmids into host cells to express antigens.
o Gene Therapy Products: For example, Casgevy for sickle cell disease consists of the
patient's own blood stem cells edited with CRISPR-Cas9.
2. Cell Therapy: Uses intact cells to treat diseases.
o Applications: iPSC-derived neurons for neurodegenerative diseases,
hematopoietic stem cell transplants, regenerative repair (e.g., spinal injury, retinal
degeneration).
o Limitations/Risks: Tumor risk (especially with pluripotent stem cells like
iPSCs), immune rejection in allogeneic transplants, poor control over
differentiation leading to wrong tissue formation, ethical issues with embryonic
stem cells, and difficulty in effective delivery to damaged tissues.
o CAR T-cells (Chimeric Antigen Receptor T-cells): A type of immunotherapy
for cancer.
 Process: T cells are extracted from a patient, modified (e.g., with a viral
vector) to express an artificial receptor (CAR19) that targets CD19-
positive cancer cells, and then infused back into the patient.
 Donor T-cells: If donor T cells are used instead of the patient's own, they can
be genetically modified with TALENs to disrupt T-cell receptor expression
(preventing a graft-versus-host reaction). TALENs can also knock out CD52,
making the cells resistant to the anti-CD52 antibody alemtuzumab, which is
used to suppress the patient's immune system and prevent rejection.

H. Stem Cells

Stem cells are undifferentiated cells with the ability to self-renew and differentiate into various
specialized cell types.

 Uses: Regenerative medicine (e.g., spinal injury, macular degeneration), disease
modeling (e.g., iPSC models of FTD3), and drug screening/toxicology.
 Types:
1. Totipotent: Can form any cell type, including embryonic and extraembryonic
tissues (e.g., a zygote).
2. Pluripotent: Can form any body cell type but not extraembryonic tissues (e.g.,
Embryonic Stem Cells - ESCs, Induced Pluripotent Stem Cells - iPSCs).
3. Multipotent: Limited to a specific range of cell types within a particular tissue
(e.g., adult stem cells in bone marrow).
 How to Make:
1. Embryonic Stem Cells (ESCs): Derived from the inner cell mass of a blastocyst
(early embryo). They are pluripotent and generally stable.
2. Adult Stem Cells: Found in various tissues (e.g., hematopoietic stem cells in
bone marrow). They are multipotent and can be used to make iPSCs.
3. Induced Pluripotent Stem Cells (iPSCs): Created by reprogramming somatic
cells (e.g., skin fibroblasts) into a pluripotent state. This is typically done by
introducing specific transcription factors (Oct4, Sox2, Klf4, c-Myc) using
techniques like lentiviral transduction or episomal vectors.
 Pros: Patient-specific (greatly reducing the risk of immune rejection), and
they avoid the ethical issues associated with embryos.
 Cons: Tumor risk, low reprogramming efficiency, may retain "epigenetic
memory" of their original cell type, and can be prone to new mutations.

I. Risk Assessment for GMOs (Genetically Modified Organisms) and Gene Therapies

 Definition: Environmental risk assessment (ERA) and human health risk assessment are
central to regulating GMOs.
 Regulation: GMO regulations vary worldwide, with the EU having stricter rules. GMOs
are typically regulated under directives covering deliberate release into the environment
and contained use. Genome-edited organisms (e.g., made with CRISPR) are currently often
regulated as GMOs, but this is evolving.
 ERA Purpose: To identify, characterize, and manage potential risks before GMOs are
used in clinical trials, agriculture, or industry.
 Considerations in GMO Medicines (Gene Therapies, Cell Therapies, Vaccines):
o Require extensive preclinical trials in model organisms.
o Risks:
1. Shedding: The release of genetically modified material into the
environment (e.g., from dead embryos) that could infect unintended
individuals.
2. Replication competence: The ability of a viral vector (used to deliver
genes) to replicate uncontrollably within the patient or spread to the
environment.
3. Insertional mutagenesis: The inserted gene landing in the wrong place in
the genome, potentially causing cancer.
4. Genetic stability of the construct: Whether the inserted transgene
remains intact and functional over time in host cells. If unstable, it could
become ineffective or harmful.
5. Recombination of the construct: The inserted genetic material
recombining with host DNA or other viral sequences in unintended ways,
potentially leading to new, harmful viruses or hybrid genes.
6. Molecular characteristics of the construct: Poor design can lead to
overexpression, silencing of host genes, or other unintended effects.
 Case Examples:
o Transgenic CHO cells: Used for pharmaceutical production, require inactivation
and DNA degradation before disposal to prevent environmental release.
o CRISPR-edited chickens: Modified so male embryos die early, while females
survive and are non-transgenic. Regulations must assess risks like horizontal gene
transfer (gene moving to another organism) or incomplete lethality of males.
 Key Distinction:
o Risk assessment: Science-based process to evaluate hazards and their likelihood.
o Risk management: Policy-driven, uses scientific input for regulatory decisions.
o Risk communication: Interactive exchange of information about risks.

J. Organisms that Can Be Genetically Modified

 Bacteria: E. coli, Bacillus subtilis (for producing insulin, enzymes, vitamins).
 Plants: Maize, soybean, tomato, potato (for pest resistance, herbicide tolerance,
improved nutrition).
 Animals: Mice, zebrafish, dogs, pigs, chickens (for disease models, drug testing, organ
donor models).
 Fish: Salmon (e.g., AquaAdvantage salmon for faster growth).
 Human cells: CAR T cells, gene therapies (e.g., for sickle cell disease, cancer
immunotherapy).
 Purpose: Research, therapeutics, agriculture, and industrial biotechnology.

This comprehensive overview should provide you with a solid foundation as you begin to
explore the vast and exciting field of genetics and molecular biology!
