CHAPTER TWO
Repetitive DNA
BY: Mekoyet Addise (MSc.)
2. Introduction to repetitive DNA
2.1. Repetitive DNA: An important source of variation in eukaryotic genomes
Types and distribution of repetitive DNA
Functions of repetitive DNA sequences
2.2. Genetic diversity and basis of polymorphism
2.1. Repetitive DNA: An important source of variation in eukaryotic genomes
What is a genome?
Where does the variation come from?
What does repetitive sequence (repetitive DNA) mean?
Genome
A genome is an organism’s complete set of genetic instructions. Each
genome contains all of the information needed to build that organism
and allow it to grow and develop.
The instructions in our genome are made up of DNA.
All living things have a unique genome.
The human genome is made of about 3.2 billion bases of DNA; other
organisms have different genome sizes.
Where does the variation come from?
Do more complex organisms have more genes?
Is variation in genome size between organisms
determined by coding or non-coding sequences?
Variation in genome size between species is not
explained to any significant extent by the number of
genes or by their size. This is the C-value paradox!
The concept of the C-value paradox…
The E. coli genome has 4.6 million base pairs, enough to code for
about 3,000 different proteins (assuming an average protein of
~40,000 Da and ~500 bp per promoter)
Using the same assumptions, the human genome (3 billion base pairs,
3 × 10^9) should code for about 1 million proteins (assuming an
average protein of ~50,000 Da and promoters of ~1,500 bp)
Yet humans have only ~20,000 protein-coding genes (early estimates were ~30,000)
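The back-of-the-envelope estimate on this slide can be reproduced in a few lines. This is a minimal sketch that reads the protein sizes (~40,000 and ~50,000) as molecular masses in daltons and assumes an average amino-acid residue mass of about 110 Da; both readings are assumptions, and the function name is illustrative. The last line compares the estimate with roughly 20,000 protein-coding genes, the current estimate for humans.

```python
# Sketch of the C-value paradox arithmetic.
# Assumptions: protein sizes are masses in daltons (Da), one amino-acid
# residue weighs ~110 Da on average, and each residue needs 3 bp of coding DNA.
AVG_RESIDUE_MASS_DA = 110


def expected_gene_count(genome_bp: float, protein_da: float, promoter_bp: float) -> float:
    """Number of genes the genome could hold if it were all coding DNA plus promoters."""
    coding_bp = protein_da / AVG_RESIDUE_MASS_DA * 3  # residues x 3 bp per codon
    return genome_bp / (coding_bp + promoter_bp)


ecoli = expected_gene_count(4.6e6, 40_000, 500)
human = expected_gene_count(3e9, 50_000, 1_500)

print(f"E. coli: ~{ecoli:,.0f} genes")   # about 2,900
print(f"Human:   ~{human:,.0f} genes")   # about 1,050,000
print(f"Human estimate / ~20,000 real genes = {human / 20_000:.0f}x")
```

The roughly 50-fold gap between the estimate and the real human gene count is the point of the slide: most of the human genome is not coding DNA.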
Which one has more genes?
Percent of non-coding DNA
Cont’d..
The genome…
In many eukaryotes, the majority of the nuclear genome is repetitive DNA,
which includes:
Transposons,
simple sequence repeats,
segmental duplications and pseudogenes
What does the DNA in the human genome look like?
What does repetitive DNA mean?
Repeated sequences (also known as repetitive elements, repeats, or
repetitive DNA) are short or long patterns of nucleic acids (DNA or
RNA) that are present in multiple copies throughout the genome.
Repetitive DNA is a major component of eukaryotic genomes and may account
for up to 90% of the genome.
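To make "present in multiple copies" concrete, here is a toy sketch that flags every k-mer (substring of length k) occurring more than once in a sequence and reports the fraction of bases those k-mers cover. The functions and the test sequence are hypothetical; real repeat annotation relies on libraries of known repeat families and more careful statistics, and in a real genome short k-mers also recur by chance.

```python
from collections import Counter


def repeated_kmers(seq: str, k: int) -> dict[str, int]:
    """Return every k-mer that occurs more than once, with its copy number."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return {kmer: n for kmer, n in counts.items() if n > 1}


def repeat_fraction(seq: str, k: int) -> float:
    """Fraction of positions covered by at least one repeated k-mer."""
    repeats = repeated_kmers(seq, k)
    covered = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in repeats:
            for j in range(i, i + k):
                covered[j] = True
    return sum(covered) / len(seq)


# Hypothetical sequence carrying two copies of the 6-bp unit AAGCTT
seq = "CCCAAGCTTGGGAAGCTTTT"
print(repeated_kmers(seq, 6))    # {'AAGCTT': 2}
print(repeat_fraction(seq, 6))   # 0.6
```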
Repetitive DNA
In plants, repetitive DNA is present in all three genomes:
Nuclear DNA
mtDNA
cpDNA
cpDNA
Present only in plants (and algae)
Ranges from 135 to 160 kb in size
Packed with genes
Resembles the streamlined configuration of the genome of its
cyanobacterial ancestor
Consists of two copies of an inverted repeat (IR) separating one large
single-copy (LSC) region and one small single-copy (SSC) region
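"Inverted" means the two copies of the repeat (often called IRa and IRb) are identical when read on opposite strands, i.e. one is the reverse complement of the other. A minimal sketch with short, made-up sequences (real IRs are many kilobases long); the region sequences and variable names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse, giving the opposite strand read 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


# Hypothetical toy regions of a quadripartite chloroplast genome
lsc = "TTTTAAAACC"                # large single-copy region (toy length)
ir_a = "ATGGCTTACG"               # first copy of the inverted repeat
ssc = "GGCC"                      # small single-copy region (toy length)
ir_b = reverse_complement(ir_a)   # second copy, in inverted orientation

# The circular molecule reads LSC - IRa - SSC - IRb (and back to LSC)
cp_genome = lsc + ir_a + ssc + ir_b

print(ir_b)                               # CGTAAGCCAT
print(reverse_complement(ir_b) == ir_a)   # True: same sequence on the other strand
```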
Nuclear genome
A huge ocean of largely nongenic DNA, with:
Some tens of thousands of genes and gene clusters
scattered around like small islands and archipelagos
A high proportion of this apparently nonfunctional
DNA consists of repeated motifs
Has been described as “junk DNA” or “selfish DNA”
mtDNA
Plant mtDNA shares a number of features with both the nuclear and the
chloroplast genomes
• Like cpDNA genes, plant mtDNA genes have prokaryotic properties, but
introns are more common
• At about 370 to 490 kb, the three higher-plant mtDNAs sequenced at the
time were about 20 times larger than their animal counterparts, yet only
about 10% of their sequence represents genes
• Another 10 to 26% was found to consist of repetitive DNA,
including retrotransposons
• Thus, most plant mtDNA sequence lacks any obvious informational
features
Cont’d…
Both cpDNA and mtDNA are present in hundreds of copies
per cell
Each acts as a single heritable unit
Inheritance is uniparental; in most cases, transmission is
through the female parent
The best-known exception to this rule is the paternal
transmission of cpDNA in most, but not all, gymnosperms
• Accumulating sequence data have also revealed extensive
and ongoing horizontal exchange of DNA among the three
genomes
Repeated DNA elements
Make up the largest part of the nuclear genome in most
eukaryotic organisms
Various types of repetitive DNA are also found in the
organelles
Therefore, a considerable fraction of the DNA profiling methods
in current use relies on mutations in repetitive DNA elements
Types and distribution of repetitive DNA sequences
Depending on their genomic organization, repetitive DNA elements may be
classified as:
1. Tandem repeats
Restricted to relatively few loci
Consist of arrays of two to several thousand sequence units arranged
head to tail
This kind of organization is also exhibited by some genes, such as the
transcription units for histone mRNA and rRNA
2. Interspersed repeats: identical or similar DNA sequences found at
multiple, dispersed locations throughout the genome
Exemplified by transposable elements
1. Tandem-repetitive DNA
Classified according to the length and copy number of the
basic repeat units as well as their genomic localization
I. Satellite DNA
II. Minisatellites
III. Microsatellites
I. Satellite DNA (satDNA)
Generally heterochromatic in nature
Often located in subtelomeric or centromeric regions
Typical satellites consist of very high numbers of copies of a basic
sequence motif, usually between 1,000 and more than 100,000
Monomer sizes may range from two to several thousand bp, but 100 to
300 bp are most common
Satellite DNA as a fraction of the total genome:
Mammals 5-30%
Plants 5-40%
II. Minisatellites
Term coined by Jeffreys et al.
Occur mainly in nuclear DNA
Highly polymorphic loci
Often form families of related sequences that occur at many hundreds of loci
in the nuclear genome
Consist of intermediate-sized DNA motifs (about 10 to 60 nucleotides)
Show a lower degree of repetition at a given locus compared with satellites
Many repeat units carry a common GC-rich core sequence of 10 to 15 bp
Minisatellites with longer, more AT-rich repeat units have also been identified
Minisatellites
Distributed unevenly across the nuclear genome
Human minisatellites are concentrated in subtelomeric regions, with a
significant increase toward the telomeres
In other mammals, a subtelomeric location of minisatellites is less
obvious
In plants, minisatellites cluster around the centromeres
Frequently associated with other types of repeats, including:
Microsatellites
Transposons
Functions of minisatellites
Certain minisatellites interact specifically with nuclear proteins
May serve regulatory purposes, for example, in:
Recombination
Transcriptional activation
Splicing, etc.
May constitute fragile chromosome sites
Could thus be involved in chromosomal translocations
Sometimes present within genes, for example, the human genes encoding
an epithelial mucin and involucrin
Minisatellites as molecular markers
Exploited as molecular markers in various ways, but two techniques clearly
prevail:
Minisatellite-complementary probes are hybridized to restriction-digested
genomic DNA to produce highly variable RFLP fingerprints
Minisatellite sequences are used as single primers in PCR
Minisatellites in plant mtDNA and cpDNA represent a largely untapped source
of molecular markers at the intraspecific level
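The RFLP approach works because alleles differ in the number of repeat units between two fixed restriction sites, so the restriction fragment carrying the minisatellite differs in length. A minimal sketch, assuming a hypothetical locus flanked by EcoRI sites (GAATTC, cut after the G) and a made-up 10-bp GC-rich repeat unit; the flanks, unit and function names are all illustrative. On a real blot, only the fragments that hybridize to the minisatellite probe would be visible.

```python
def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Return fragment lengths after cutting seq at every occurrence of site.

    cut_offset is the position of the cut within the site (EcoRI: G^AATTC -> 1).
    """
    cuts = []
    i = seq.find(site)
    while i != -1:
        cuts.append(i + cut_offset)
        i = seq.find(site, i + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


UNIT = "GGGCAGGAGG"              # hypothetical GC-rich repeat unit
LEFT = "GAATTC" + "ATCG" * 5     # flank carrying an EcoRI site
RIGHT = "TAGC" * 5 + "GAATTC"    # flank carrying an EcoRI site


def allele(n_units: int) -> str:
    """Hypothetical minisatellite allele with n_units tandem copies of UNIT."""
    return "AAA" + LEFT + UNIT * n_units + RIGHT + "TTT"


for n in (10, 25):
    print(n, digest(allele(n)))
# 10 [4, 146, 8]
# 25 [4, 296, 8]
```

Only the middle fragment changes size (by 10 bp per repeat unit), which is the length polymorphism the probe reveals.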
III. Microsatellites
First recognized in the early 1970s, when (TAGG)n repeats were found in
the satellite DNA of a hermit crab
Consist of tandemly reiterated, short DNA sequence motifs (1 to 6 bp)
They are ubiquitous components of all eukaryotic genomes and are also
found in prokaryotes
Microsatellite frequencies in plants are higher than in animals
Usually characterized by a low degree of repetition at a particular locus
Microsatellites consisting of identical motifs may be found at many
thousands of genomic loci
Categories of microsatellites
Classification is based on
Motif
Degree of perfectness of the arrays
Categories of microsatellites; based on motif
Monomeric: one-nucleotide repeat, e.g. (A)n
Dimeric: two-nucleotide repeat, e.g. (CA)n
Trimeric: three-nucleotide repeat, e.g. (GAA)n
Tetrameric, pentameric, hexameric, …
The most abundant motifs found in mammalian genomes:
(A)n and (CA)n, as well as their complements
The most frequent motifs in plants:
(A)n, (AT)n, (GA)n, and (GAA)n repeats
Mononucleotide repeats consisting of A/T tracts are also present
in chloroplast genomes
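Scanning a sequence for perfect microsatellites and sorting them by motif length is a common first step in marker development. Below is a minimal regex-based sketch. The minimum copy numbers per motif class are illustrative assumptions, not a standard, and motifs that are themselves repeats of a shorter unit (e.g. AA, ATAT) are skipped so that each array is reported under its shortest motif. All names are hypothetical.

```python
import re

MOTIF_CLASS = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}
# Illustrative minimum numbers of copies per motif length (an assumption)
MIN_COPIES = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """False if motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    k = len(motif)
    return not any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq: str) -> list[tuple[int, str, int, str]]:
    """Return (start, motif, copies, class) for perfect microsatellites in seq."""
    hits = []
    for k, cls in MOTIF_CLASS.items():
        # ([ACGT]{k}) captures one unit; \1{n,} demands n further identical copies
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, MIN_COPIES[k] - 1))
        for m in pattern.finditer(seq.upper()):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k, cls))
    return sorted(hits)


# Hypothetical test sequence: (A)12, (CA)7 and (GAA)5 separated by short spacers
seq = "GG" + "A" * 12 + "GT" + "CA" * 7 + "TT" + "GAA" * 5 + "C"
for hit in find_ssrs(seq):
    print(hit)
# (2, 'A', 12, 'mono')
# (16, 'CA', 7, 'di')
# (32, 'GAA', 5, 'tri')
```

Because `finditer` reports non-overlapping matches, this sketch can miss an array that overlaps a skipped non-primitive match; dedicated SSR-mining tools resolve such overlaps more carefully.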
Categories of microsatellites; based on motif
Tri-, tetra-, and pentanucleotide motifs are generally less common than mono- and
dinucleotide repeats
Estimates are extremely variable depending on:
• The motif
• The genomic localization (introns vs. exons vs. 5’- and 3’-untranslated
regions vs. intergenic regions), and
• The species under consideration
As a general rule, trinucleotide repeats are the predominant type of
microsatellite found in exons, because slippage of one or more trinucleotide
units does not disturb the triplet periodicity imposed by the open reading frame
Repeats with units of one, two, four, or five bp are rare in genes, because
insertion or deletion of such units causes frameshift mutations that
completely change the amino acid sequence downstream of the mutated site
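The reading-frame argument can be checked directly: splitting a coding sequence into codons before and after inserting one repeat unit shows that a trinucleotide insertion only adds a codon, whereas a dinucleotide insertion shifts every downstream codon. The ORF below is a hypothetical toy (ATG start, three CAG codons, GCT, TGA stop), and the helper names are illustrative.

```python
def codons(seq: str) -> list[str]:
    """Split seq into codons in the frame set by its first base; drop leftover bases."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def insert_unit(seq: str, pos: int, unit: str) -> str:
    """Simulate slippage that adds one repeat unit at position pos."""
    return seq[:pos] + unit + seq[pos:]


gene = "ATG" + "CAG" * 3 + "GCT" + "TGA"   # hypothetical ORF: Met-(Gln)3-Ala-stop

print(codons(gene))
# ['ATG', 'CAG', 'CAG', 'CAG', 'GCT', 'TGA']
print(codons(insert_unit(gene, 3, "CAG")))  # trinucleotide: one extra Gln, frame kept
# ['ATG', 'CAG', 'CAG', 'CAG', 'CAG', 'GCT', 'TGA']
print(codons(insert_unit(gene, 3, "CA")))   # dinucleotide: frameshift, stop codon lost
# ['ATG', 'CAC', 'AGC', 'AGC', 'AGG', 'CTT']
```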
Categories of microsatellites; based on degree of perfectness of the arrays
Weber (1990) recognized three classes:
1. Perfect repeats, which consist of a single, uninterrupted array of a
particular motif, e.g. GCTAGCCACACACACACACATGCATC
2. Imperfect repeats, in which the array is interrupted by one or
several out-of-frame bases, e.g. GCTAGCCACACGTCACACACTGCATC
3. Compound repeats, with intermingled perfect or imperfect arrays
of several motifs, e.g. GCTAGCCACACATATATGTGTGCATC
Weber also showed that the level of polymorphism exhibited by
PCR-amplified (CA)n microsatellites in humans is positively
correlated with the number of uninterrupted, perfect repeats at a given
locus
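Since Weber linked polymorphism to the number of uninterrupted repeats, a useful quantity is the longest perfect run of a motif at a locus. A minimal sketch applied to the three example sequences of Weber's classes; the function name is illustrative.

```python
def longest_perfect_run(seq: str, motif: str) -> int:
    """Largest number of uninterrupted, head-to-tail copies of motif in seq."""
    if not motif:
        raise ValueError("motif must not be empty")
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    best = 0
    for start in range(len(seq) - k + 1):
        copies, pos = 0, start
        while seq[pos:pos + k] == motif:   # extend the run one unit at a time
            copies += 1
            pos += k
        best = max(best, copies)
    return best


perfect = "GCTAGCCACACACACACACATGCATC"
imperfect = "GCTAGCCACACGTCACACACTGCATC"
compound = "GCTAGCCACACATATATGTGTGCATC"

print(longest_perfect_run(perfect, "CA"))     # 7
print(longest_perfect_run(imperfect, "CA"))   # 3 (the GT interruption splits the array)
print([longest_perfect_run(compound, m) for m in ("CA", "AT", "TG")])   # [3, 3, 3]
```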
Microsatellites in organelle genomes
• Poly(A)/poly(T) repeats are the only type of microsatellite that is
regularly present in the chloroplast genome, mainly in introns and
intergenic regions
• Some chloroplast microsatellites appear to be associated with
mutational hotspots in the cpDNA molecule
• Microsatellites appear to be rare in plant mtDNA, with a single
explicit report of a (G)n repeat from several conifer species
Kangaroo Rat (Dipodomys ordii)
50% of the genome consists of:
AAG: 2.4 × 10^9 times
TTAGGG: 2.2 × 10^9 times
ACACAGCGGG: 1.2 × 10^9 times
Potential functions of microsatellites
1. Microsatellite-like repeats are structural elements of both
telomeres and centromeres
2. Some microsatellites bind nuclear proteins and may, for
example, serve as a landing pad for transcription factors
that enhance or reduce the expression of neighboring genes
(e.g., the GAGA factor)
3. Some microsatellites (especially trinucleotide repeats) are
transcribed and then often encode tracts of identical amino
acids
Microsatellites as molecular markers
• The most important variant is the locus-specific PCR amplification
of nuclear and organellar microsatellites with primers complementary to
the flanking regions
• Other methods use the microsatellite motifs themselves (instead of the
flanking regions):
o As single PCR primers,
o As PCR primers in combination with other primer types, or
o As hybridization probes
Microsatellites vs minisatellites
Microsatellites are more useful than minisatellites for marker analysis
because:
- They are shorter
- Easier to amplify
- More abundant, and
- More evenly distributed throughout the genome
The large number of alleles and the high levels of variability among closely
related organisms have made PCR-amplified microsatellites the marker system
of choice for a wide variety of applications
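One standard way to put a number on "large number of alleles and high variability" at a marker locus is expected heterozygosity (Nei's gene diversity), He = 1 - sum(p_i^2), where p_i is the frequency of allele i. A minimal sketch with made-up allele data, using the simple form without a small-sample correction:

```python
from collections import Counter


def expected_heterozygosity(alleles: list) -> float:
    """He = 1 - sum(p_i^2) over allele frequencies p_i (uncorrected)."""
    counts = Counter(alleles)
    n = sum(counts.values())
    return 1 - sum((c / n) ** 2 for c in counts.values())


# Hypothetical data: 8 sampled alleles (PCR fragment sizes in bp) at one SSR locus
ssr_sizes = [150, 152, 152, 154, 156, 156, 158, 160]
# Hypothetical biallelic locus for comparison
biallelic = ["A", "A", "A", "A", "A", "A", "B", "B"]

print(round(expected_heterozygosity(ssr_sizes), 4))   # 0.8125
print(round(expected_heterozygosity(biallelic), 4))   # 0.375
```

The multi-allelic microsatellite locus is far more informative than the two-allele locus, which is why allele-rich markers discriminate closely related individuals well.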
2. Transposable elements
First discovered by Barbara McClintock in maize in the
1940s
Mobile genetic elements that are able to change their position within the genome
Have acquired their current genomic location by transposition
Based on the mechanism of transposition, they are divided into two classes:
Class I transposons
o Disperse via an RNA intermediate, which is reverse-transcribed into cDNA
o More commonly called retrotransposons
Class II transposons
o Propagate (jump) via a DNA intermediate
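The practical difference between the two mechanisms shows up in a toy simulation: "copy-and-paste" leaves the donor copy in place, so every event adds one copy, whereas "cut-and-paste" removes the element from its donor site, so the copy number stays constant. The genome model (a set of occupied slot positions) and the function names are hypothetical simplifications; in reality, DNA transposons can still increase in number, for example when the gap left at the donor site is repaired from a sister chromatid that still carries the element.

```python
import random


def transpose(sites: set[int], mechanism: str, genome_size: int, rng: random.Random) -> set[int]:
    """Apply one transposition event to a genome modelled as occupied slot positions."""
    free = [s for s in range(genome_size) if s not in sites]
    source = rng.choice(sorted(sites))    # element that transposes
    target = rng.choice(free)             # new insertion site
    new_sites = set(sites)
    if mechanism == "cut-and-paste":      # Class II: element leaves the donor site
        new_sites.remove(source)
    elif mechanism != "copy-and-paste":   # Class I keeps the donor copy
        raise ValueError(f"unknown mechanism: {mechanism}")
    new_sites.add(target)
    return new_sites


def copy_number_after(events: int, mechanism: str, seed: int = 1) -> int:
    """Copy number after a series of events, starting from a single element."""
    rng = random.Random(seed)
    sites = {10}
    for _ in range(events):
        sites = transpose(sites, mechanism, genome_size=1_000, rng=rng)
    return len(sites)


print(copy_number_after(50, "copy-and-paste"))   # 51
print(copy_number_after(50, "cut-and-paste"))    # 1
```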
[Figure: Class I transposons (eukaryotes only) move by “copy-and-paste”; Class II transposons (prokaryotes and eukaryotes) move by “cut-and-paste”.]
Retrotransposons
According to their genomic organization and gene content, retrotransposons may
be further divided into:
1. Retroviruses
2. Long terminal repeat (LTR) retrotransposons
3. Long interspersed elements (LINEs)
4. Short interspersed elements (SINEs)
LINEs and SINEs are also referred to as non-LTR retrotransposons
For each type of retrotransposon, both active and defective copies have been
found
In general, inactive elements outnumber active copies by a factor of several
thousand
In eukaryotes, mobile elements are dominated by retrotransposons rather than
DNA transposons
Retroviruses
Are distinguished from other types of retroelements by the presence
of an env gene in their genome
The protein encoded by this gene allows retroviruses to enter and
leave their host cell
The only infectious type of retroelement
Typical host organisms are vertebrates
Retroviruses are hypothesized to have evolved from LTR
retrotransposons
Characterized by direct repeats (long terminal repeats) of about 300 to
500 bp at both ends of the element
Retroviruses cont’d…
They encode:
1. A capsid protein, which packages the viral RNA into a virus-like
particle
2. An RNase (RNase H)
3. A reverse transcriptase, which generates a cDNA from the full-sized
message
4. A protease, which is needed for processing the polyprotein, and
5. An endonuclease, which serves as an integrase
LTR Retrotransposons
Move via an RNA intermediate transcribed from the mobile element by
RNA polymerase
Reverse transcriptase then converts the RNA into double-stranded DNA
Similar to retroviruses, but generally lacking an env gene
Non-LTR retrotransposons
A. Long interspersed elements (LINEs):
An interesting and heterogeneous class of non-LTR retrotransposons
Elements 3,000 to 5,000 bp in length that are dispersed (interspersed)
throughout genomes
Clearly mobile (able to “move” from location to location within a genome)
and inducible
Definitely involved in mutation and chromosomal rearrangement
Example: in humans, LINEs are ≈6 kb long and account for about 21% of the genome
[Figure: General principles of LINE transposition. Source: Lodish et al., Molecular Cell Biology, 7th ed., Fig. 10-16]
B. Short interspersed elements (SINEs)
Repeated elements of 150 to 300 base pairs (bp), typically flanked by 8- to
20-bp direct repeats called target-site duplications (characteristic of
insertion events)
Highly variable among organisms; e.g., in humans they are ≈300 bp long and
account for about 13% of the genome
SINE sequences are transcribed but not translated; in humans, Alu
sequences are found in 20% of hnRNA (pre-mRNA) but are removed during mRNA
processing
Thought to be mobilized by the enzymatic machinery of LINEs
The function of SINE sequences is unknown; suggestions include
transcriptional regulation and regulation of mRNA processing
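Target-site duplications arise because the element is inserted into a staggered break in the host DNA. Filling in the single-stranded overhangs duplicates the short host sequence at the target site, so the inserted element ends up flanked by two identical copies in the same orientation. A minimal string sketch with a hypothetical host sequence, element and a 4-bp duplication (shorter than real TSDs, for readability); the function name is illustrative.

```python
def insert_with_tsd(host: str, site: int, element: str, tsd_len: int) -> str:
    """Insert element at site; the tsd_len host bases at the site end up on both sides."""
    target = host[site:site + tsd_len]    # host sequence between the staggered nicks
    return host[:site] + target + element + target + host[site + tsd_len:]


host = "CCCCGATTACAGGGG"                  # hypothetical host DNA
sine = "GGCCGGCCAAAAAA"                   # hypothetical element with an A-rich tail
result = insert_with_tsd(host, site=4, element=sine, tsd_len=4)

print(result)   # CCCCGATTGGCCGGCCAAAAAAGATTACAGGGG
start = result.find(sine)
left = result[start - 4:start]
right = result[start + len(sine):start + len(sine) + 4]
print(left, right, left == right)   # GATT GATT True  (direct, not inverted, repeats)
```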
Class II transposons
Disperse via a DNA intermediate
Characterized by short terminal inverted repeats (TIRs)
The internal regions encode one or two genes responsible for transposition
Transposition usually follows a nonreplicative cut-and-paste mechanism
Copy numbers are small to intermediate (usually fewer than a few hundred)
Comprise only a small part of the genome
Often integrate into gene-rich regions, which makes them useful tools for
gene isolation by transposon tagging
Most mobile elements in bacteria are DNA transposons
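The terminal repeats distinguish the major element types: LTR retrotransposons end in direct repeats (the same sequence, in the same orientation, at both ends), whereas Class II transposons end in inverted repeats (each end is the reverse complement of the other). A minimal sketch that compares the first and last n bases of an element; the toy elements are hypothetical and far shorter than real LTRs or TIRs.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Opposite strand of seq, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def terminal_repeat_type(element: str, n: int) -> Optional[str]:
    """Classify the first and last n bases of element as direct or inverted repeats."""
    if not 0 < n <= len(element) // 2:
        raise ValueError("n must be between 1 and half the element length")
    left, right = element[:n], element[-n:]
    if left == right:
        return "direct"      # same sequence, same orientation: LTR-like
    if left == reverse_complement(right):
        return "inverted"    # reverse complements of each other: TIR-like
    return None


ltr_like = "TGAAGC" + "CCCGGGAAATTT" + "TGAAGC"   # hypothetical LTR-style element
tir_like = "CAGGGATG" + "AAAACCCC" + "CATCCCTG"   # hypothetical TIR-style element

print(terminal_repeat_type(ltr_like, 6))          # direct
print(terminal_repeat_type(tir_like, 8))          # inverted
print(terminal_repeat_type("AAAAGGGGCCCC", 4))    # None
```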
Classification of Class II transposons
In plants, they can be grouped into at least four superfamilies,
three of which (Ac, CACTA, Mu) were first characterized in
maize
Transposons of the Ac family (e.g., Ac in maize, hobo in
Drosophila) code for a single gene (a transposase)
Transposons of the CACTA family carry two genes, encoding
a transposase and a DNA-binding protein
Unclassified transposons
Miniature inverted-repeat transposable elements (MITEs) are a superfamily
of transposons that are characterized by
• Small size (<500 bp)
• Short TIRs
• AT-richness
• High levels of internal sequence divergence
• The potential to form secondary structures
• Relatively large copy numbers (typically > 1000 per haploid genome)
• A preference for sequence-defined integration sites such as TA or TAA
• More than 10 MITE families have been characterized
• There is no sequence homology between the various families
Transposons as molecular markers
• A wide variety of molecular marker techniques use PCR
primers directed toward transposable elements, either alone or
in combination with other types of primers
• For example, LTR retrotransposon-specific primers have been
combined with microsatellite-specific primers
• LTR retrotransposon-specific primers have also been combined with
AFLP primers in sequence-specific amplification polymorphism (S-SAP)
• AFLP primers have also been used together with primers specific
for DNA transposons
Eukaryotic repetitive DNA
  Tandem repeats
    Satellite DNA
    Minisatellites
    Microsatellites
  Interspersed repeats
    DNA transposons
    RNA transposons
      LTR retrotransposons
      Non-LTR retrotransposons
        LINEs
        SINEs
2.2. Genetic diversity and basis of polymorphism
What is genetic diversity ?
How is genetic diversity generated?
Why is genetic diversity important?
What happens when genetic diversity is low?
How do we stop genetic diversity loss?
What is polymorphism in genetic diversity?
Genetic diversity
What is genetic diversity?
Genetic: related to traits passed from parent to
offspring
Diversity: having a range of different things
Genetic diversity: the range of different
inherited traits within a species.
The combined differences in the DNA of all individuals in
a species make up the genetic diversity
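One common way to put a number on "the combined differences in the DNA" of a sample of individuals is nucleotide diversity (pi): the average number of differences per site between every pair of aligned sequences. A minimal sketch with four made-up, already-aligned 10-bp sequences (no gaps or missing data); the function name is illustrative.

```python
from itertools import combinations


def nucleotide_diversity(seqs: list[str]) -> float:
    """Average number of pairwise differences per site (pi) for aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    differences = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return differences / (len(pairs) * length)


# Hypothetical aligned 10-bp sequences from four individuals
sample = ["ACGTACGTAC", "ACGTACGTAC", "ACGAACGTAC", "ACGAACGTTC"]
print(round(nucleotide_diversity(sample), 4))   # 0.1167
```

A population of identical sequences would give pi = 0; the more the individuals differ, the larger pi becomes.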
Cont’d..
Genetic diversity is the overall variation in DNA among the individuals
of a species.
It causes individuals to have different characteristics
In a species with high genetic diversity, there are many
individuals with a wide variety of different traits.
E.g., although all apples belong to the same species,
the apples we eat come from hundreds of varieties that range from
red to green and from tart to sweet;
some even have pink flesh inside
How is Genetic Diversity Generated?
Compared with coding sequences, repetitive DNA sequences are considered
fast-evolving genome components.
Their variable abundance, high sequence variation and distinct
chromosomal distributions contribute to genome divergence
among species.
Mutation, genetic drift and gene flow also shape genetic
diversity.
Mutations can arise when mistakes are made while cells are copying DNA.
These mutations are the raw material of a species’ genetic diversity.
Most mutations are either harmful or have no impact at all, but sometimes
these mutations can cause changes that are helpful for a species
Why is Genetic Diversity Important?
The individuals that carry these helpful mutations
may have a greater chance of survival and leave
more offspring as a result. This is adaptation
Adaptation: the process by which a species changes to
better survive in its environment
In addition, genetic diversity strengthens the ability of species and
populations to resist diseases, pests and other stresses.
What Happens When Genetic Diversity is Low?
When there is little variation in the DNA of a species,
genetic diversity is said to be low.
Low genetic diversity means that there is a limited variety
of alleles for genes within that species, so there are not
many differences between individuals.
It also means that there are fewer opportunities to adapt
to environmental changes.
It often occurs as a result of habitat loss.
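Why small, isolated populations (for example, after habitat loss) lose diversity can be illustrated with the standard expectation for genetic drift in an idealized population of effective size N: heterozygosity declines each generation by a factor of (1 - 1/(2N)), so H_t = H_0 * (1 - 1/(2N))^t. A minimal sketch that ignores mutation, migration and selection; the starting heterozygosity and population sizes are made up.

```python
def heterozygosity_after(h0: float, n_e: int, generations: int) -> float:
    """Expected heterozygosity after drift alone: H_t = H_0 * (1 - 1/(2N))**t."""
    return h0 * (1 - 1 / (2 * n_e)) ** generations


for n_e in (10, 500):
    print(n_e, round(heterozygosity_after(0.8, n_e, generations=20), 3))
# 10 0.287
# 500 0.784
```

After 20 generations, the small population has lost roughly two thirds of its heterozygosity, while the large one has lost only about 2%.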
Cont’d…
If genetic diversity gets too low,
species can go extinct and be lost forever due to the combined
effects of inbreeding depression and failure to adapt to
change.
In such cases, the introduction of new alleles can save a population. This is
called genetic rescue.
Genetic rescue: a conservation strategy in which new individuals are moved
into a population to increase genetic diversity and improve population
health.
How Do We Stop Genetic Diversity Loss?
The following strategies can help to stop genetic diversity loss:
preserve and protect genetic diversity
use nature reserves and wildlife bridges to reconnect wild
populations that have become separated by our cities and
highways.
restore habitats, because this will allow wild populations to
get bigger.
Sometimes remove harmful stressors and pests
Cont’d…
reintroduce species that have been lost from habitats
they used to live in.
It is important to protect genetic diversity because it is the
foundation for healthy species.
Healthy species are necessary for human health and for the
health of the whole planet
What is polymorphism?
Polymorphism
The presence of two or more variant forms of a
specific DNA sequence among different
individuals or populations.
Mutations are the basis of polymorphism
Genetic polymorphism therefore underlies the diversity
of individuals.