Next Generation Sequencing (NGS): An Introduction
Deoxyribonucleic acid, commonly known as DNA, contains the
blueprints of life. Within its structures are the codes required for the
assembly of proteins and non-coding RNA – these molecular
machineries affect all the biological systems that create and maintain
life. By understanding the sequence of DNA, researchers have been able
to elucidate the structure and function of proteins as well as RNA and
have gained an understanding of the underlying causes of disease. Next
Generation Sequencing (NGS) is a powerful platform that has enabled
the sequencing of thousands to millions of DNA molecules
simultaneously. This powerful tool is revolutionizing fields such as
personalized medicine, genetic diseases, and clinical diagnostics by
offering a high throughput option with the capability to sequence
multiple individuals at the same time.
o Sanger Sequencing
o Similarities between different NGS Technologies
o Differences between different NGS Technologies
o Comparisons between Different NGS Platforms
o Useful Terms
o References
Sanger Sequencing
Sanger Sequencing utilizes a high fidelity DNA-dependent polymerase
to generate a complementary copy of a single stranded DNA template
(1) (2) (3). In each reaction a single primer, complementary to the
template, initiates a DNA synthesis reaction from its 3’ end.
Deoxynucleotides, which are the monomers of DNA, are
added one after the other in a template-dependent manner, forming
phosphodiester bonds between the 3’ hydroxyl of the growing end of
the primer and the 5’ triphosphate group of the incoming nucleotide
(Figure 1)(1).
Each reaction also contains a mixture of four dideoxynucleotides, one
for each DNA base (i.e. A, G, T, and C). These dideoxynucleotides
resemble the DNA monomers enough to allow incorporation into the
growing strand. However, they differ from natural deoxynucleotides in
two ways: 1) they lack the 3’ hydroxyl group required for further
DNA extension, so the chain terminates once one is incorporated in the
DNA molecule, and 2) each dideoxynucleotide has a unique fluorescent
dye attached to it, allowing for automatic detection of the DNA sequence
(3) (4) (5).
Many copies of different-length DNA fragments are generated in each
reaction, terminated at all of the nucleotide positions of the template
molecule by one of the dideoxynucleotides (Figure 1). The reaction
mixtures are loaded on the sequencing machine, either manually onto
slab gels or automatically with capillaries, and are electrophoresed to
separate the DNA molecules by size. The DNA sequence is read through
the fluorescent emission of the dideoxynucleotide as it flows through
the gel (Figure 2) (5). Modern Sanger Sequencing instruments use
capillary-based automated electrophoresis, which typically analyzes
8–96 sequencing reactions simultaneously.
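The chain-termination principle described above can be sketched in a few lines of Python. This is a toy model, not a description of any instrument's software: the function names, the 10% termination probability and the number of molecules are illustrative assumptions, and the template is written in the order in which the polymerase reads it.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def sanger_fragments(template, n_molecules=2000, ddntp_fraction=0.1, seed=0):
    """Simulate chain-terminated extension products for one reaction.

    Returns (fragment_length, terminal_base) pairs, where terminal_base is
    the dye-labeled dideoxynucleotide that ended the strand.
    ddntp_fraction is a hypothetical per-position termination probability.
    """
    rng = random.Random(seed)
    fragments = []
    for _ in range(n_molecules):
        for position, template_base in enumerate(template, start=1):
            # At each position the polymerase adds either a normal dNTP
            # (extension continues) or a labeled ddNTP (extension stops).
            if rng.random() < ddntp_fraction:
                fragments.append((position, COMPLEMENT[template_base]))
                break
    return fragments

def read_by_size(fragments):
    """Mimic electrophoresis: sort fragments by length and read each dye."""
    dye_by_length = {}
    for length, base in fragments:
        dye_by_length[length] = base
    return "".join(dye_by_length[length] for length in sorted(dye_by_length))

template = "TACGGATTCAGC"  # listed in the order the polymerase copies it
print(read_by_size(sanger_fragments(template)))
```

Because every template position is represented by at least one terminated fragment, sorting the fragments by length and reading the terminal dye of each size class recovers the newly synthesized strand.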
Figure 1 – An illustration of Sanger Sequencing.
Similarities between different NGS Technologies (6)(7)(8)(9)
Next Generation Sequencing systems introduced in the past decade
allow for massively parallel sequencing reactions. These
systems are capable of analyzing millions or even billions of sequencing
reactions at the same time. Although the machines differ in their
technical details, they all share some
common features, which are outlined below (Figure 2):
1. Sample Preparation:
All Next Generation Sequencing platforms require a library obtained
either by amplification or ligation with custom adapter sequences. These
adapter sequences allow for library hybridization to the sequencing chips
and provide a universal priming site for sequencing primers. Learn more
about sample preparation in our Next Generation Sequencing -
Experimental Design knowledge base.
2. Sequencing machines:
Each library fragment is amplified on a solid surface (either beads or a
flat silicon derived surface) with covalently attached DNA linkers that
hybridize to the library adapters. This amplification creates clusters of
DNA, each originating from a single library fragment; each cluster will
act as an individual sequencing reaction.
The sequence of each cluster is optically read (either through the
generation of light or fluorescent signal) from repeated cycles of
nucleotide incorporation. Each machine has its own unique cycling
condition; for example, the Illumina system uses repeated cycles of
incorporation of fluorescently labeled, reversibly terminated nucleotides
followed by signal acquisition and removal of the fluorescent and
terminator groups.
abm offers a wide range of sequencing services on the advanced
Illumina® sequencing platforms (MiSeq and NextSeq 500).
3. Data output:
Each machine provides the raw data at the end of the sequencing run.
This raw data is a collection of DNA sequences that were generated at
each cluster. This data could be further analysed to provide more
meaningful results.
abm delivers sequencing results in industry standard FASTQ format.
For other bioinformatics analyses (e.g. BWA, GATK, Picard), please
contact us for a quote.
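As an illustration of what the raw FASTQ data looks like, the sketch below reads records from a FASTQ stream. It assumes the common four-lines-per-record layout and Phred+33 quality encoding; the function name `parse_fastq` and the example record are made up for this illustration.

```python
from io import StringIO

def parse_fastq(handle):
    """Yield (read_id, sequence, phred_scores) from a four-line FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored here
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        # Assumption: Phred+33 encoding (ASCII code minus 33 = quality score).
        scores = [ord(ch) - 33 for ch in quality]
        yield header[1:], sequence, scores

example = StringIO("@read1\nACGT\n+\nII5!\n")
for read_id, sequence, scores in parse_fastq(example):
    print(read_id, sequence, scores)
```

Each record carries one read together with a per-base quality score, which is what downstream tools use when aligning reads and calling variants.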
Figure 2 – An illustration of the similarities and differences between the
different Next Generation Sequencing platforms.
Differences between different NGS Technologies (6)(7)(8)(9)
The differences between the different Next Generation Sequencing
platforms lie mainly in the technical details of the sequencing reaction.
Below we describe these technical differences briefly. For a full
explanation, please visit the manufacturers’ webpages at the links
provided in each section.
Pyrosequencing
In pyrosequencing, the sequencing reaction is monitored through the
release of pyrophosphate during nucleotide incorporation. One of the
four nucleotides is added to the sequencing chip at a time and is
incorporated in a template dependent manner. This incorporation
results in the release of pyrophosphate, which is used in a series of
chemical reactions resulting in the generation of light. Light emission is
detected by a camera which records the appropriate sequence of the
cluster. Any unincorporated bases are degraded by apyrase before the
addition of the next nucleotide. This cycle continues until the sequencing
reaction is complete (Table 1).
Disadvantages:
High reagent cost, and a high error rate over homopolymer runs of 6 or
more identical bases.
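The flow-by-flow logic, and the reason homopolymers are problematic, can be sketched as follows. This is a simplified model under stated assumptions: the flow order "TACG", the function names and the idea that the light signal equals the exact number of incorporated bases are illustrative choices, whereas a real instrument measures a noisy intensity.

```python
def flowgram(new_strand, flow_order="TACG", n_flows=8):
    """Return (nucleotide, bases_incorporated) for each nucleotide flow.

    new_strand is the strand being synthesized. In this idealized model the
    light signal of a flow is the number of bases incorporated during it.
    """
    position = 0
    signals = []
    for flow in range(n_flows):
        nucleotide = flow_order[flow % len(flow_order)]
        incorporated = 0
        # A whole homopolymer run is incorporated within a single flow.
        while position < len(new_strand) and new_strand[position] == nucleotide:
            incorporated += 1
            position += 1
        signals.append((nucleotide, incorporated))
    return signals

def call_bases(signals):
    """Turn per-flow signals back into a base sequence."""
    return "".join(nucleotide * count for nucleotide, count in signals)

signals = flowgram("AAAGTC")
print(signals)
print(call_bases(signals))
```

The run "AAA" produces one signal of height 3 rather than three separate signals. With real, noisy intensities, telling a run of 6 from a run of 7 becomes difficult, which is consistent with the homopolymer error noted above.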
Table 1 – Technical details for all available pyrosequencing based NGS machines.
| | GS Junior | GS Junior Plus | GS FLX+ System: GS FLX Titanium XL+ | GS FLX+ System: GS FLX Titanium XLR70 |
|---|---|---|---|---|
| Read Length | 400bp | 700bp | 700bp (up to 1,000bp) | 450bp (up to 600bp) |
| Throughput | 35Mb | 70Mb | 700Mb | 450Mb |
| Reads per Run | 100,000 shotgun, 70,000 amplicon | 100,000 shotgun, 70,000 amplicon | 1,000,000 shotgun | 1,000,000 shotgun, 700,000 amplicon |
| Accuracy | 99% at 400bp | 99% at 700bp | 99.997% | 99.995% |
| Run Time | 10 hr | 18 hr | 23 hr | 10 hr |
For more information, please visit the Roche/454 Life Sciences website.
Sequencing by Synthesis
Sequencing by synthesis utilizes the step-by-step incorporation of
fluorescently labeled, reversibly terminated nucleotides for DNA sequencing
and is used by the Illumina NGS platforms. The nucleotides used in this
method have been modified in two ways: 1) each nucleotide is reversibly
attached to a single fluorescent molecule with a unique emission
wavelength, and 2) each nucleotide is also reversibly terminated,
ensuring that only a single nucleotide will be incorporated per cycle. All
four nucleotides are added to the sequencing chip and, after nucleotide
incorporation, the unincorporated nucleotides are washed away. The
fluorescent signal is read at each cluster and recorded; both the
fluorescent molecule and the terminator group are then cleaved and
washed away. This process is repeated until the sequencing reaction is
complete. This system is able to overcome the disadvantages of the
pyrosequencing system by only incorporating a single nucleotide at a
time (Table 2).
Disadvantages:
As the sequencing reaction proceeds, the error rate of the machine also
increases. This is due to incomplete removal of the fluorescent signal
which leads to higher background noise levels.
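The one-base-per-cycle behaviour and the gradual loss of signal quality can be illustrated with a toy model. Everything below is an assumption made for illustration: the function name, the `carryover` parameter (a hypothetical fraction of signal left uncleaved per cycle) and the geometric decay are not taken from any manufacturer's documentation.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def sequence_by_synthesis(cluster_templates, n_cycles, carryover=0.005):
    """Return (reads, clean_signal_per_cycle) for a set of clusters.

    carryover is a hypothetical fraction of fluorescent signal that is not
    removed after each cycle; it drives the toy noise model below.
    """
    reads = [""] * len(cluster_templates)
    clean_signal = []
    for cycle in range(n_cycles):
        for i, template in enumerate(cluster_templates):
            if cycle < len(template):
                # The reversible terminator limits every cluster to exactly
                # one incorporated base per cycle, even inside homopolymers.
                reads[i] += COMPLEMENT[template[cycle]]
        # Toy noise model: the usable fraction of signal shrinks each cycle.
        clean_signal.append((1 - carryover) ** cycle)
    return reads, clean_signal

reads, clean_signal = sequence_by_synthesis(["TTTTTTGCA", "ACGTACGTA"], n_cycles=9)
print(reads)
print([round(value, 3) for value in clean_signal])
```

In this sketch the six-base homopolymer is read over six separate cycles, so its length is counted rather than inferred from signal intensity, while the shrinking `clean_signal` values mimic the rising error rate in later cycles.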
Table 2 – Technical details for all available sequencing by synthesis based NGS machines.
| | MiSeq | NextSeq 500 | NextSeq 500 | HiSeq 2500 | HiSeq 2500 | HiSeq 3000 | HiSeq 4000 |
|---|---|---|---|---|---|---|---|
| Run Mode | N/A | Mid-Output | High-Output | Rapid Run | High-Output | N/A | N/A |
| Flow Cells Per Run | 1 | 1 | 1 | 1 or 2 | 1 or 2 | 1 | 1 or 2 |
| Output Range | 0.3-15 Gb | 20-39 Gb | 30-120 Gb | 10-300 Gb | 50-1000 Gb | 125-750 Gb | 125-1500 Gb |
| Run Time | 5-55 hrs | 15-26 hrs | 12-30 hrs | 7-60 hrs | <1-6 days | <1-3.5 days | <1-3.5 days |
| Reads per Flow Cell | 25 million | 130 million | 400 million | 300 million | 2 billion | 2.5 billion | 2.5 billion |
| Maximum Read Length | 2 x 300bp | 2 x 150bp | 2 x 150bp | 2 x 250bp | 2 x 125bp | 2 x 150bp | 2 x 150bp |
For more information, please visit the Illumina website.
Sequencing by Ligation
Sequencing by ligation is different from the other two methods since it
does not utilize a DNA polymerase to incorporate nucleotides. Instead, it
relies on short oligonucleotide probes that are ligated to one another.
These oligonucleotides consist of 8 bases (from 3’ to 5’): two probe
specific bases (there are a total of 16 8-mer probes, which all differ at
these two base positions) and six degenerate bases; one of four
fluorescent dyes is attached at the 5’ end of the probe. The sequencing
reaction commences with binding of the primer to the adapter sequence,
followed by hybridization of the appropriate probe. Hybridization of
the probe is guided by the two probe specific bases and, upon annealing,
the probe is ligated to the primer sequence by a DNA ligase. Unbound
oligonucleotides are washed away, the signal is detected and recorded,
the fluorescent signal is cleaved (together with the last 3 bases), and then the next
cycle commences. After approximately 7 cycles of ligation the DNA
strand is denatured and another sequencing primer, offset by one base
from the previous primer, is used to repeat these steps. In total, 5
sequencing primers are used (Table 3).
Disadvantages:
This method leads to very short sequencing reads.
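The effect of the offset primers can be checked with a small counting sketch. It uses only the numbers given above (8-base probes, 2 probe specific bases, 3 bases cleaved per cycle, about 7 ligation cycles, 5 primers) and assumes, for illustration, that the two probe specific bases sit directly next to the previous ligation product. Position 0 is the first template base after the adapter; negative positions fall inside the adapter.

```python
from collections import Counter

PROBE_LENGTH = 8
SPECIFIC_BASES = 2
CLEAVED_BASES = 3
STEP = PROBE_LENGTH - CLEAVED_BASES  # bases added per ligation cycle

def interrogated_positions(n_primers=5, n_cycles=7):
    """Count how often each position is read by a probe specific base."""
    counts = Counter()
    for offset in range(n_primers):
        for cycle in range(n_cycles):
            # Each primer is shifted by one base relative to the previous one.
            start = cycle * STEP - offset
            for position in range(start, start + SPECIFIC_BASES):
                counts[position] += 1
    return counts

counts = interrogated_positions()
twice = [position for position in sorted(counts)
         if position >= 0 and counts[position] == 2]
print(len(twice), twice[0], twice[-1])
```

In this toy layout every template position from 0 to 30 is interrogated by two different probes, which illustrates both why five primer rounds are needed to fill the gaps left by each 5-base step and why the resulting reads stay short.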
Table 3 – Technical details for all available sequencing by ligation based NGS machines:
| Instrument (Genetic Analyzer V2.0) | 5500W System | 5500xl W System |
|---|---|---|
| Throughput, 1 x 50 | 80 Gb | 160 Gb |
| Throughput, 1 x 75 | 120 Gb | 240 Gb |
| Throughput, 2 x 50 MP | 160 Gb | 320 Gb |
| Throughput, 50 x 50 PE | 160 Gb | 320 Gb |
| Accuracy | 99.99% | 99.99% |
| Run Time | 7 days | 7 days |
For more information, please visit the Applied Biosystems website.
Ion Semiconductor Sequencing
Ion semiconductor sequencing utilizes the release of hydrogen ions
during the sequencing reaction to detect the sequence of a cluster. Each
cluster is located directly above a semiconductor transistor which is
capable of detecting changes in the pH of the solution. During
nucleotide incorporation, a single H+ is released into the solution and it
is detected by the semiconductor. The sequencing reaction itself
proceeds similarly to pyrosequencing but at a fraction of the cost (Table 4).
Disadvantages:
High error rate over homopolymeric stretches of nucleotides.
Table 4 – Technical details for all available ion semiconductor sequencing based NGS machines:
| | Ion Proton System |
|---|---|
| Output | up to 10 Gb |
| Reads | 60-80 million reads |
| Read Length | up to 200bp |
| Run Time | 2-4 hrs |
For more information, please visit the Life Technologies website.
Comparisons between Different NGS Platforms
It is difficult to see the differences between the different NGS
instruments based on the above data. In this section we attempt to
simplify comparisons between instruments by seeing how each system
performs when given the task of sequencing the human (3,300,000,000
bases), mouse (2,800,000,000 bases), Arabidopsis thaliana (135,000,000
bases), and E. coli (4,639,221 bases) genomes (Table 5). To be able to
use the sequencing data, coverage of at least 30x is required; anything
lower than this number is marked in red and anything higher is marked
in green.
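Table 5 – Coverage of genome per run
| Roche | GS Junior | GS Junior Plus | GS FLX+ System: GS FLX Titanium XL+ | GS FLX+ System: GS FLX Titanium XLR70 |
|---|---|---|---|---|
| Human | 0 | 0 | 0 | 0 |
| Mouse | 0 | 0 | 0 | 0 |
| Arabidopsis thaliana | 0 | 1 | 5 | 3 |
| E. coli | 8 | 15 | 151 | 97 |

| Illumina | MiSeq | NextSeq 500 (Mid-Output) | NextSeq 500 (High-Output) | HiSeq 2500 (Rapid Run) | HiSeq 2500 (High-Output) | HiSeq 3000 | HiSeq 4000 |
|---|---|---|---|---|---|---|---|
| Human | 5 | 12 | 36 | 91 | 303 | 227 | 455 |
| Mouse | 5 | 14 | 43 | 107 | 357 | 268 | 536 |
| Arabidopsis thaliana | 111 | 289 | 889 | 2,222 | 7,407 | 5,556 | 11,111 |
| E. coli | 3,233 | 8,407 | 25,866 | 64,666 | 215,553 | 161,665 | 323,330 |

| Applied Biosystems (Genetic Analyzer V2.0) | 5500W System | 5500xl W System |
|---|---|---|
| Human | 48 | 97 |
| Mouse | 57 | 114 |
| Arabidopsis thaliana | 1,185 | 2,370 |
| E. coli | 34,489 | 68,977 |

| | Ion Proton System |
|---|---|
| Human | 3 |
| Mouse | 4 |
| Arabidopsis thaliana | 74 |
| E. coli | 2,156 |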
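The coverage values in Table 5 appear to follow from dividing the maximum output per run (Tables 1 to 4) by the genome size and rounding to the nearest whole number. The sketch below reproduces a few of them; the dictionary and function names are illustrative, and only a subset of instruments is included.

```python
GENOME_SIZES_BP = {
    "Human": 3_300_000_000,
    "Mouse": 2_800_000_000,
    "Arabidopsis thaliana": 135_000_000,
    "E. coli": 4_639_221,
}

# Maximum output per run in bases, taken from Tables 1, 2 and 4.
MAX_OUTPUT_BP = {
    "GS Junior": 35_000_000,
    "MiSeq": 15_000_000_000,
    "HiSeq 4000": 1_500_000_000_000,
    "Ion Proton": 10_000_000_000,
}

REQUIRED_COVERAGE = 30  # minimum coverage stated in the text

def coverage_per_run(output_bp, genome_size_bp):
    """Average number of times each base of the genome is sequenced."""
    return output_bp / genome_size_bp

for instrument, output_bp in MAX_OUTPUT_BP.items():
    for genome, size_bp in GENOME_SIZES_BP.items():
        coverage = coverage_per_run(output_bp, size_bp)
        verdict = "sufficient" if coverage >= REQUIRED_COVERAGE else "too low"
        print(f"{instrument:10s} {genome:22s} {round(coverage):>8,}x  {verdict}")
```

The same division can be turned around to estimate how many runs a project needs: the required coverage multiplied by the genome size, divided by the output per run.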
Useful Terms
Next Generation Sequencing is a young field, with the first machines
marketed in 2005. However, in less than a decade NGS has become a
cornerstone of molecular biology and genetics. As such, being familiar
with its technical terms will help in better understanding the available
literature and becoming a member of its ever expanding community. In
this section the most common terms used in this field are explained:
Next Generation Sequencing:
Next Generation Sequencing, or NGS, is a sequencing method where
millions of sequencing reactions are carried out in parallel, increasing
the sequencing throughput.
Reads:
The output of an NGS sequencing reaction. A read is a single
uninterrupted series of nucleotides representing the sequence of the
template.
Read Length:
The length of each sequencing read. This variable is
always represented as an average read length since individual reads have
varying lengths.
Coverage:
The number of times a particular nucleotide is sequenced. Because
sequencing reactions are error-prone, random errors can occur. Therefore,
30x coverage is typically required to ensure that each nucleotide is
called accurately.
Deep Sequencing:
Sequencing where the coverage is greater than 30x. This is used when
looking for rare polymorphisms, where only a subset of the
sample carries the mutation. This method increases the range,
complexity, sensitivity, and accuracy of the result.
Paired-End Sequencing:
Sequencing from both ends of a fragment while keeping track of the
paired data. With this method the sequencing reaction will commence
from one end of the fragment. Once completed, the fragment is
denatured and a sequencing primer is hybridized to the reverse side
adapter. The fragment is then sequenced again. Using this method will
allow either further confirmation of the accuracy of the sequence or it
could be used to increase the overall read length.
Mate-Paired reads:
A sample preparation step where large DNA fragments (~10kb) are
circularized with an adapter sequence followed by degradation of the
circular DNA. This method links DNA fragments that are separated
from each other by a certain distance and it is used in applications such
as de novo assembly, structural variant detection, and identification of
complex genomic rearrangements.
Adapter:
Unique sequences used to cap the ends of a fragmented DNA. The
adapter’s functions are as follows: 1) allow hybridization to solid
surface; 2) provide priming location for both amplification and
sequencing primers; and 3) provide barcoding for multiplexing different
samples in the same run.
Library:
A collection of DNA fragments with adapters ligated to each end.
Library preparation is required before a sequencing run. Our next
knowledge base will delve into the different sample and library
preparation methods available.
Alignment:
Mapping a sequence read to a known reference genome.
Reference sequence/genome:
A fully sequenced and mapped genome used for the mapping of
sequence reads.
De Novo Assembly:
Assembly of the sequence reads to generate a reference sequence.
Specificity:
The percentage of sequences that map to the intended targets out of total
bases per run.
Uniformity:
The variability in sequence coverage across target regions. When
performing whole genome sequencing or exome sequencing, it is
expected that the result will be highly uniform (as there should be a 1:1
ratio in the starting material). However, RNA sequencing will not be
uniform since differences in expression alter its starting material.
Homopolymer:
A stretch of single nucleotide bases, such as AAAA or GGGGGG.
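Since homopolymers are the main error source for pyrosequencing and ion semiconductor sequencing, it can be useful to locate them in a sequence. A minimal sketch follows; the function name and the default minimum length of 4 are arbitrary choices for illustration.

```python
from itertools import groupby

def homopolymer_runs(sequence, min_length=4):
    """Return (start_position, base, run_length) for each homopolymer run."""
    runs = []
    position = 0
    for base, group in groupby(sequence):
        length = sum(1 for _ in group)  # length of this run of identical bases
        if length >= min_length:
            runs.append((position, base, length))
        position += length
    return runs

print(homopolymer_runs("ACAAAATGGGGGGC"))
```

Raising `min_length` to 6 would report only the runs long enough to fall into the high-error range mentioned for pyrosequencing.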