Fundamentals of Bioinformatics
It was in the 17th century that biologists first confronted problems of information
management. Early biologists were preoccupied with cataloguing and comparing species
of living things. By the late 17th century, John Ray had introduced the concept of
distinct species of animals and plants and developed guidelines, based on anatomical
features, for distinguishing conclusively between species. In 1735, Carl Linnaeus
established the basis for the modern taxonomic naming system of kingdoms, classes,
orders, genera and species. Taxonomy was thus the first informatics problem in biology.
Over a century ago, the history of bioinformatics proper started with an Austrian monk
named Gregor Mendel.
He cross-fertilized different-colored flowers of the same species, keeping careful records
of the colors of the flowers that he cross-fertilized and the color(s) of the flowers they
produced. Mendel showed that the inheritance of traits could be explained more easily if
it was controlled by factors passed down from generation to generation.
Since this discovery of Mendel's, bioinformatics and genetic record keeping have come a
long way. Advances in laboratory technology allowed data to be collected faster than it
could be interpreted. Biologists reached a similar information overload and began facing
serious difficulties in data analysis and interpretation. Collecting and cataloguing
information about individual human genes (approximately 30,000) and determining the
sequence of the three billion chemical bases that make up human DNA became the
second informatics problem in biology. In 1990, the Human Genome Project was initiated
as a prominent bioinformatics solution to this problem, marking the 21st century as the
era of genomes.
The understanding of genetics has advanced remarkably in the last thirty years. In 1972,
Paul Berg made the first recombinant DNA molecule using ligase. In that same year, Stanley
Cohen, Annie Chang and Herbert Boyer produced the first recombinant DNA organism.
Joseph Sambrook led a team that refined DNA electrophoresis using agarose gel.
By 1977, a method for sequencing DNA had been discovered and the first genetic engineering
company, Genentech, was founded. By 1981, 579 human genes had been mapped, and
mapping by in situ hybridization had become a standard method. Marvin Caruthers and
Leroy Hood made a huge leap in bioinformatics when they invented a method for
automated DNA sequencing. In 1988, the Human Genome Organization (HUGO), an
international organization of scientists involved in the Human Genome Project, was
founded. In 1989, the first complete genome map of the bacterium Haemophilus
influenzae was published. The following year, the Human Genome Project was started. By
1991, a total of 1,879 human genes had been mapped. In 1993, Genethon, a human
genome research centre in France, produced a physical map of the human genome.
Three years later, Genethon published the final version of the Human Genetic Map,
marking the end of the first phase of the Human Genome Project. In the mid-1970s, it
would take a laboratory at least two months to sequence 150 nucleotides. Ten years ago,
the only way to track genes was to scour large, well-documented family trees of relatively
inbred populations, such as the Ashkenazi Jews of Europe. Such genealogical searches
have since given way to industrial-scale sequencing: a single genomics company can
now sequence 11 million nucleotides a day for its corporate clients and company
research. Bioinformatics was fuelled by the need to create huge databases, such as
GenBank, EMBL and the DNA Data Bank of Japan, to store and compare the DNA
sequence data pouring out of the human genome and other genome sequencing projects.
Today, bioinformatics embraces protein structure analysis, gene and protein functional
information, data from patients, pre-clinical and clinical trials, and the metabolic pathways
of numerous species.
Origin of the Internet:
The management and, more importantly, the accessibility of this data is directly attributable
to the development of the Internet, particularly the World Wide Web (WWW). Originally
developed for military purposes in the 1960s and expanded by the National Science
Foundation in the 1980s, the Internet saw scientific use grow dramatically following the
release of the WWW by CERN in 1992.
HTML: The WWW is a graphical interface based on hypertext by which text and graphics
can be displayed and highlighted. Each highlighted element is a pointer to another
document or an element in another document which can reside on any internet host
computer. Page display, hypertext links and other features are coded using a simple,
cross-platform HyperText Markup Language (HTML) and viewed on UNIX workstations,
PCs and Apple Macs as WWW pages using a browser.
Java: The first graphical WWW browser - Mosaic for X and the first molecular biology
WWW server - ExPASy were made available in 1993. In 1995, Sun Microsystems
released Java, an object-oriented, portable programming language based on C++. In
addition to being a standalone programming language in the classic sense, Java provides
highly interactive, dynamic content to the Internet and offers a uniform operational level
for all types of computers, provided they implement the 'Java Virtual Machine' (JVM).
Thus, programs can be written, transmitted over the internet and executed on any other
type of remote machine running a JVM. Java is also integrated into Netscape and
Microsoft browsers, providing both the common interface and programming capability
which are vital in sorting through and interpreting the gigabytes of bioinformatics data now
available and increasing at an exponential rate.
XML: The new XML standard is a project of the World-Wide Web Consortium (W3C)
which extends the power of the WWW to deliver not only HTML documents but an
unlimited range of document types using customized markup. This will enable the
bioinformatics community to exchange data objects such as sequence alignments,
chemical structures, spectra etc., together with appropriate tools to display them, just as
easily as they exchange HTML documents today. Both Microsoft and Netscape support
this new technology in their latest browsers.
CORBA: Another new technology, called CORBA, provides a way of bringing together
many existing or 'legacy' tools and databases with a common interface that can be used
to drive them and access data. CORBA frameworks for bioinformatics tools and
databases have been developed by, for example, NetGenics and the European
Bioinformatics Institute (EBI). Representatives from industry and the public sector under
the umbrella of the Object Management Group are working on open CORBA-based
standards for biological information representation. The Internet offers scientists a
universal platform on which to share and search for data and the tools to ease data
searching, processing, integration and interpretation. The same hardware and software
tools are also used by companies and organisations in more private yet still global Intranet
networks. One such company, Oxford GlycoSciences in the UK, has developed a
bioinformatics system as a key part of its proteomics activity.
ROSETTA: ROSETTA focuses on protein expression data and sets out to identify the
specific proteins which are up- or down-regulated in a particular disease; characterise
these proteins with respect to their primary structure, post-translational modifications and
biological function; evaluate them as drug targets and markers of disease; and develop
novel drug candidates. OGS uses a technique called fluorescent IPG-PAGE to separate
and measure different protein types in a biological sample such as a body fluid or purified
cell extract. After separation, each protein is collected and then broken up into many
different fragments using controlled techniques. The mass and sequence of these
fragments is determined with great accuracy using a technique called mass spectrometry.
The sequence of the original protein can then be theoretically reconstructed by fitting
these fragments back together in a kind of jigsaw. This reassembly of the protein
sequence is a task well-suited to signal processing and statistical methods. ROSETTA is
built on an object-relational database system which stores demographic and clinical data
on sample donors and tracks the processing of samples and analytical results. It also
interprets protein sequence data and matches this data with that held in public, client and
proprietary protein and gene databases. ROSETTA comprises a suite of linked HTML
pages which allow data to be entered, modified and searched and allows the user easy
access to other databases. A high level of intelligence is provided through a sophisticated
suite of proprietary search, analytical and computational algorithms. These algorithms
facilitate searching through the gigabytes of data generated by the Company's proteome
projects, matching sequence data, carrying out de novo peptide sequencing and
correlating results with clinical data. These processing tools are mostly written in C, C++
or Java to run on a variety of computer platforms and use the networking protocol of the
internet, TCP/IP, to co-ordinate the activities of a wide range of laboratory instrument
computers, reliably identifying samples and collecting data for analysis. The need to
analyse ever increasing numbers of biological samples using increasingly complex
analytical techniques is insatiable. Searching for signals and trends in noisy data
continues to be a challenging task, requiring great computing power. Fortunately this
power is available with today's computers, but of key importance is the integration of
analytical data, functional data and biostatistics. The protein expression data in
ROSETTA forms only part of an elaborate network of the type of data which can now be
brought to bear in Biology. The need to integrate different information systems into a
collaborative network with a friendly face is bringing together an exciting mixture of talents
in the software world and has brought the new science of bioinformatics to life.
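The peptide "jigsaw" reassembly described for ROSETTA above can be illustrated with a toy greedy overlap-merge in Python. This is a hedged sketch, not OGS's actual algorithm: the fragment strings and the minimum-overlap cutoff are illustrative assumptions, and real peptide reconstruction must cope with noise and ambiguous fragment masses.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is also a prefix of b (>= min_len), else 0."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(fragments):
    """Repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(fragments)
    while len(frags) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j and overlap(a, b) > best_n:
                    best_n, best_i, best_j = overlap(a, b), i, j
        if best_n == 0:
            break  # no remaining overlaps; fragments cannot be joined further
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)] + [merged]
    return frags

# Hypothetical peptide fragments reassembling into one sequence:
print(greedy_assemble(["MKTAYI", "TAYIAKQ", "IAKQR"]))  # → ['MKTAYIAKQR']
```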
1953 - Watson & Crick propose the double helix model for DNA, based on X-ray data
obtained by Franklin & Wilkins.
1954 - Perutz's group develops heavy-atom methods to solve the phase problem in
protein crystallography.
1972 - The first recombinant DNA molecule is created by Paul Berg and his group.
1974 - Vint Cerf and Robert Kahn develop the concept of connecting networks of
computers into an "internet" and develop the Transmission Control Protocol (TCP).
1975 - Microsoft Corporation is founded by Bill Gates and Paul Allen. Two-
dimensional electrophoresis, where separation of proteins on SDS polyacrylamide
gel is combined with separation according to isoelectric point, is announced by
Patrick H. O'Farrell.
1991 - The European particle physics laboratory in Geneva (CERN) announces the
creation of the protocols which make up the World Wide Web. The creation and use
of expressed sequence tags (ESTs) is described. Incyte Pharmaceuticals, a genomics
company headquartered in Palo Alto, California, is formed. Myriad Genetics, Inc. is
founded in Utah. The company's goal is to lead in the discovery of major common
human disease genes and their related pathways. The company has discovered and
sequenced, with its academic collaborators, the following major genes: BRCA1,
BRCA2, CHD1, MMAC1, MMSC1, MMSC2, CtIP, p16, p19 and MTS2.
1996 - The genome for Saccharomyces cerevisiae (baker's yeast, 12.1 Mb) is
sequenced. The PROSITE database is reported by Bairoch et al. Affymetrix
produces the first commercial DNA chips.
1997 - The genome for E. coli (4.7 Mbp) is published. Oxford Molecular Group
acquires the Genetics Computer Group. LION Bioscience AG is founded as an
integrated genomics company with a strong focus on bioinformatics. The company
is built from IP out of the European Molecular Biology Laboratory (EMBL), the
European Bioinformatics Institute (EBI), the German Cancer Research Center
(DKFZ), and the University of Heidelberg. Paradigm Genetics Inc., a company
focused on the application of genomic technologies to enhance worldwide food
and fiber production, is founded in Research Triangle Park, NC. deCODE genetics
publishes a paper describing the location of the FET1 gene, which is responsible
for familial essential tremor, on chromosome 13 (Nature Genetics).
1998 - The genomes for Caenorhabditis elegans and baker's yeast are published.
The Swiss Institute of Bioinformatics is established as a non-profit foundation.
Craig Venter forms Celera in Rockville, Maryland. PE Informatics is formed as a
Center of Excellence within PE Biosystems; this center brings together and
leverages the complementary expertise of PE Nelson and Molecular Informatics, to
further complement the genetic instrumentation expertise of Applied Biosystems.
Inpharmatica, a new genomics and bioinformatics company, is established by
University College London, the Wolfson Institute for Biomedical Research, five
leading scientists from major British academic centres and Unibio Limited.
GeneFormatics, a company dedicated to the analysis and prediction of protein
structure and function, is formed in San Diego. Molecular Simulations Inc. is
acquired by Pharmacopeia.
2000 - The genome for Pseudomonas aeruginosa (6.3 Mbp) is published. The
A. thaliana genome (100 Mb) is sequenced. The D. melanogaster genome (180 Mb)
is sequenced. Pharmacopeia acquires Oxford Molecular Group.
What is Bioinformatics?
Bioinformatics is the application of computational techniques (drawn from fields such as
computer science, mathematics and statistics) to understand and organize the information
associated with biological molecules, on a large scale. Hence bioinformatics is a
management information system for molecular biology and has many practical
applications in various fields. It combines mathematical, statistical and computing
methods that aim to solve biological problems.
Bioinformatics has moved beyond mere data analysis and now includes the following:
DNA Microarrays: new technologies designed to measure the relative number of copies
of a genetic message (levels of gene expression) at different stages in development or
disease, or in different tissues.
Functional genomics: large-scale ways of identifying gene functions and associations
(e.g. yeast two-hybrid methods).
Structural genomics: attempts to crystallize and/or predict the structures of all proteins.
As a result of the massive surge in data and its complexity, many of the challenges in
biology have actually become challenges in computing. Such an approach is ideal
because of the ease with which computers can handle large quantities of data and probe
the complex dynamics observed in nature.
Applications of Bioinformatics
Some of the applications related to biological information analysis are:
- Information related to biomolecules can be mapped; for example, sequences can be
parsed to find the sites where so-called restriction enzymes will cut them.
- Sequences can be compared, usually by aligning corresponding segments and looking
for matching and mismatching letters. Genes or proteins that are sufficiently similar
are likely to be related; such sequences are termed homologous.
- If a homologue exists, then a newly discovered protein may be modelled; that is, the
3D structure of the gene product can be predicted without doing laboratory
experiments.
- Bioinformatics is used in primer design. Primers are short sequences needed to make
many copies of a piece of DNA, as in PCR.
- Bioinformatics is used to attempt to predict the function of actual gene products.
- Structural biologists use bioinformatics to handle the vast and complex data from
X-ray crystallography, nuclear magnetic resonance (NMR) and electron microscopy
investigations and to create 3-D models of molecules.
- Other fields, for example medical imaging and image analysis, might also be
considered part of bioinformatics.
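The sequence-comparison step above can be sketched with a minimal Needleman-Wunsch global alignment score, the classic dynamic-programming approach to aligning two sequences. The scoring values here (match +1, mismatch -1, gap -2) are illustrative assumptions; real tools use substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score of sequences a and b via dynamic programming."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):  # aligning a prefix against nothing costs gaps
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag,                   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,  # gap in b
                              score[i][j - 1] + gap)  # gap in a
    return score[-1][-1]

print(needleman_wunsch("GATT", "GAT"))  # three matches, one gap → 1
```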
IMPORTANCE OF BIOINFORMATICS
Scientists and researchers spend their whole lives inventing things for human
benefit. After many years of development, they have collected a huge amount of
valuable data from their experiments, and the collection of data still continues for
a better understanding of human life. A problem arises when they need to repeat
research simply because the old data is hard to obtain, or because they do not know
whether it exists; this wastes their valuable time. Take the example of DNA
identification. Every species, including human beings, has particular DNA strands that
contain the genetic instructions used in the development and functioning of all known
living organisms; by studying them, scientists can find the root of different diseases.
Earlier it was hard to manage this information. Bioinformatics is a very helpful hand
for collecting and linking DNA information from all over the world and for solving
many medical complications. In addition, scientists also need a tool that can interlink
information from different areas such as biology, statistics and genomics to make their
research faster. For instance, they may need data on the effects of a particular gene
on human beings and its effects on animals or other species, so that they can interlink
the findings and generate beneficial results or an antidote that helps in human
development. Eventually, bioinformatics provides that help in interlinking information
from different fields and leads to quick results. A further goal of bioinformatics
companies is to pull laboratories away from their paper-based reporting systems
and move them to electronic, web-based systems. Adding, storing, retrieving, and
sharing information becomes easy once the data is made electronic.
USE OF BIOINFORMATICS
The use of bioinformatics is threefold. The first is to collect and organise data
in a way that allows researchers to access the information and to enter new results
as they are produced. Simply depositing data into a database is useless until it is
analysed, so the second aim is to develop tools to analyse the data in the required way.
For example, comparing a newly sequenced protein with already existing sequences
needs a tool or program designed for that comparison. Developing such resources can
only be done with a deep understanding of both computational theory and biology. The
third aim is to analyse the data using these tools and interpret the results in a
biologically meaningful manner. In traditional biology it is impossible to examine the
results of an experiment on such an extendable scale, but using bioinformatics we can
conduct a global analysis of all the available data. Experimental results (the data)
from the biological laboratory are stored in bioinformatics databases, and the analysis
of these data can then be done using different computational techniques and tools. The
data are analysed in terms of sequence, structure, function, evolution, pathways, etc.
The results of the analysis are then compared with the biological data and the
experimental results.
Bioinformatics deals with biological information or data. So what does this information
stand for? Data can be raw DNA sequences, protein sequences, macromolecular
structures, genome sequences, etc. DNA sequences are strings over four base letters
(A, T, G, and C) which make up genes. Protein sequences are strings over the 20 amino
acid letters. The most complex form of information is macromolecular structural data;
more than half of this data consists of protein structures. Genome sequence information
is the collection of raw DNA sequences, ranging from 1.6 million bases to 3 billion
bases. The next issue is the organisation of this large amount of data. First of all,
most of this data can be grouped together based on biological similarities; for example,
genes can be grouped according to their function, pathway, etc. So a major aspect of
managing this data is developing methods to assess the similarities between these
biomolecules and to identify those that are related. Different databases are available
for depositing this data according to the information it carries. Each database contains
a different type of data: examples are protein sequence databases, nucleotide sequence
databases, structural databases, etc. So according to its type, the data can be stored
in different databases. The next question is how we can use or analyse this data in
practical life.
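The data types described above can be told apart mechanically by their alphabets. The sketch below is a simplification (a string containing only A, C, G and T letters is also a valid protein sequence, so the classifier simply reports the most restrictive alphabet that fits):

```python
DNA_ALPHABET = set("ACGT")
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acid letters

def sequence_type(seq):
    """Report the most restrictive alphabet the sequence fits."""
    letters = set(seq.upper())
    if letters <= DNA_ALPHABET:
        return "DNA"
    if letters <= PROTEIN_ALPHABET:
        return "protein"
    return "unknown"

print(sequence_type("atgcgc"))   # → DNA
print(sequence_type("MKWVTF"))   # → protein
```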
The use of bioinformatics depends on the area or field in which it is applied; moreover,
bioinformatics has its own importance and use in every field of study. Some uses of
bioinformatics are explained here, according to the branch of study.
STRUCTURAL BIOINFORMATICS
The branch of bioinformatics that deals with the analysis and prediction of the
three-dimensional structure of biological macromolecules such as proteins, RNA
and DNA is known as structural bioinformatics. It deals with generalizations about
macromolecular 3D structure, such as comparisons of overall folds and local motifs,
principles of molecular folding, evolution, binding interactions and structure/function
relationships, working both from experimentally solved structures and from
computational models. Structural bioinformatics can be seen as a part of
computational structural biology.
Structural bioinformatics has existed in some form or other ever since the
determination of the first myoglobin structure. Structural bioinformatics has a
significant role in the field of molecular biology. The main challenge for structural
bioinformatics is the integration of structural information with other biological
information to have a deep understanding of biological function. The success of
genome-sequencing projects has created information about all the structures that
are present in individual organisms, as well as both the shared and unique features
of these organisms. Homology models for the structures identified through genome
sequencing projects can be created with the help of structural bioinformatics. The
resulting structures will be studied with respect to how they interact and perform their
functions. Similarly, the emergence of microarray expression measurements
provides an ability to consider how the expression of macromolecular structures is
regulated at a structural level including the reactions associated with transcription,
translation, and degradation. Alterations in functional characteristics, such as
structural variation, genetic mutation and post-translational modification, can be
studied through structure modelling. Moreover, the 3D structures of biological
molecules can be analysed through this application. When the organization and
physical structure of entire cells is understood and represented in computational
models, those models will provide insight into how the thousands of structures within
a cell work together to create the functions associated with life. In any case,
structural bioinformatics is very important in biology because structure alignments
provide information that is not available from current sequence alignment methods.
a) PDB:
The World Wide Web site of the Protein Data Bank at the Research Collaboratory for
Structural Bioinformatics (RCSB) offers a number of services for submitting and
retrieving three-dimensional structure data. The home page of the RCSB site
provides links to services for depositing three-dimensional structures, information on
how to obtain the status of structures undergoing processing for submission, ways to
download the PDB database, and links to other relevant sites and software.
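As a rough illustration of working with PDB data, the sketch below pulls atom names, residues and coordinates out of ATOM records. It is deliberately simplified: it splits records on whitespace, whereas the real PDB format is fixed-width (so this only works for well-behaved records), and the example record is fabricated.

```python
def parse_atoms(pdb_text):
    """Extract (atom name, residue, x, y, z) from ATOM records in PDB-format text.

    Simplified: splits on whitespace; real PDB files use fixed column positions.
    """
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            f = line.split()
            atoms.append((f[2], f[3], float(f[6]), float(f[7]), float(f[8])))
    return atoms

# A fabricated two-atom example in PDB-like layout:
example = (
    "ATOM      1  N   MET A   1      38.128  13.800  11.712  1.00  0.00           N\n"
    "ATOM      2  CA  MET A   1      38.900  12.600  11.000  1.00  0.00           C\n"
)
print(parse_atoms(example))
```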
b) EMDB:
In 2002, the EMDB was founded at the EBI specifically to archive the non-atomistic
structures (i.e. volume maps, masks and tomograms) determined by methods such as
single-particle analysis, electron tomography (ET) and electron crystallography. Today,
the EMDB contains over 1300 released entries and is expected to grow 5- to 10-fold by
2020. Since 2007, the EMDB has been managed jointly by three partners: PDBe,
RCSB PDB and the National Centre for Macromolecular Imaging (NCMI) at Baylor
College of Medicine.
FUNCTIONAL BIOINFORMATICS:
Functional bioinformatics aims to define gene function by making use of the vast
amount of information now available through high-throughput experimental methods
for mapping and sequencing genomes and approaches for characterising genes'
function, their organisation and expression under different conditions. Together these
functional genomic techniques contribute to the growing field of systems biology.
These techniques are also being increasingly applied to understanding disease in
human populations as part of genetic medicine. A particular disease's characteristics,
and biomarkers for that disease, can be explored through gene expression analysis.
Functional bioinformatics is a wide area, and among the main techniques used in this
field are supervised and unsupervised machine learning methods. Unsupervised learners are not
provided with classifications. In fact, the basic task of unsupervised learning is to
develop classification labels automatically. Unsupervised algorithms seek out
similarity between pieces of data in order to determine whether they can be
characterized as forming a group. These groups are termed clusters, and there is a
whole family of clustering machine learning techniques. In supervised algorithms, by contrast,
the classes are predetermined. These classes can be conceived of as a finite set,
previously arrived at by a human. In practice, a certain segment of data will be
labelled with these classifications. The machine learner's task is to search for
patterns and construct mathematical models. These models then are evaluated on
the basis of their predictive capacity in relation to measures of variance in the data
itself. Many widely used methods (decision tree induction, naive Bayes, etc.) are
examples of supervised learning techniques. GEO and KEGG are commonly used
databases for functional data, though plenty of other databases are available.
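The unsupervised (clustering) approach described above can be sketched with a tiny k-means in pure Python. The expression profiles below are made-up toy data, and the deterministic initialisation is a simplification for illustration (real analyses use dedicated libraries with more careful initialisation; k is assumed to be at least 2 here).

```python
def kmeans(points, k, iters=10):
    """Group points (tuples of expression values) into k clusters by k-means."""
    # Deterministic, simplistic init: spread starting centroids across the input.
    centroids = [points[i * (len(points) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:  # assign each point to its nearest centroid
            nearest = min(range(k),
                          key=lambda c: sum((x - y) ** 2
                                            for x, y in zip(p, centroids[c])))
            clusters[nearest].append(p)
        for c in range(k):  # move each centroid to the mean of its cluster
            if clusters[c]:
                centroids[c] = tuple(sum(xs) / len(xs) for xs in zip(*clusters[c]))
    return clusters

# Toy gene-expression profiles: two obvious groups (low vs high expression).
profiles = [(0.1, 0.2), (0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
print(kmeans(profiles, 2))
```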
SEQUENCE BIOINFORMATICS
The most important thing in sequencing is the identification of the gene within
the given DNA sequence. Once the sequence has been assembled, bioinformatics
analysis can be used to determine if the sequence is similar to that of a known gene.
This is where sequences from model organisms are helpful. For example, let's say
we have an unknown human DNA sequence that is associated with prostate cancer. A
bioinformatics analysis finds a similar sequence from the mouse that is associated with
a gene coding for a membrane protein. It is a good bet that the human sequence is also
part of a gene that codes for the same membrane protein. In this way, bioinformatics
sequence analysis can be used to characterise an unknown gene or protein of interest.
Bioinformatics programs such as CLUSTALW and BLAST are available for sequence
comparison and alignment. One of the widely used databases for sequence data is
UniProt.
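Tools like BLAST do not align a query against every database position exhaustively; they first find short exact word ("k-mer") matches as seeds and then extend them. Below is a minimal sketch of that seeding step only (the word size and the sequences are illustrative choices; real BLAST adds neighbourhood words, scoring and extension):

```python
def kmer_index(seq, k=3):
    """Map every length-k word in a database sequence to its start positions."""
    index = {}
    for i in range(len(seq) - k + 1):
        index.setdefault(seq[i:i + k], []).append(i)
    return index

def seed_hits(query, index, k=3):
    """Return (query position, database position) pairs of exact word matches."""
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits

database = "ATGGCGTACGT"
print(seed_hits("GCGTA", kmer_index(database)))  # → [(0, 3), (1, 4), (1, 8), (2, 5)]
```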
a) UniProt:
32. Shotgun cloning differs from the clone-by-clone method in which of the following
ways?
A. The location of the clone being sequenced is known relative to other clones within the
genomic library in shotgun cloning.
B. Genetic markers are used to identify clones in shotgun cloning.
C. Computer software assembles the clones in the clone-by-clone method.
D. The entire genome is sequenced in the clone-by-clone method, but not in shotgun
sequencing.
E. No genetic or physical maps of the genome are needed to begin shotgun cloning.
33. CpG islands and codon bias are tools used in eukaryotic genomics to __________.
a. identify open reading frames
b. differentiate between eukaryotic and prokaryotic DNA sequences
c. find regulatory sequences
d. look for DNA-binding domains
e. identify a gene’s function
34. As the complexity of an organism increases, all of the following characteristics
emerge except __________.
a. the gene density decreases
b. the number of introns increases
c. the gene size increases
d. an increase in the number of chromosomes
e. repetitive sequences are present
35. Gene duplication has been found to be one of the major reasons for genome
expansion in eukaryotes. In general, what would be the selective advantage of gene
duplication?
a. If one gene copy is nonfunctional, a backup is available.
b. Larger genomes are more resistant to spontaneous mutations.
c. Duplicated genes will make more of the protein product.
d. Gene duplication will lead to new species evolution.
36. How are so many different antibodies produced from fewer than 300 major genes?
A. gene duplication
B. alternative splicing mechanisms
C. the formation of polyproteins
D. the formation of nonspecific B cells
E. recombination, deletions, and random assortment of DNA segments
37. Two-dimensional gels are used to __________.
A. separate DNA fragments
B. separate RNA fragments
C. separate different proteins
D. observe a protein in two dimensions
E. separate DNA from RNA
38. What would be a likely explanation for the existence of pseudogenes?
a. gene duplication
b. gene duplication and mutation events
c. mutation events
d. unequal crossing over
e. evolutionary pressure
39. If you enter a set of IUPAC codes into BLAST, you are probably trying to
A. find out whether a certain protein has any role in human disease.
B. search for the genes that are located on the same chromosome as a gene whose sequence
you have.
C. find which section of a piece of DNA is transcribed into mRNA.
D. determine the identity of a protein.
40. Your lab partner is using BLAST, and his best E value is 3. This means that
A. he’s found 3 proteins in the database that have the same sequence as his protein.
B. the chance that these similarities arose due to chance is one in 10^3.
C. there would be 3 matches that good in a database of this size by chance alone.
D. the match in amino acid sequences is perfect, except for the amino acids at 3 positions.
(a) BLAST
(b) RasMol
(c) EMBOSS
(d) PROSPECT
42. In which year did the SWISSPROT protein sequence database begin?
(a) 1988
(b) 1985
(c) 1986
(d) 1987
(a) Dayhoff
(b) Pearson
(d) Michael.J.Dunn
45. Which of the following tools is used for the identification of motifs?
(a) BLAST
(b) COPIA
(c) PROSPECT
46. The first molecular biology server, ExPASy, appeared in the year __________.
(a) 1992
(b) 1993
(c) 1994
(d) 1995
47. What is the deposition of cDNA into the inert structure called?
(a) Genomics
(b) Pharmacogenomics
(c) Pharmacogenetics
(d) Cheminformatics
49. Which of the following compounds has desirable properties to become a drug?
(a) Lead
(b) Fit compound
(c) Biomolecules
51. The process of finding the relative location of genes on a chromosome is called __________.
52. The computational methodology that tries to find the best matching between two molecules, a
receptor and a ligand, is called __________.
54. The term “in vitro” is a Latin phrase which refers to __________.
55. The stepwise method for solving problems in computer science is called__________.
(a) Flowchart
(b) Algorithm
(c) Procedure
57. Laboratory work using computers, associated with web-based analysis and generally done
online, is referred to as __________.
(a) In silico
58. Which of the following is the first completed and published gene sequence?
(a) ΦX174
(b) T4 phage
59. Laboratory work using computers and computer-generated models, generally done offline, is
referred to as __________.
(a) In silico
(b) Wet lab
(c) In vitro
(d) In silico
GOLD
PROMISE
SGD
SCOP
none of these
63. Information on all known nucleotide and protein sequences is available in
EMBL
DDBJ
All of these
secondary database
composite database
none of these
65. SCOP is
it is primary database
Joseph Sambrook
Sanger
Altschul et al
67. PDB is
composite database
Phylogenetic analysis
none of these
EMBL
PHD
All of these
none of these
71. Which of these is the most important aspect of planning and designing
72. You need to use a first generation sequencing method for de novo sequencing, which template
should give optimum results for this project?
a) Genomic DNA
b) PCR product
d) Plasmid DNA
73. Which of the following statements is correct as to why the quantity of template used is critical
to a sequencing reaction?
74. What will heterozygous single nucleotide substitution look like on your chromatogram?
c) Two peaks in the same position, one twice the height of the other
75. Which of these projects would be best suited for Next Generation Sequencing?
c) To genotype ten genomic DNA samples for a known single nucleotide polymorphism
76. Which of these is important for preparing templates for Next Generation Sequencing?
77. Once the sequences are obtained from your Next Generation Sequencing experiment what is the
first thing you should do?