Lecture3 4

The document provides an overview of bioinformatics databases, explaining their structure, types, and management systems. It highlights the importance of primary and secondary databases for storing biological data, with examples like GenBank, EMBL, and DDBJ. Additionally, it discusses the role of the National Center for Biotechnology Information (NCBI) in maintaining these databases and providing tools for data retrieval and analysis.

Uploaded by

sundus waseem

Introduction to Bioinformatics
Lecture 3+4
What is a database?
A database is an organized collection of related information.
It is a computerized archive used to store and organize data in such a way that information can be retrieved easily.
A database is a repository of information with a specific structure that enables the entry and extraction of data.
In general, this database structure consists of files or tables, each containing numerous records and fields.
A database system (DBS) is an integrated collection of related files, along with the details of their definition, interpretation, manipulation and maintenance.
A database system protects the data from unauthorized access.
What are the advantages of using databases?
• Easy and quick retrieval of information
• Provide backup support

Databases
• Sequence information is stored in databases so that it can be manipulated easily
• The databases (next slide) are located at different places
• They exchange information on a daily basis so that they are up to date and in sync
• Primary databases hold sequence data
Biological databases
Biological data and its associated knowledge need to be collected and stored in databases
Fundamental to the survival of science
Each year, the journal Nucleic Acids Research (NAR) dedicates an entire issue to the available databases!
Database management systems
Database management systems provide several functions in addition to simple file management:
• control security
• maintain data integrity
• provide for backup and recovery
• control redundancy
• allow data independence
• perform automatic query optimization
Organisation
Databases are organised as:
• Flat files
• Relational databases
Flat-file databases
The simplest form of a database, where collections of data, such as nucleotide and amino acid sequences, are stored as a single large text file.
Relational databases
A relational database stores the data within a number of tables.
Each table consists of records and fields (rows and columns).
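The difference between the two organisations can be sketched in Python. This is only a minimal illustration, not how any real sequence database is implemented: the table name `sequences`, its three columns, the `|` field separator and the two example records are all assumptions made up for the sketch.

```python
import sqlite3

# Flat-file style: every record is one line of text in a single file.
# The field order (accession, organism, sequence) is an assumption of this sketch.
flat_file = """X00001|Homo sapiens|ATGGCCTAA
X00002|Mus musculus|ATGTTTTGA"""

def build_relational_db(flat_text):
    """Load flat-file lines into a relational table of records (rows) and fields (columns)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sequences (accession TEXT PRIMARY KEY, organism TEXT, sequence TEXT)"
    )
    rows = [tuple(line.split("|")) for line in flat_text.splitlines()]
    conn.executemany("INSERT INTO sequences VALUES (?, ?, ?)", rows)
    return conn

conn = build_relational_db(flat_file)
# A query retrieves one field of one record without scanning the text by hand.
result = conn.execute(
    "SELECT sequence FROM sequences WHERE organism = ?", ("Mus musculus",)
).fetchone()
print(result[0])
```

In the flat file, finding the mouse sequence means reading the text line by line; in the relational table, the same question is a single query on the `organism` field.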
Two kinds of biological databases

1. Primary
• Contain primary sequence information (nucleotide or protein) and associated annotations

2. Secondary
• Summarize the results from primary databases
Types of Database
The databases can be classified into three categories on the basis of the information stored: Primary, Secondary and Composite databases.
Primary databases contain data that is derived experimentally.
They usually store information related to the sequences or structures of biological components.
They can be further divided into protein or nucleotide databases.
Primary databases
• Nucleotide sequence databases
• Protein sequence databases
Useful Databases
Primary (archival)
• GenBank/EMBL/DDBJ (seqs)
• PDB (protein structures)
• Medline (literature)
• IMEx databases (protein interactions)
Secondary (curated)
• RefSeq (seqs)
• UniProt - SwissProt (seqs)
• Taxon (taxonomy)
• PROSITE (binding sites)
• OMIM (genetics literature/reviews)
• IMEx databases (protein interactions)
Major Primary DB
Nucleic Acid
• EMBL (Europe)
• GenBank (USA)
• DDBJ (Japan)
Protein
• PIR (Protein Information Resource)
• MIPS
• SWISS-PROT (University of Geneva, now with EBI)
• TrEMBL (a supplement to SWISS-PROT)
• NRL-3D
Primary Database
These databases contain the raw nucleic acid sequence data that are produced and submitted by researchers worldwide.
• NCBI (National Center for Biotechnology Information)
• GenBank
• DDBJ (DNA Data Bank of Japan)
Protein
• SWISS-PROT (Swiss-Prot)
• PIR (Protein Information Resource)
• MIPS
• PDB (Protein Data Bank)
• TrEMBL (Translated EMBL)
NATIONAL CENTER FOR BIOTECHNOLOGY INFORMATION (NCBI)
Developed at the National Institutes of Health (NIH) in 1988
Part of the National Library of Medicine at the National Institutes of Health
Provides access to a large amount of biomedical and genomic information (www.ncbi.nlm.nih.gov/home/about/mission.shtml)
It maintains a large number of databases and bioinformatics tools as well as services.
One of the most popular databases is GenBank
Mission or role
The aim is to find novel techniques and methodologies for dealing with huge and complex data and to provide better accessibility to analytical and computational tools.
Maintenance of biological databases, whether primary or secondary, including GenBank.
NCBI provides data retrieval systems such as Entrez.
It provides computational resources for the analysis of GenBank data and other biological data.
Resources
The resources that are present on this site can be divided into two major categories:
1) databases
2) tools
The major databases maintained at NCBI are GenBank and PubMed (bibliographic database for biomedical literature).
Other databases include Gene, Genome, Epigenomics, Gene Expression, RefSeq, Structure, Database of Short Genetic Variation (dbSNP), Taxonomy, etc.
TOOLS at NCBI
NCBI also provides a variety of tools for database search.
Entrez is the search engine of NCBI.
The other tools include:
• Genomes Browser
• BLAST
• CDTree
• Genetic Codes
• Open Reading Frame Finder (ORF Finder)
• SNP Database Specialized Search Tools
NCBI
Created in 1988 as a part of the National Library of Medicine at NIH
– Establish public databases
– Research in computational biology
– Develop software tools for sequence analysis
– Disseminate biomedical information
Web Access: www.ncbi.nlm.nih.gov
Coding sequences: CDS
• A CDS is a sequence of nucleotides that corresponds to the sequence of amino acids in a protein. A typical CDS starts with ATG and ends with a stop codon.
• The identification of coding sequences (CDS) is an important step in the functional annotation of genes.
• "Complete CDS" means the presence of a start codon "ATG" and a stop codon "TAA/TGA/TAG".
• "Partial" means the CDS is incomplete, for example there is a stop codon but no start codon.
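The complete/partial distinction can be sketched as a small check. The function name `classify_cds` is hypothetical, and the extra requirement that the length be a multiple of three is an assumption added for the sketch, not something stated on the slide.

```python
STOP_CODONS = {"TAA", "TGA", "TAG"}

def classify_cds(seq):
    """Return 'complete' if seq starts with ATG and ends with a stop codon, else 'partial'."""
    seq = seq.upper()
    has_start = seq.startswith("ATG")
    has_stop = len(seq) >= 3 and seq[-3:] in STOP_CODONS
    # Assumption of this sketch: a complete CDS is read codon by codon,
    # so its length must be a multiple of three.
    in_frame = len(seq) % 3 == 0
    if has_start and has_stop and in_frame:
        return "complete"
    return "partial"

print(classify_cds("ATGGCCTAA"))  # start codon and stop codon present
print(classify_cds("GCCGCCTAA"))  # stop codon but no start codon
```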
GenBank
GenBank (Genetic Sequence Databank)
GenBank® is the genetic sequence database at the National Center for Biotechnology Information (NCBI).
It was established in 1982 and is now maintained by NCBI.
It contains publicly available nucleotide sequences.
DNA sequences can be submitted to GenBank using several different methods:
BankIt: Web-based form for submission of a small number of sequences
Sequin: More appropriate for complicated submissions containing many sequences
Nucleotide sequence databases
• GenBank
• Perhaps the best known database
• Contains all publicly available annotated DNA sequences
• Exchanges data daily with the DNA Data Bank of Japan (DDBJ) and the European Molecular Biology Laboratory (EMBL)
• Contained roughly 179 million sequence entries in Dec 2014
• In October 2024, GenBank contained 34 trillion base pairs from over 4.7 billion nucleotide sequences and more than 580,000 formally described species. The database was started in 1982 by Walter Goad and Los Alamos National Laboratory.
• Prior submission of a sequence to GenBank/DDBJ/EMBL is generally a prerequisite for publishing a new sequence in a scientific journal
• Submission is easy and can be done electronically
• Each entry has a unique id known as the "Accession Number (AN)"
Structure of GenBank
A detailed structure of a nucleotide sequence file format in this database includes the following:
1. Locus: A title given by GenBank itself to name the sequence entry. It includes the following:
   a. Locus Name: Similar to the accession number for the sequence.
   b. Sequence Length: The number of bases in the sequence.
   c. Molecule Type: Identifies the type of nucleic acid sequence. The various types are mRNA (which is present as cDNA), rRNA, snRNA, and DNA.
   d. GB Division: Specifies the class of the data according to the classification criteria of GenBank.
   e. Modification Date: The date on which the record was last modified.
2. Definition: The name of the nucleotide sequence.
3. Accession: Covers the accession number, accession version, and GI number. The accession number is the unique identifier associated with each nucleotide sequence present in the database.
4. Version: Identification number assigned to a single, specific sequence in the database. This number is in the format "accession.version".
5. GI: Also a sequence identification number. Whenever a sequence is changed, the version number is increased and a new GI is assigned.
6. Keywords: Defined words that were used to index the entries.
7. Source: Describes the organism from which the sequence was obtained.
8. Organism: The scientific name (usually genus and species) and phylogenetic lineage.
9. Reference: Citations of publications by the sequence authors, and the journal from which the sequence was derived.
10. Features: Information derived from the sequence, such as:
   • biological source
   • exon
   • intron
   • promoters
   • CDS
   • alternate splice
   • Base Count
   • Origin
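The Locus sub-fields and the "accession.version" format can be pulled apart with a few lines of Python. This is a sketch under stated assumptions: it expects the whitespace-separated layout `LOCUS name length bp molecule ... division date`, the helper names `parse_locus` and `parse_version` are hypothetical, and the example values are used purely for illustration.

```python
def parse_locus(line):
    """Split a GenBank-style LOCUS line into the sub-fields listed above."""
    parts = line.split()
    if len(parts) < 7 or parts[0] != "LOCUS" or parts[3] != "bp":
        raise ValueError("not a LOCUS line in the assumed layout")
    return {
        "locus_name": parts[1],
        "sequence_length": int(parts[2]),
        "molecule_type": parts[4],
        # The last two tokens are taken as division and date, so an optional
        # topology token (e.g. "linear") in between is simply skipped.
        "gb_division": parts[-2],
        "modification_date": parts[-1],
    }

def parse_version(value):
    """Split an 'accession.version' identifier into its two parts."""
    accession, version = value.rsplit(".", 1)
    return accession, int(version)

locus = parse_locus("LOCUS       SCU49845     5028 bp    DNA             PLN       21-JUN-1999")
print(locus)
print(parse_version("U49845.1"))
```

Keeping the version as an integer makes it easy to tell which of two records for the same accession is the newer one.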
Growth of GenBank (1982-2014)
Accession number
• A unique identifier of each record in the database
• Usually alphanumeric in nature

Why do we need accession numbers?
• Common names lead to non-specific results
• A search on "Cytochrome" will output many different types of cytochromes (a, b, c, and others)
• Common names cannot distinguish among species
• A search on "Insulin" will return insulin sequences from many organisms
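A toy example makes the point concrete. The three records and their accession numbers below are invented for illustration and do not correspond to real database entries.

```python
# Hypothetical records: the accession numbers are invented for this sketch.
records = [
    {"accession": "AA000001", "name": "insulin", "organism": "Homo sapiens"},
    {"accession": "AA000002", "name": "insulin", "organism": "Mus musculus"},
    {"accession": "AA000003", "name": "insulin", "organism": "Bos taurus"},
]

def search_by_name(name):
    """A common-name search returns every record that shares the name."""
    return [r for r in records if r["name"] == name]

# An index keyed by accession number points to exactly one record.
by_accession = {r["accession"]: r for r in records}

print(len(search_by_name("insulin")))        # several hits, one per organism
print(by_accession["AA000002"]["organism"])  # exactly one record
```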
Example Genbank entry
How to search GenBank?
• Can be performed via Entrez, the NCBI online search system
• A search can be general or made specific using keywords
• For example, Homo sapiens[ORGN] AND 3260:3270[SLEN] (SLEN: sequence length)
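The same kind of query can also be composed programmatically. The sketch below only builds the query string and a URL for NCBI's E-utilities ESearch endpoint; it sends no request. The helper names are hypothetical, and the choice of `db="nucleotide"` and `retmax=20` is an assumption made for the example.

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_query(organism, min_len, max_len):
    """Compose an Entrez term restricting organism and sequence length."""
    return f"{organism}[ORGN] AND {min_len}:{max_len}[SLEN]"

def build_esearch_url(term, db="nucleotide", retmax=20):
    """Return an ESearch URL for the term (no request is sent)."""
    params = {"db": db, "term": term, "retmax": retmax}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"

term = build_query("Homo sapiens", 3260, 3270)
url = build_esearch_url(term)
print(term)
print(url)
```

Building the term in one function keeps the field tags ([ORGN], [SLEN]) in a single place, so further restrictions can be added without touching the URL code.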
Sister databases to GenBank
1. EMBL
• Sequence provider for Europe
• Maintained at the European Bioinformatics Institute (EBI)

2. DDBJ
• Sequence provider for Asia
• Operated by the Center for Information Biology (CBI) in Japan

• The three databases are part of the International Nucleotide Sequence Database Collaboration
• They sync their data every 24 hours
• Querying the three databases separately is therefore unnecessary
• They differ slightly in format and representation
European Molecular Biology Laboratory (EMBL)
The EMBL Nucleotide Sequence Database is maintained by EBI, UK.
EMBL was formed in the year 1974.
It develops and maintains a large number of databases, and scientists can access the data free of cost.
This database serves as the primary source of nucleotide sequences for Europe.
Nucleotide sequence data generated by large-scale genome-sequencing projects, and data available from the European Patent Office, can be submitted to this database.
Data collection is done in collaboration with GenBank (USA) and the DNA Data Bank of Japan (DDBJ).
The other genomic databases held at EBI are:
Ensembl (a database of genome annotation)
Genome Reviews
Daily releases of the database contain new submissions and updated sequence data, while the entire database is released every 3 months.
DDBJ
DDBJ (DNA Data Bank of Japan) is a biological database that collects DNA sequences submitted by researchers.
• It is run by the National Institute of Genetics, Japan.
DDBJ Flat File Format
The data submitted to DDBJ are managed and retrieved according to the DDBJ format (flat file).
The flat file includes the sequence together with information on who submitted the data, references, source organisms, features, etc.
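Reading such a flat file can be sketched as follows. The sketch assumes a GenBank-like layout in which each top-level keyword starts in the first column, continuation lines are indented, and `//` ends the record. The sample record is invented and the function name `parse_flat_file` is hypothetical. Indented sub-keywords are simply folded into the preceding top-level field.

```python
def parse_flat_file(text):
    """Group the lines of one flat-file record under their top-level keywords."""
    record = {}
    current = None
    for line in text.splitlines():
        if line.startswith("//"):            # end-of-record marker
            break
        if line and not line[0].isspace():   # a new keyword starts in the first column
            keyword, _, value = line.partition(" ")
            current = keyword
            record[current] = value.strip()
        elif current is not None and line.strip():  # indented continuation line
            record[current] = (record[current] + " " + line.strip()).strip()
    return record

# Invented record for illustration; not a real database entry.
sample = """LOCUS       EX000001     12 bp    DNA     linear   SYN 01-JAN-2000
DEFINITION  Example sequence used to illustrate
            the flat-file layout.
ACCESSION   EX000001
SOURCE      synthetic construct
ORIGIN
        1 atggccgcct aa
//"""

record = parse_flat_file(sample)
print(record["DEFINITION"])
print(record["ACCESSION"])
```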
Ensembl Genome Database
Ensembl is one of several well known genome browsers for the
retrieval of genomic information from several organisms
including human, plants, bacteria and animals.
Created and maintained by the EBI and the Sanger Center (UK)
The obvious problem with manually curating the database?
It is difficult to keep pace with the amount of sequence data generated these days. It is necessary to supplement manual curation with an automatic alternative.
Protein Databases
Swiss-Prot
• A protein sequence database which strives to provide a high level of annotation:
* the function of a protein
* domain structure
* post-translational modifications
* variants
• Complete, curated, non-redundant and cross-referenced with 34 other databases
• Its repository contains the amino acid sequence, the protein name and description, taxonomic data, and citation information
Protein sequence databases - SwissProt
• A collection of annotated protein sequences
• Operated by the Swiss Institute of Bioinformatics (SIB)
• Manually curated by a specialist and verified from literature
• High quality database, gold standard for protein annotation
• The gold standard for the accurate determination of the total protein content in complex samples is the total amino acid analysis (AAA). During AAA, the intact proteins are hydrolyzed to individual amino acids, usually by acidic hydrolysis.
(Amos Bairoch, creator of SwissProt during his Ph.D, 1986)
TrEMBL: TrEMBL (translation of EMBL nucleotide sequence database) was introduced by the European Bioinformatics Institute in collaboration with Swiss-Prot.
• Created in 1996 as a computer-annotated supplement to SWISS-PROT.
• Contains translations of all coding sequences (CDS) in EMBL.
PIR: The Protein Information Resource (PIR) is an integrated public bioinformatics resource that supports genomic and proteomic research and scientific studies.
The PIR serves the scientific community through on-line access and by performing off-line sequence identification services for researchers.
It is a database of freely accessible protein sequences which contains high-quality data and functional information for the proteins.
TrEMBL
• Translated EMBL
• Contains all translations of the EMBL nucleotide database that have not yet been verified by the SwissProt specialists
• Annotation is completely automatic, so it is a less reliable source of information
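What "translation of a coding sequence" means can be shown with a short sketch using the standard genetic code. This is not TrEMBL's actual pipeline: the function name `translate` is hypothetical, and stopping at the first stop codon and writing "X" for unrecognised codons are choices made for the example.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG order of the three codon positions.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(cds):
    """Translate a DNA coding sequence into one-letter amino acids, up to the first stop."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X for codons with ambiguous bases
        if aa == "*":                            # stop codon ends the protein
            break
        protein.append(aa)
    return "".join(protein)

print(translate("ATGGCCTAA"))
```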
UniprotKB
Protein Information Resource (PIR)
• Located at Georgetown University Medical Center, USA
• Public bioinformatics resource to support genomic and proteomic research
• Established in 1984 by the National Biomedical Research Foundation (NBRF)
• The NBRF previously compiled a comprehensive collection of sequences in the "Atlas of Protein Sequence and Structure" (edited by Dr. Margaret Dayhoff, 1965-1978)
Universal Protein Resource (UniProt)
• Unites the information in three databases: Swiss-Prot, TrEMBL, and PIR
• Consists of three parts

1. UniProtKB – based on Swiss-Prot and TrEMBL; a comprehensive directory of protein annotations

2. UniRef – allows for fast similarity searches, such as a search for sequences that are 90% identical

3. UniProt Archive (UniParc) – collection of UniProt sequences and their history

Composite Databases
Composite databases are collections of several primary database resources.
They provide users with various tools and software for analysis of data.
NCBI, being a composite database, stores a large number of nucleotide and protein sequences on its servers and therefore suffers from high redundancy in the deposited data.
Composite DB
• As there are many databases, which one should be searched? Some are good in some aspects and weak in others.
• A composite database is the answer: it uses several databases as its base data
• Searches on these databases are indexed and streamlined so that the same stored sequence is not searched twice in different databases
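The idea of not searching the same sequence twice can be sketched as a merge that stores each distinct sequence once and remembers where it came from. The database names, identifiers and sequences below are invented, and `build_composite` is a hypothetical helper, not the procedure used by any real composite database.

```python
def build_composite(sources):
    """Merge {database: {identifier: sequence}} into one non-redundant collection."""
    composite = {}
    for db_name, entries in sources.items():
        for identifier, sequence in entries.items():
            # Identical sequences are stored once; every source identifier is kept.
            composite.setdefault(sequence.upper(), []).append((db_name, identifier))
    return composite

# Invented entries for illustration only.
sources = {
    "db_A": {"A1": "MKTAYIAK", "A2": "MSLLTEVE"},
    "db_B": {"B7": "MKTAYIAK", "B8": "MGDVEKGK"},
}

composite = build_composite(sources)
print(len(composite))          # number of distinct sequences
print(composite["MKTAYIAK"])   # both source entries for the shared sequence
```

A search then runs over the keys of `composite` only, and each hit can still be traced back to every source database that holds it.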
Secondary db
• Store secondary structure info or results of searches of the primary db

Compo DB    Primary Source
PROSITE     SWISS-PROT
PRINTS      OWL
Secondary Databases
Secondary databases:
• contain information derived from primary databases.
• store information such as conserved sequences, active site residues, and signature sequences.
Data derived from the Protein Data Bank are also stored in secondary databases. Examples include:
• Class Architecture Topology Homology (CATH)
• Kyoto Encyclopedia of Genes and Genomes (KEGG)
• Protein Families (Pfam)
• Structural Classification of Proteins (SCOP)
PFAM
A database of protein families, Pfam contains
annotations as well as multiple sequence
alignments generated using hidden Markov models
PROSITE
• Sometimes a newly sequenced protein gives no hits to sequence databases
• How do we determine its function then?

"In some cases, the structure and function of an unknown protein which is too distantly related to any protein of known structure to detect its affinity by overall sequence alignment may be identified by its possession of a particular cluster of residues types classified as a motifs. The motifs, or templates, or fingerprints, arise because of particular requirements of binding sites that impose very tight constraint on the evolution of portions of a protein sequence" - A. M. Lesk, 1988
Contd.
• Patterns are inferred from multiple sequence alignments
• Look for regions that are conserved in evolution
• These could be important binding sites, attachment sites or catalytic sites
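Conserved patterns of this kind are commonly written in PROSITE-style notation, where `x` is any residue, `[..]` lists allowed residues, `{..}` lists forbidden residues and `(n)` or `(n,m)` gives a repeat count. The sketch below converts such a pattern into a Python regular expression. The function name `prosite_to_regex` is hypothetical, only this core subset of the notation is handled, and the example pattern and sequence are for illustration only.

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.rstrip(".")
    regex = []
    for element in pattern.split("-"):
        anchor_start = element.startswith("<")   # pattern must start at the N-terminus
        anchor_end = element.endswith(">")       # pattern must end at the C-terminus
        element = element.strip("<>")
        # Separate the residue part from an optional repeat count such as (2,4).
        match = re.fullmatch(r"([^()]+)(?:\((\d+(?:,\d+)?)\))?", element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residues, repeat = match.groups()
        if residues == "x":
            residues = "."                           # any amino acid
        elif residues.startswith("{"):
            residues = "[^" + residues[1:-1] + "]"   # any residue except these
        piece = residues + ("{" + repeat + "}" if repeat else "")
        regex.append(("^" if anchor_start else "") + piece + ("$" if anchor_end else ""))
    return "".join(regex)

motif = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H")
print(motif)
print(bool(re.search(motif, "GGCAACAAALAAAAAAAAHAAAHGG")))
```

A sequence with no overall similarity to known proteins can still be scanned with such an expression to find a conserved site.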
Biological databases
Biological databases can be broadly classified into:
• Sequence databases
• Structure databases
• Pathway databases
Sequence databases are applicable to both nucleic acid sequences and protein sequences, whereas structure databases are applicable only to proteins.
Sequence databases
Nucleotide and protein sequence databases represent the most widely used and some of the best established biological databases.
They serve as repositories for wet lab results and as the primary source for experimental results.
Major public data banks of this type are:
• GenBank in the USA
• EMBL (European Molecular Biology Laboratory) in Europe
• DDBJ (DNA Data Bank of Japan) in Japan
Protein databases include:
• ExPASy
• UniProt
• PIR
• PDB
• Swiss-Prot
• TrEMBL
Advantages of Databases
provides information on
genomic context of genes,
Gene homologues, and paralogues,
RNA transcripts from the given genes,
peptide sequences, and
functions of gene families.
It allows access to complete genome sequences available in the
database.
Structure databases
There are many structural databases, including:
Protein Data Bank (PDB)
Important in solving real problems in molecular biology
The PDB was established in 1971 at Brookhaven National Laboratory (BNL)
It contains structural information on macromolecules determined by X-ray crystallography and NMR methods
PDB is maintained by the Research Collaboratory for Structural Bioinformatics (RCSB).
PROSITE: a database of protein domains and families. PROSITE contains biologically significant sites, patterns and profiles that help to reliably identify to which known protein family a new sequence belongs.
CATH: The CATH database (Class, Architecture, Topology, Homologous superfamily) is a hierarchical classification of protein domain structures, which clusters proteins at four major structural levels.
Pathway databases
A pathway database (DB) is a DB that describes biochemical pathways, reactions, and enzymes.
Some examples of pathway databases are:
• KEGG (The Kyoto Encyclopedia of Genes and Genomes)
• BRENDA
• BioCyc
KEGG: The Kyoto Encyclopedia of Genes and Genomes (KEGG) is the primary resource for the Japanese GenomeNet service.
It is a collection of online databases dealing with genomes, enzymatic pathways, and biological chemicals.
KEGG contains three databases: PATHWAY, GENES, and LIGAND.
The PATHWAY database stores computerized knowledge on molecular interaction networks.
The GENES database contains data concerning sequences of genes and proteins generated by the genome projects.
The LIGAND database holds information about the chemical compounds and chemical reactions that are relevant to cellular processes.
BioCyc: The BioCyc Database Collection is a compilation of pathway and genome information for different organisms.
It includes two other databases:
• EcoCyc, which describes Escherichia coli K-12
• MetaCyc, which describes pathways for more than 300 organisms
