Lecture Bioinfo Databases

Uploaded by

Khushal Khan

Who needs to study Bioinformatics?

•What is Bioinformatics?

“Bioinformatics is about searching biological databases, comparing sequences, looking at protein structures, and (more generally) asking biological questions with a computer”

•Introduced by French scientist Jean-Michel Claverie in the late 1980s (“bioinformatique”)

•Saves you months of work!

Before the era of Bioinformatics

• Only two ways to perform experiments:

1. In vivo

2. In vitro

• We are now in the age of in silico biology!
Bioinformatics is a must do!
Bioinformatics in context
[Figure: Bioinformatics in relation to mathematics/computer science, genomics, molecular biology, biophysics, molecular evolution, and ethical, legal, and social implications]
What does this mean?

• Think of Bioinformatics as a tool!

• Now you are equipped with computational tools to answer biological questions
The biological foundations of Bioinformatics
• Proteins and Nucleic acids

• Proteins are made up of amino acids, while nucleic acids are made up of nucleotides

• How best to represent proteins and nucleic acids?
• Need a formula to describe their composition
• The identity of the protein is determined from the composition and the precise order of amino acids it contains
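
The point that identity depends on both composition and order can be illustrated with a small sketch. The two peptides below are made up for illustration, and the helper name `composition` is an assumption, not something from the lecture.

```python
from collections import Counter


def composition(seq: str) -> Counter:
    """Count how often each one-letter residue code occurs in a sequence."""
    return Counter(seq.upper())


# Two made-up peptides built from the same residues in a different order.
pep_a = "GAVLK"
pep_b = "KLVAG"

same_composition = composition(pep_a) == composition(pep_b)
same_sequence = pep_a == pep_b

print("same composition:", same_composition)
print("same sequence:", same_sequence)
```

The peptides share a composition but not an order, so they count as different sequences.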
The Birth of Bioinformatics

• Protein sequences started to accumulate in 1960s

• People started manual comparisons (pre-computer era)

• With the advent of computers, people started to write algorithms from scratch to analyze “sequence data”

• This was the genesis of bioinformatics

“The holy grail of Bioinformatics”
GCTCCTCACTGTCTGTGTTTATTCTTTTAGCTTCTTCAGA
TCTTTTAGTCTGAGGAAGCCTGGCATGTGCAAATGAAG
TTAACCTAA...

• > 500,000 genes sequenced

• Expected number of unique protein structures: ~700-1,000
The core of Bioinformatics to date
•Relationships between sequence, 3D structure, and protein functions

TDQAAFDTNIVTLTRFVM
EQGRKARGTGEMTQLLNS
LCTAVKAISTAVRKAGIA
HLYGIAGSTNVTGDQVKK
LDVLSNDLVINVLKSSFA
TCVLVTEEDKNAIIVEPE
KRGKYVVCFDPLDGSSNI
DCLVSIGTIFGIYRKNST
DEPSEKDALQPGRNLVAA
GYALYGSATMLV

Sequence -> 3D structure -> protein functions

•Properties and evolution of genes, genomes, proteins, metabolic pathways in cells

•Use of this knowledge for prediction, modelling, and design

From sequence to structure

• Proteins adopt a three-dimensional (3D) structure, which is functionally important

• Structure is determined by the composition and order of amino acids in that protein

1. Hydrophobic amino acids (e.g., Valine, Leucine) do not want to be on the surface
2. Hydrophilic amino acids (e.g., Serine) love to be on the surface to interact with water
3. Also affected by the electric charge on some residues and their size
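
One rough way to put numbers on the hydrophobic/hydrophilic distinction is a hydropathy scale. The sketch below assumes the Kyte-Doolittle scale, which the slides do not name. The function names and the test peptide are illustrative only.

```python
# Kyte-Doolittle hydropathy values (an assumption: the lecture names no scale).
# Positive values are hydrophobic, negative values are hydrophilic.
KD = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5, "M": 1.9, "A": 1.8,
    "G": -0.4, "T": -0.7, "S": -0.8, "W": -0.9, "Y": -1.3, "P": -1.6,
    "H": -3.2, "E": -3.5, "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def mean_hydropathy(seq: str) -> float:
    """Average hydropathy of a non-empty peptide."""
    return sum(KD[aa] for aa in seq.upper()) / len(seq)


def window_hydropathy(seq: str, width: int = 5) -> list:
    """Mean hydropathy of every window of `width` residues along the sequence."""
    return [mean_hydropathy(seq[i:i + width]) for i in range(len(seq) - width + 1)]


# Made-up peptide: a hydrophobic stretch followed by a hydrophilic one.
profile = window_hydropathy("VLVLVSSSSS")
print(profile[0] > 0)   # first window is all Val/Leu
print(profile[-1] < 0)  # last window is all Ser
```

Stretches with a high windowed average are candidates for the buried interior, and low-scoring stretches for the water-exposed surface. This is only a first approximation; as the slide notes, charge and residue size also matter.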
In Short!
• Proteins have a unique order and composition of amino acids, simply referred to as the ‘sequence’

• Sequence determines the 3D shape of the protein, simply referred to as the ‘structure’

• Structure determines the molecular activities of proteins, simply referred to as the ‘function’

• Sequence -> Structure -> Function (but not always!)

What about DNA & RNA?
• DNA & RNA are made up of nucleotide chains

• Nucleotides consist of a sugar, a phosphate, and one of five nitrogenous bases

• Adenine, Guanine, Cytosine, Thymine, and Uracil, or simply A, G, C, T, and U (DNA uses A, G, C, and T; RNA uses U in place of T)
What should be cheaper and faster? DNA/RNA or protein sequencing?

DNA/RNA sequencing is faster and cheaper simply because of fewer characters: four nucleotides vs. twenty amino acids
What do we mean by complementarity?

T is always facing A, while G is always facing C in a one-to-one reciprocal relationship
How can this knowledge help us?

If we know the sequence of one strand, we can get the sequence of the other strand
Example
• 5’-ATGCTGA-3’

• What is the complementary sequence?

• 5’-ATGCTGA-3’
• 3’-TACGACT-5’

• How is this reported?

• 5’-ATGCTGA-3’ and 5’-TCAGCAT-3’ (each strand is written 5’ to 3’, so the complement is reversed)
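
The example above can be reproduced with a short sketch. The function names are illustrative, and only the four DNA letters A, C, G, and T are handled.

```python
# Translation table for Watson-Crick pairing: A<->T, G<->C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Base-by-base complement, read in the opposite direction (3'->5')."""
    return seq.upper().translate(COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Complementary strand rewritten in the conventional 5'->3' direction."""
    return complement(seq)[::-1]


strand = "ATGCTGA"
print(complement(strand))          # TACGACT (3'->5')
print(reverse_complement(strand))  # TCAGCAT (5'->3')
```

Applying `reverse_complement` twice returns the original strand, which is a quick sanity check on the pairing table.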
What is a Database?

A database is an organized collection of related information

What are the advantages of using databases?

• Easy and quick retrieval of information

• Provide backup support

Biological Databases
•Need to collect and store biological data and its associated knowledge in databases

•Fundamental to the survival of science

Two kinds of Biological Databases

1. Primary
• Contain primary sequence information (nucleotide or protein) and associated annotations

2. Secondary
• Summarize the results from primary databases
Primary Databases

• Nucleotide sequence databases

• Protein sequence databases

Nucleotide Sequence Databases
• GenBank
• Perhaps the best-known database
• Contains all publicly available annotated DNA sequences
• Exchanges data daily with the DNA Data Bank of Japan (DDBJ) and the European Molecular Biology Laboratory (EMBL)
• Contains roughly 179 million sequence entries (Dec 2014)
• Prior submission of a sequence to GenBank/DDBJ/EMBL is a prerequisite for publishing a new sequence in any scientific journal
• Submission is easy and can be done electronically
• Each entry has a unique ID known as the “Accession Number (AN)”
Accession number

• A unique identifier of each record in the database

• Usually alpha-numeric in nature

Why do we need accession numbers?
• Common names lead to non-specific results
• A search on “Cytochrome” will output many different types of cytochromes (a, b, c, and others)

• Cannot distinguish among species

• Search on “Insulin” will return insulin sequences from many organisms
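
The difference between a name search and an accession lookup can be sketched with a toy in-memory database. The accession strings below are made-up placeholders, not real GenBank accession numbers, and the record fields and function names are assumptions for illustration.

```python
# Toy records keyed by placeholder accession IDs (not real GenBank entries).
records = {
    "ACC0001": {"name": "Insulin", "organism": "Homo sapiens"},
    "ACC0002": {"name": "Insulin", "organism": "Mus musculus"},
    "ACC0003": {"name": "Cytochrome c", "organism": "Homo sapiens"},
    "ACC0004": {"name": "Cytochrome b", "organism": "Homo sapiens"},
}


def search_by_name(term: str) -> list:
    """Return every accession whose record name contains the search term."""
    term = term.lower()
    return sorted(acc for acc, rec in records.items() if term in rec["name"].lower())


def fetch(accession: str) -> dict:
    """Return the single record stored under an accession."""
    return records[accession]


print(search_by_name("insulin"))     # several hits, one per organism
print(search_by_name("cytochrome"))  # several hits, one per cytochrome type
print(fetch("ACC0002"))              # exactly one record
```

A common name matches several records, while an accession identifies exactly one.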
Example GenBank Entry
Secondary Databases
PROSITE
• Sometimes a newly sequenced protein gives no hits to sequence databases

• How do we determine its function then?

“In some cases, the structure and function of an unknown protein which is too distantly related to any protein of known structure to detect its affinity by overall sequence alignment may be identified by its possession of a particular cluster of residue types classified as a motif. The motifs, or templates, or fingerprints, arise because of particular requirements of binding sites that impose very tight constraints on the evolution of portions of a protein sequence” - A. M. Lesk, 1988
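
A motif search of this kind can be sketched as a pattern match. The example assumes the PROSITE-style N-glycosylation pattern N-{P}-[ST]-{P} and a made-up test sequence. The converter handles only a small subset of the pattern syntax (residues, `x`, `[...]`, `{...}`, and repeat counts), and the function names are illustrative.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simple PROSITE-style pattern into a Python regular expression.

    Handles residues, x (any residue), [..] (allowed set), {..} (excluded set)
    and (n) or (n,m) repeat counts. Anchors (< and >) are not handled.
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # {P} -> [^P]
        element = element.replace("x", ".")                     # x -> any residue
        element = element.replace("(", "{").replace(")", "}")   # (2,4) -> {2,4}
        parts.append(element)
    return "".join(parts)


def find_motif(pattern: str, seq: str) -> list:
    """Return (0-based position, matched text) for every hit, overlaps included."""
    regex = prosite_to_regex(pattern)
    return [(m.start(), m.group(1)) for m in re.finditer(f"(?=({regex}))", seq)]


# Made-up sequence: the N at position 2 fits the motif, the N at position 7
# is followed by P and therefore does not.
print(prosite_to_regex("N-{P}-[ST]-{P}"))
print(find_motif("N-{P}-[ST]-{P}", "MKNGSAANPTL"))
```

The lookahead in `find_motif` is a design choice so that overlapping occurrences of a motif are all reported rather than only the first of each overlapping group.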
