What is Bioinformatics/Computational Biology?
Bioinformatics: collection and storage of biological information
Computational biology: development of algorithms and statistical models to analyze biological data
The terms bioinformatics and computational biology will be used interchangeably
What is Bioinformatics?
Source: http://ccb.wustl.edu/
Why is bioinformatics in demand?
Very few people are adequately trained in both biology and computer science
Genome sequencing, microarrays, etc. generate large amounts of data to be analyzed
Leads to important discoveries
Saves time and money
What skills are needed?
Well-grounded in one of the following areas:
Computer science
Molecular biology
Statistics
A working knowledge and appreciation of the others!
Brief history of bioinformatics: Databases
• The first biological database, the Protein Identification Resource, was established in 1972 by Margaret Dayhoff
• Dayhoff and co-workers organized the proteins into families
and superfamilies based on degree of sequence similarity
• The idea of sequence alignment was introduced, along with special tables (substitution matrices, known as PAM matrices) that reflected the frequency of changes observed in the sequences of a group of closely related proteins
• Several large protein databases now exist: Swiss-Prot, PIR-International, etc.
• The first DNA database was established in 1979. Currently
there are several powerful databases: GenBank, EMBL, DDBJ,
etc.
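As a rough illustration of the "frequency of changes" tables described above, the sketch below counts how often one residue is replaced by another across every pair of already-aligned, closely related sequences. It is a simplified sketch under stated assumptions: the helper name `count_substitutions` and the toy peptides are hypothetical, gaps ('-') and identical residues are skipped, and it stops at raw counts. Dayhoff's published matrices involved further steps (such as normalizing by each amino acid's relative mutability) that are not shown here.

```python
from collections import Counter
from itertools import combinations


def count_substitutions(aligned_seqs):
    """Count symmetric residue substitutions across all pairs of aligned sequences.

    Hypothetical helper: assumes all sequences are already aligned (same length)
    and that '-' marks a gap. Gapped columns and unchanged residues are skipped.
    """
    if len({len(s) for s in aligned_seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")

    counts = Counter()
    for a, b in combinations(aligned_seqs, 2):
        for x, y in zip(a, b):
            if x == "-" or y == "-" or x == y:
                continue
            # Store each pair in sorted order so A->S and S->A share one cell.
            counts[tuple(sorted((x, y)))] += 1
    return counts


if __name__ == "__main__":
    # Toy, made-up alignment of three closely related peptides.
    family = ["MKTAYIAK", "MKTSYIAK", "MRTAYI-K"]
    for (x, y), n in sorted(count_substitutions(family).items()):
        print(f"{x}<->{y}: {n}")
    # prints:
    # A<->S: 2
    # K<->R: 2
```

Counting unordered pairs keeps the table symmetric, which matches the idea that a change between two residues is observed without knowing its direction.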
Brief history of bioinformatics: other important steps
• Development of sequence retrieval methods (1970-80s)
• Development of principles of sequence alignment (1980s)
• Prediction of RNA secondary structure (1980s)
• Prediction of protein secondary structure and 3D (1980-90s)
• The FASTA and BLAST methods for database search (1980-90s)
• Prediction of genes (1990s)
• Studies of complete genome sequences (late 1990s-2000s)
Where Can You Learn More?
ISCB: http://www.iscb.org/
NCBI: http://ncbi.nlm.nih.gov/
http://www.bioinformatics.org/
Journals (Journal of Computational Biology, Bioinformatics, BMC Bioinformatics, …)
Conferences (ISMB, RECOMB, PSB, InCoB,…)
Data Mining & Bioinformatics: Why?
Many biological processes are not well-understood
Biological knowledge is highly complex, imprecise, descriptive, and experimental
Biological data is abundant and information-rich
Genomics & proteomics data (sequences), microarrays and protein arrays, protein structure databases (Protein Data Bank, PDB), bio-testing data
Huge data banks, rich literature, openly accessible
Among the largest and richest scientific data sets in the world
Mining: gain biological insight (data/information → knowledge)
Mining for correlations, linkages between disease and gene sequences,
protein networks, classification, clustering, outliers, ...
Find correlations among linkages in the literature and in heterogeneous databases
Algorithms Used in Bioinformatics
Comparing sequences: Compare large numbers of long sequences, allowing insertions, deletions, and substitutions (mutations) of symbols
Constructing evolutionary (phylogenetic) trees: Compare sequences of different organisms and build trees based on their degree of similarity (evolution)
Detecting patterns in sequences
◦ Search for genes in DNA, or for subcomponents of an amino acid sequence
Determining 3D structures from sequences
◦ E.g., infer RNA shape from its sequence and protein shape from its amino acid sequence
Inferring cell regulation:
◦ Cell modeling from experimental (say, microarray) data
Determining protein function and metabolic pathways: Interpret human annotations of protein function and develop graph databases that can be queried
Assembling DNA fragments (provided by sequencing machines)
Using scripting languages: Write scripts on the Web to analyze data and run applications
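The "Comparing sequences" item in the list above, comparing sequences while allowing insertions, deletions, and substitutions, is classically handled with dynamic programming. The sketch below computes a global alignment in the style of the Needleman-Wunsch algorithm. The function name `global_alignment` and the scores (match +1, mismatch -1, gap -1) are illustrative assumptions, not values from these slides; practical tools typically use substitution matrices and affine gap penalties instead.

```python
def global_alignment(a, b, match=1, mismatch=-1, gap=-1):
    """Return (score, aligned_a, aligned_b) for one optimal global alignment.

    Hypothetical helper using a simple linear gap penalty.
    """
    n, m = len(a), len(b)

    def sub(x, y):
        # Score for aligning two symbols against each other.
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + sub(a[i - 1], b[j - 1])  # match/substitution
            up = score[i - 1][j] + gap    # a[i-1] against a gap (deletion)
            left = score[i][j - 1] + gap  # b[j-1] against a gap (insertion)
            score[i][j] = max(diag, up, left)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    s, x, y = global_alignment("GATTACA", "GATACA")
    print(s)  # 5: six matches and one gap
    print(x)
    print(y)
```

The table fill takes time proportional to len(a) * len(b), which is why comparing large numbers of long sequences motivates faster heuristics such as FASTA and BLAST.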
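For the "Detecting patterns in sequences" item in the list above, one of the simplest gene-finding heuristics is to scan DNA for open reading frames (ORFs): stretches that begin with the start codon ATG and continue, in the same reading frame, to a stop codon (TAA, TAG, or TGA). The sketch below scans only the three forward reading frames; the function name `find_orfs` and the `min_codons` threshold are hypothetical choices for illustration. Real gene finders also examine the reverse strand and rely on statistical models, not on this pattern alone.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return sorted (start, end) pairs of ORFs on the forward strand.

    Hypothetical helper. start is the index of the A in ATG; end is the index
    just past the stop codon, so dna[start:end] includes the stop codon.
    Within an open ORF, further in-frame ATGs are ignored until a stop is hit.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons only in this reading frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Number of codons from ATG up to, but not including, the stop.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    seq = "CCATGAAATTTTAGGG"  # made-up example sequence
    for s, e in find_orfs(seq):
        print(s, e, seq[s:e])  # prints: 2 14 ATGAAATTTTAG
```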