Bioinformatics
Lecture 1
Muhammad Usman Ghani Khan
UET Lahore
Outline
1 Introduction
Definitions
Related Fields
The New Biology
Motivation and Background
Course Plan
2 Sources of Biological Data
3 Course Plan
Definitions
Over 43,000 definitions are available on the internet
Definition 1: Bioinformatics is the application of computer
technology to the management and analysis of biological
data [1]
Definition 2: Biologists doing stuff with computers?
Definition 3: The design, construction and use of software
tools to generate, store, annotate, access and analyse data
and information relating to Molecular Biology
* Here we consider the use of Bioinformatics tools rather
than their design and construction
* Here we consider the access and analysis of data and
information items rather than their generation, storage or
annotation
[1] European Bioinformatics Institute (EBI)
Definitions
Every application of computer science to biology
* Sequence analysis, image analysis, sample management,
population modeling, etc.
Analysis of data coming from large-scale biological
projects
* Genomes, transcriptomes, proteomes, metabolomes, etc.
Solving biological problems with computation?
Collecting, storing and analysing biological data?
Informatics - library science?
But: "I do not think all biological computing is
bioinformatics, e.g. mathematical modelling is not
bioinformatics, even when connected with biology-related
problems. In my opinion, bioinformatics has to do with
management and the subsequent use of biological
information, particular genetic information." (Richard
Durbin)
Definitions
What is not bioinformatics?
* Biologically-inspired computation, e.g., genetic algorithms
and neural networks
* However, applying neural networks to solve a
biological problem could be called bioinformatics
* What about DNA computing?
Related Fields
Computational biology: the application of computing to
biology (broad definition)
* Often used interchangeably with bioinformatics
Biometry: the statistical analysis of biological data
Biophysics: an interdisciplinary field which applies
techniques from the physical sciences to understanding
biological structure and function [2]
Mathematical biology: tackles biological problems, but the
methods it uses to tackle them need not be numerical and
need not be implemented in software or hardware.
[2] British Biophysical Society
Related Fields
Computational biology and bioinformatics overlap: both use
computational techniques to try to understand biological
phenomena. Computational biology puts more emphasis on
mathematical modelling to explain biological mechanisms,
whereas bioinformatics has more to do with the storage and
synthesis of experimental data (e.g. pattern recognition and
data mining).
New Biology
Traditional Biology
Small team working on a specialized topic
Well defined experiment to answer precise questions
New high-throughput biology
Large international teams using cutting-edge technology
that defines the project
Results are released raw to the scientific community without
any underlying hypothesis
Examples of High Throughput
Complete genome sequencing
Simultaneous expression analysis of thousands of genes
(DNA microarrays, SAGE)
Large-scale sampling of the proteome
Protein-protein interaction analysis: large-scale two-hybrid screens (yeast, worm)
Large-scale 3D structure production (yeast)
Metabolism modeling
Biodiversity
Motivation
Rapid growth of biology-related data
Explosion of publicly available biological materials
* Modern molecular biology, and especially genomics, has led
to vast quantities of data: DNA/protein sequences, gene
expression.
* These data mainly consist of vast strings/matrices of letters/
numbers, which in their raw form are not very interesting.
Management problem:
how to handle this data?
* Analysis
* Understanding
* Presentation
Approaches:
* Computing techniques are very good for extracting useful
patterns.
* Bioinformatics consists of methods to address these issues.
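As a small illustration of extracting patterns from raw sequence data, the sketch below computes two simple summaries of a DNA string: its GC content and its most frequent k-mers. The function names (gc_content, kmer_counts) and the example sequence are made up for illustration and are not taken from any course material or library.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Return the fraction of bases that are G or C (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k in the sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Hypothetical example sequence, not real biological data.
dna = "ATGCGCGATATATGCGC"
print("GC content:", round(gc_content(dna), 3))
print("Most common 3-mers:", kmer_counts(dna, 3).most_common(3))
```

Even these two numbers turn an uninteresting string of letters into something that can be compared across sequences, which is the general idea behind the approaches listed above.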
Motivation
In order to extract useful information, it is necessary to
understand the biological principles involved.
In this course we will introduce some basic molecular
biology/ genomics and look at ways in which computers
can be used to analyse it.
Motivation
Sample Ultimate Problems
What is the role of a particular gene?
Does a particular gene help cause a disease?
How does a drug affect a cell?
Can we insert a gene into corn to protect it against
diseases or pests?
Can we design a drug to accomplish a particular purpose?
Can we build a cell that eats pollution?
Motivation
Why would a student choose this course?
To prepare for graduate study in Bioinformatics or
Computational Biology.
To prepare for certain jobs in the pharmaceutical or
biotechnology industries. The future is hard to predict.
There are jobs related to high-tech agriculture (new
varieties of plants), industrial organisms, biofuels,
pharmaceuticals (designer drugs).
2 Sources of Biological Data
So what data can we generate?
Biological data can be generated at many different levels
Genomics (DNA)
Transcriptomics (RNA)
Proteomics (proteins)
Metabolomics (small compounds)
Lipidomics (lipids)
Hundreds of omics have been catalogued
What does an omics dataset look like?
In most cases datasets have a similar structure
Each sample is characterized by a large number of
variables (RNA, proteins, lipids, etc.)
Each variable indicates (usually quantitatively) the
presence of that element in the sample
Due to the high cost of most omics technologies, there are
usually far more variables than samples
* Problems of over-fitting
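The over-fitting problem can be seen in a minimal sketch with synthetic data. It assumes NumPy is available; the sizes (10 samples, 200 variables) and the function name fit_noise are arbitrary choices for illustration. Both the variables and the outcome are pure random noise, so there is nothing real to learn, yet a least-squares model reproduces the training outcome almost exactly.

```python
import numpy as np


def fit_noise(n_samples: int = 10, n_variables: int = 200, seed: int = 0):
    """Fit least squares to pure noise; return (train_mse, test_mse)."""
    rng = np.random.default_rng(seed)

    # Rows are samples, columns are variables, as in a typical omics table.
    X_train = rng.normal(size=(n_samples, n_variables))
    y_train = rng.normal(size=n_samples)  # outcome unrelated to X_train

    # With more variables than samples, lstsq returns the minimum-norm
    # solution, which can match the training outcome exactly.
    coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)
    train_mse = float(np.mean((X_train @ coef - y_train) ** 2))

    # Fresh noise drawn the same way plays the role of unseen samples.
    X_test = rng.normal(size=(n_samples, n_variables))
    y_test = rng.normal(size=n_samples)
    test_mse = float(np.mean((X_test @ coef - y_test) ** 2))
    return train_mse, test_mse


train_mse, test_mse = fit_noise()
print("training error:", train_mse)
print("test error:", test_mse)
```

The training error is essentially zero while the error on new samples is not, which is why results from datasets with many more variables than samples need validation on independent data.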
Research Areas
(Genome-scale) Sequence Analysis
* Sequence alignments, motif discovery, genome-wide
association (to study diseases such as cancers)
Computational Evolutionary Biology
* Phylogenetics, evolution modeling
Analysis of Gene Regulation
* Gene expression analysis, alternative splicing, protein-DNA
interactions, gene regulatory networks
Structural Biology
* Drug discovery, protein folding, protein-protein interactions
Synthetic Biology
High-throughput Image Analysis
3 Course Plan
Course Contents
Lecture 1
Introduction, Definitions.
Applications, Scope, Motivation.
Lecture 2
Introduction to Molecular Biology
Structure of DNA, RNA, Proteins
Announcement of term projects
Lecture 3
Bioinformatics Databases: GenBank, EMBL, Prot, etc.
Practical demonstration of databanks and their structures.
Course Contents
Lecture 4
Database Formats: FASTA, seq, Data
Quiz 1
Lecture 5
Sequence Alignment; Sequence Motifs; Gene Finding
Practical demonstration of BioJava/.NetBio tools for
biology-related tasks
Lecture 6
Sequence Alignment (Part 2)
Computing with Biological Structures
Course Contents
Lecture 7
Phylogenetic Algorithms
Lecture 8
Mid-term break
Lecture 9
Microarray Data Analysis
Lecture 10
Term project presentations and discussion
Course Contents
Lecture 11
Comparative Genomics
Lecture 12
Proteomics
Lecture 13
Biological Ontologies; Biological Text Mining
Lecture 14
Genetic Networks
Lecture 15
Final Viva and term project submissions
Term Project Ideas
Architectures and data management techniques for the life
sciences
Query processing and optimization for biological data
Biological data sharing and update propagation
Query formulation assistance for scientists
Modeling of life sciences data
Biomedical data integration issues in eScience
Laboratory information management systems in biology
(including workflow systems)
Quality assurance in integrated data repositories
Biomedical metadata management (including provenance)
Mining integrated life sciences data and text resources
Standards for biomedical data integration and annotation
Scientific results arising from innovative data integration
solutions
Term Project Ideas
Exposing biomedical data for integration purposes (APIs,
Linked Open Data, SPARQL endpoints)
Creation and use of clinical data repositories
Data integration in clinical and translational research
Integration of genotypic and phenotypic data
Challenges and opportunities with big data in the life
sciences
Ethical, legal and social issues with biomedical data
integration
Useful Books
Bryan Bergeron, M.D.: Bioinformatics Computing, Prentice
Hall, 2002 (freely available on the internet).
Richard C. Deonier, Simon Tavaré & Michael S.
Waterman: Computational Genome Analysis: An
Introduction, Springer, 2005.
Some other helpful books
* Alberts et al. - Molecular Biology of the Cell
* Stryer - Biochemistry
* Baldi and Brunak - Bioinformatics: The Machine Learning
Approach
* Durbin, Eddy, Krogh and Mitchison - Biological Sequence
Analysis
* Kanehisa - Post-genome Informatics
* Lesk - Introduction to Bioinformatics
* Orengo, Jones and Thornton - Bioinformatics