Lecture 1

The Bioinformatics course, led by Prof. Chi-Ying Huang and Dr. Tam Tran, covers the integration of biology, computer science, and information technology to enhance biological insights. Key components include attendance, homework, and exams, with a focus on primary databases and applications in molecular biology and diagnostics. Students will engage in practical exercises and assignments related to DNA and protein sequencing, utilizing resources like NCBI and various bioinformatics tools.

Uploaded by

trieupg.22bi13431


Bioinformatics - Session 1

Course Provider: Dr. Tam Tran

Bachelor in PMAB & MST – Year 2

Lecturers

Prof. Chi-Ying Huang

Institute of Biopharmaceutical Sciences
National Yang-Ming University, Taiwan
E-mail: [email protected]

Dr. Tam Tran

Department of Life Sciences (LS) – USTH
Email: [email protected]

Scoring

1. Attendance (10%)

2. Homework (30%) – submit your homework via USTH Moodle before 10am the day before the next class

3. Exam (60%) – multiple-choice exams

1. Attendance (20%) (Note: 3 absences or late/5 classes = 0)

2. Homework (80%) – submit your homework via USTH Moodle before 10am the day before the next class

Important Administrative Notes

• You must log in to your USTH Moodle account. Both the homework and the final are submitted via Moodle: https://moodle.usth.edu.vn

• All handouts, assignments, lecture slides, and announcements are posted on the Bioinformatics course page in USTH Moodle

Outline

1. What is Bioinformatics?

2. Primary data

3. Biomedical databases search using NCBI

Why Study Bioinformatics?
1. What is Bioinformatics?

Bioinformatics (NCBI definition)

Bioinformatics is the field of science in which biology, computer science, and information technology merge to form a single discipline. The ultimate goal of the field is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned.

Applications of Bioinformatics

• Basic molecular biology
• Clinical/medical diagnostics
• Development of pharmaceuticals
• Agricultural biotechnology

Key research in bioinformatics

• Sequence bioinformatics
• Structural bioinformatics
• Systems biology: integrated analysis of biological pathways and interacting networks

2. Primary databases

Primary databases

Labs: experimental results
• Information stored in the genetic code (nucleotide sequences)
• Protein sequences (amino acid sequences)

Primary databases (raw data): nucleic acid databases

Three databases accept direct submissions:

• GenBank (NCBI, USA)
• ENA (European Molecular Biology Laboratory)
• DDBJ (National Institute of Genetics, Japan)

Fig. Information stored at GenBank, ENA and DDBJ is shared during daily updates.

Exponential growth of the sequence databases

(European Nucleotide Archive, https://www.ebi.ac.uk/ena/about/statistics)

Protein sequence origin

• More than 95% of protein sequences are derived from the translation of nucleotide sequences
• About 5% are obtained experimentally by direct protein sequencing

[Figure: Beckman protein/peptide sequencer]

HOMEWORK 1
DNA sequencing vs. protein sequencing
a. What is the difference between DNA sequencing and protein sequencing?
b. Why don't we sequence proteins the way we sequence DNA?

Protein sequence databases

Main protein databases:

Swiss-Prot (Europe)
• Reviewed
• Manually annotated
• Records with information extracted from the literature and curator-evaluated computational analysis

TrEMBL (Translated EMBL), a "waiting list" database for Swiss-Prot
• Unreviewed
• Computationally annotated
• Records that await full manual annotation

Both are part of UniProt (established in 2003): http://www.uniprot.org
A number of available complete genomes

Organism                    Year    Size (bp)      Genes
T7 bacteriophage            1983    39,937         59
Escherichia coli            1997    4,639,221      4,293
Saccharomyces cerevisiae    1996    12,069,252     5,800
Caenorhabditis elegans      1998    95,078,296     19,099
Drosophila melanogaster     2000    116,117,226    13,601
What was the first mammal to be fully sequenced?

(Compeau and Pevzner, 2015)

Homo sapiens (13-year project, $2.7 billion)
• First draft completed in 2001
• 3,160,079,000 bp; 31,780 genes

The Genome Sequencing Era

[Figure: timeline of genome sequencing milestones, with an axis running from 1996 to 2002 and ending at 2020. Labels: first microbial genome (H. influenzae); E. coli; first eukaryote genome (yeast); first multicellular animal (C. elegans); fruit fly; first higher plant (Arabidopsis); first fish (Fugu); mouse; first mammal; Homo sapiens; malaria mosquito and parasite; 18 microbial genomes; 40 microbial genomes; >9000 complete microbial genomes. Source: NCBI data]
EXERCISE BREAK

Exercise 1: Understand forward and reverse strands and the reverse complementary strand

Example sequence (forward strand): 5’-GCATGCAT-3’

Question:
• Write down the reverse strand
• Write down the reverse complementary strand
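
The three string operations behind Exercise 1 can be sketched in a few lines of Python. This is only an illustration: the slide does not define "reverse strand", so the sketch assumes it means the same letters read in the opposite order, and the function names are made up for this example.

```python
# Watson-Crick pairing for uppercase, unambiguous DNA bases only.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse(seq):
    """Same bases, read in the opposite order (assumed meaning of 'reverse strand')."""
    return seq[::-1]


def complement(seq):
    """Pair every base with its partner, keeping the original order."""
    return "".join(COMPLEMENT[base] for base in seq)


def reverse_complement(seq):
    """The opposite strand, written 5'->3'."""
    return complement(reverse(seq))


forward = "GCATGCAT"  # example sequence from the exercise
print("forward           :", forward)
print("reverse           :", reverse(forward))
print("complement        :", complement(forward))
print("reverse complement:", reverse_complement(forward))
```

Working the answer out by hand first and then comparing it with the printed result is a quick way to check the strand directions.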

EXERCISE BREAK

Exercise 2: Translate DNA sequence into protein

> Example Sequence
ATGACAGGGTGGGAGAGCTTATATAAGGATGCAATCGAGAAGGCAATAAAATCAGTTCCAAAGGTTAAAGGA
GTTCTCCTAGGCTATAACACAAACATAGATGCCATAAAATACCTAGACTCTAAGGATCTCGAAGAGAGAATA
GAGAAAGTCGGTAAGGAGGAAGTATTAAAGTACTCCGAAGAGCTTCCAGAAAAAATCACTTCAATCCCGCAG

Question: Translate the DNA sequence in 3 frames, and determine the reading frame which contains an open reading frame (ORF).

Suggested bioinformatics tools:

- Sequence Manipulation Suite (SMS) (recommended):
http://annotathon.org/sms2/orf_find.html
- NCBI ORF finder: https://www.ncbi.nlm.nih.gov/orffinder/
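
For readers who want to see what these tools do, here is a minimal Python sketch of three-frame translation on the forward strand. It assumes the standard genetic code and uses the length of the longest stop-free stretch as a rough stand-in for "contains an ORF"; the web tools above apply their own ORF criteria (start codon, minimum length), so results may differ. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate complete codons from the start of seq; trailing bases are ignored."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def three_frames(seq):
    """Translations of forward-strand frames 1, 2 and 3."""
    return [translate(seq[offset:]) for offset in range(3)]


def longest_open_stretch(protein):
    """Length of the longest run of residues without a stop codon."""
    return max(len(part) for part in protein.split("*"))


# Example sequence from Exercise 2.
example = (
    "ATGACAGGGTGGGAGAGCTTATATAAGGATGCAATCGAGAAGGCAATAAAATCAGTTCCAAAGGTTAAAGGA"
    "GTTCTCCTAGGCTATAACACAAACATAGATGCCATAAAATACCTAGACTCTAAGGATCTCGAAGAGAGAATA"
    "GAGAAAGTCGGTAAGGAGGAAGTATTAAAGTACTCCGAAGAGCTTCCAGAAAAAATCACTTCAATCCCGCAG"
)

for number, protein in enumerate(three_frames(example), start=1):
    print(f"frame {number} (longest open stretch {longest_open_stretch(protein)}):")
    print(protein)
```

A frame whose translation is interrupted by many `*` characters is unlikely to be the coding frame; the frame with one long uninterrupted stretch is the candidate ORF.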

Translated ORFs

3. Biomedical databases search using NCBI

The National Center for Biotechnology Information (NCBI), USA

https://www.ncbi.nlm.nih.gov/

Databases & Tools

[Figure: NCBI databases and their tools]
• Article abstracts: MEDLINE
• Taxonomy: Taxonomy Browser
• Genomes: Genome Data Viewer
• 3-D structure: MMDB, VAST
• Nucleotide sequences: BLAST
• Protein sequences: BLAST

Other Databases

• Genetic variation: dbSNP
• Cancer chromosome aberration: CCAP
• Cancer gene expression: CGAP
• Gene expression: SAGE
• Genetic disease: OMIM
• Protein: Swiss-Prot
Reference sequence collections at NCBI (RefSeq database)

• Collection of unique sequences (one gene, one sequence)
• Accession prefixes: NT_ (genomic contigs), NM_ (cDNA/mRNAs), NP_ (protein sequences)
• NR (non-redundant protein) = UniProt = TrEMBL + Swiss-Prot
• The sequences are deduced (semi-)automatically and later human-curated
• Status: provisional or reviewed
• Available at: http://www.ncbi.nlm.nih.gov/RefSeq
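
The accession prefixes on the RefSeq slide lend themselves to a small lookup. The Python sketch below covers only the three prefixes named on the slide (RefSeq defines others); the function name and the accession strings are illustrative, and only their prefixes matter.

```python
# Prefix-to-molecule mapping taken from the slide. RefSeq uses more prefixes
# than these three, so the table is deliberately incomplete.
REFSEQ_PREFIXES = {
    "NT_": "genomic contig",
    "NM_": "cDNA/mRNA",
    "NP_": "protein sequence",
}


def refseq_type(accession):
    """Return the molecule type for an accession, or None if its prefix is not listed."""
    return REFSEQ_PREFIXES.get(accession[:3].upper())


# Illustrative accession strings; only the prefix is inspected.
for accession in ("NM_000546.6", "NP_000537.3", "NT_000001.1", "XM_123456.1"):
    label = refseq_type(accession) or "prefix not covered by the slide"
    print(accession, "->", label)
```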
Why NCBI, why not Google?

The NCBI search engine: Entrez

Let's try searching for "DNA mismatch repair" in all NCBI databases and on Google.
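
The same Entrez query can also be issued programmatically through NCBI's E-utilities. The Python sketch below only builds ESearch request URLs and does not contact the server; the choice of databases, the `retmax` value, and the helper name `esearch_url` are assumptions made for illustration, not part of the lecture.

```python
from urllib.parse import urlencode

# Base address of the ESearch utility in NCBI E-utilities.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db, term, retmax=5):
    """Build an ESearch URL for one Entrez database (no network access here)."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_ESEARCH + "?" + urlencode(params)


# One URL per database; these three are an arbitrary subset of Entrez.
for db in ("pubmed", "gene", "protein"):
    print(esearch_url(db, "DNA mismatch repair"))
```

Opening one of the printed URLs in a browser returns the hit count and the first few record IDs for that database, which makes the contrast with an unstructured web search easy to see.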
GenBank Record Fields

• Locus name
• Division
• Accession number
• gi number
• Version number
• Medline ID
• Protein ID
• Protein sequence
• Nucleotide sequence
FASTA Format

FASTA definition line:
>gi|193425|gb|M60978.1|MUSGAPDS

• gi number: 193425
• DB identifier: gb
• Accession number: M60978.1
• Locus name: MUSGAPDS

DB identifiers:
gb   GenBank
emb  EMBL
dbj  DDBJ
sp   Swiss-Prot
pdb  Protein Data Bank
pir  PIR
ref  RefSeq
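
As a small illustration of how the definition line breaks down, the Python sketch below splits the example on the `|` separators and names each field. It assumes the five-field `gi|…|db|accession|locus` layout shown on the slide; other definition-line layouts are not handled, and the function name is hypothetical.

```python
# DB identifier codes as listed on the slide.
DB_NAMES = {
    "gb": "GenBank",
    "emb": "EMBL",
    "dbj": "DDBJ",
    "sp": "Swiss-Prot",
    "pdb": "Protein Data Bank",
    "pir": "PIR",
    "ref": "RefSeq",
}


def parse_defline(line):
    """Split a 'gi|number|db|accession|locus' definition line into named fields."""
    fields = line.lstrip(">").split("|")
    if len(fields) < 5 or fields[0] != "gi":
        raise ValueError("not a gi-style definition line: " + line)
    _, gi_number, db, accession, locus = fields[:5]
    return {
        "gi_number": gi_number,
        "database": DB_NAMES.get(db, db),  # fall back to the raw code if unknown
        "accession": accession,
        "locus_name": locus,
    }


print(parse_defline(">gi|193425|gb|M60978.1|MUSGAPDS"))
```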
HOMEWORK 2
Homework 2 (USTH Moodle): Figure out how the genes assigned to each of you are implicated in cancers and/or immunity (File: Gene List.xlsx)

Requirements: get the following information about each of the 3 genes assigned to you
• Gene symbol, full name, reviewed by RefSeq
• Summary of its function
• Location on the human genome (based on GRCh38)
  – e.g. chromosome, start, end, strand
• How this gene is related to cancer
  – Get one open-access reference that is, in your opinion, most relevant to cancers and/or immunity. Please list the article title, the authors, their institutions, the publication year, and the journal name.
• Any situations (mutations, over-expression, etc.) in which this gene is associated with other (non-cancer and non-immune) diseases
• Extract the DNA sequences of these genes, translate them in 3 frames, and determine the reading frame which contains an open reading frame (ORF).
Take home message

• There are a large number of primary databases
• Use appropriate databases
• Know what kind of information to expect
