LECTURE 2
Selected Topics in Computer Science
CS409 Introduction to Bioinformatics
Dr. Ashraf Hendam
OUTLINE
• Genome Sequencing
• Genetic variations
• Biological Data File Format
Genome Sequencing
Sequencing by Reference
Reference assembly maps each sequencing read to an existing reference genome by finding the position where the read's nucleotide sequence matches, or nearly matches, the reference.
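As an illustration only, placing a read on a reference can be sketched as a brute-force search: slide the read along the reference and keep every offset where it matches with at most a few substitutions. This is a toy under stated assumptions (no gaps, no index, a made-up reference and a hypothetical `map_read` helper); real read mappers such as BWA or Bowtie2 use genome indexes and gapped alignment.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read: str, reference: str, max_mismatches: int = 1) -> list[int]:
    """Return 0-based offsets where `read` matches `reference` with at most
    `max_mismatches` substitutions (no gaps), best-scoring offsets first."""
    hits = []
    for start in range(len(reference) - len(read) + 1):
        mismatches = hamming(read, reference[start:start + len(read)])
        if mismatches <= max_mismatches:
            hits.append((mismatches, start))
    hits.sort()  # fewest mismatches first, then leftmost offset
    return [start for _, start in hits]


reference = "GACTTCGATCAGGCTTAC"          # made-up reference sequence
print(map_read("TCGATC", reference))      # exact match -> [4]
print(map_read("TCGTTC", reference))      # 1 mismatch  -> [4]
```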
Genome Sequencing
De novo sequencing
• Refers to sequencing a novel genome for which no reference sequence is available for alignment.
• Sequence reads are assembled into contigs, and the coverage quality of de novo sequence data depends on the size and continuity of the contigs (i.e., the number of gaps in the data).
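A toy sketch of joining reads into contigs without a reference, assuming error-free reads and a simple greedy rule (repeatedly merge the two sequences with the longest suffix/prefix overlap). Production assemblers use overlap graphs or de Bruijn graphs instead; the function names and reads here are hypothetical. Sequences that cannot be merged remain separate contigs, which corresponds to gaps in the assembly.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`,
    or 0 if no such overlap of at least `min_len` bases exists."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge the pair with the longest overlap until none remain.
    The sequences left at the end are the contigs."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # no overlaps left: gaps remain between contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


reads = ["GACTTCG", "TTCGATC", "GATCAGG"]   # made-up, error-free reads
print(greedy_assemble(reads))               # -> ['GACTTCGATCAGG']
```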
Genetic variations
Causes of variations
1. Mistakes in DNA replication
2. Environmental agents (radiation, chemical agents)
3. Transposable elements (transposons)
• A segment of DNA is moved or copied to another location in the genome
4. Horizontal transfer of DNA
• An organism obtains genetic material from another organism that is not its parent
• Used in genetic engineering

Genetic variations Cont.
Types of variations
1. SNP (Single Nucleotide Polymorphisms)
2. Indels (Insertion-Deletion)
3. Inversion
4. Duplication

Genetic variations Cont.
SNP (Single Nucleotide Polymorphisms)

Ref
G A C T T C G A T C A
Sample
G A C G T C G A T C A
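Given two sequences that are already aligned without gaps, as in the example above, SNPs are simply the columns where the bases differ. A minimal sketch, with a hypothetical `find_snps` helper:

```python
def find_snps(ref: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for every
    column where two gap-free, equal-length aligned sequences differ."""
    if len(ref) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [(pos, r, s)
            for pos, (r, s) in enumerate(zip(ref, sample), start=1)
            if r != s]


print(find_snps("GACTTCGATCA", "GACGTCGATCA"))  # -> [(4, 'T', 'G')]
```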

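Genetic variations Cont.
SNP: non-synonymous (nsSNP) and synonymous (sSNP)

Ref    G A C T T C G A T C A G   (translation: DFDQ)
Sample G A C G T C G A T C A A   (translation: DVDQ)

• Position 4 (T→G) changes codon TTC (F, phenylalanine) to GTC (V, valine): a non-synonymous SNP (nsSNP).
• Position 12 (G→A) changes codon CAG (Q, glutamine) to CAA (Q, glutamine): a synonymous SNP (sSNP).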
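The nsSNP/sSNP decision can be reproduced with the standard genetic code: translate the reference codon and the codon carrying only that SNP, then compare the amino acids. This sketch assumes the sequence starts at the first base of a codon (reading frame 1), contains only A, C, G, T with no gaps, and uses the standard nuclear code; the helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str) -> str:
    """Translate complete codons starting at the first base (frame 1)."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def classify_snps(ref: str, sample: str) -> list[tuple[int, str, str, str, str, str]]:
    """For each mismatch, apply only that SNP to the reference codon and
    compare amino acids: different -> nsSNP, same -> sSNP."""
    results = []
    for i, (r, s) in enumerate(zip(ref, sample)):
        if r == s:
            continue
        start = i - i % 3                      # first base of the codon
        ref_codon = ref[start:start + 3]
        if len(ref_codon) < 3:                 # incomplete final codon
            continue
        alt_codon = ref_codon[:i % 3] + s + ref_codon[i % 3 + 1:]
        ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
        label = "sSNP" if ref_aa == alt_aa else "nsSNP"
        results.append((i + 1, ref_codon, alt_codon, ref_aa, alt_aa, label))
    return results


ref, sample = "GACTTCGATCAG", "GACGTCGATCAA"
print(translate(ref), translate(sample))   # -> DFDQ DVDQ
for row in classify_snps(ref, sample):
    print(row)
# -> (4, 'TTC', 'GTC', 'F', 'V', 'nsSNP')
# -> (12, 'CAG', 'CAA', 'Q', 'Q', 'sSNP')
```

Applying each SNP to the reference codon on its own (rather than reading the sample codon) keeps the classification per SNP even when two SNPs fall in the same codon.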

Genetic variations Cont.
Indels (Insertion-Deletion)
Insertion
Ref
G A C T - - - - T C G
Sample
G A C T C G A T T C G
Genetic variations Cont.
Deletion

Ref
G A C T T C G A T C A
Sample
G A C - - C G A T C A
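Indels appear as '-' gap characters in an alignment: a gap in the reference row is an insertion in the sample, and a gap in the sample row is a deletion. The sketch below, with a hypothetical `find_indels` helper, reports each run of gaps using 1-based reference coordinates (for an insertion, the position of the reference base just before the inserted bases); it assumes no column has a gap in both rows.

```python
def find_indels(ref_aln: str, sample_aln: str) -> list[tuple[str, int, str]]:
    """Return (kind, 1-based reference position, bases) for each gap run."""
    if len(ref_aln) != len(sample_aln):
        raise ValueError("aligned rows must have equal length")
    events = []
    ref_pos = 0  # reference bases consumed so far
    i = 0
    while i < len(ref_aln):
        if ref_aln[i] == "-":                      # gap in reference -> insertion
            j = i
            while j < len(ref_aln) and ref_aln[j] == "-":
                j += 1
            events.append(("insertion", ref_pos, sample_aln[i:j]))
        elif sample_aln[i] == "-":                 # gap in sample -> deletion
            j = i
            while j < len(sample_aln) and sample_aln[j] == "-":
                j += 1
            events.append(("deletion", ref_pos + 1, ref_aln[i:j]))
            ref_pos += j - i
        else:                                      # match or substitution
            j = i + 1
            ref_pos += 1
        i = j
    return events


print(find_indels("GACT----TCG", "GACTCGATTCG"))  # -> [('insertion', 4, 'CGAT')]
print(find_indels("GACTTCGATCA", "GAC--CGATCA"))  # -> [('deletion', 4, 'TT')]
```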
Genetic variations Cont.
Inversion

REF
G A C T T C G A T C A
Sample
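G A C G C T T A T C A
(Here the segment T T C G at positions 4-7 of the reference appears in reverse order, G C T T, in the sample. In a real chromosomal inversion the segment is also complemented, because it is re-inserted on the opposite strand, so T T C G would read C G A A.)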
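Assuming the reference and sample have equal length and differ only inside the inverted segment, a simple check is to take the window of differing positions and test whether the sample window is the reversed (as drawn on the slide) or the reverse-complemented reference window. This is a hypothetical helper; it uses the smallest differing window, so the true breakpoints may lie slightly outside it when the segment's end bases happen to coincide.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def describe_inversion(ref: str, sample: str):
    """Return (start, end, ref segment, sample segment, kind) with 1-based,
    inclusive coordinates, or None if the difference is not an inversion."""
    if len(ref) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    diffs = [i for i, (r, s) in enumerate(zip(ref, sample)) if r != s]
    if not diffs:
        return None
    start, end = diffs[0], diffs[-1] + 1          # smallest differing window
    ref_seg, sample_seg = ref[start:end], sample[start:end]
    if sample_seg == ref_seg[::-1]:
        kind = "reversed"                          # as drawn on the slide
    elif sample_seg == ref_seg.translate(COMPLEMENT)[::-1]:
        kind = "reverse-complemented"              # biological inversion
    else:
        return None
    return (start + 1, end, ref_seg, sample_seg, kind)


print(describe_inversion("GACTTCGATCA", "GACGCTTATCA"))
# -> (4, 7, 'TTCG', 'GCTT', 'reversed')
```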

Genetic variations Cont.
Duplication

REF
G A C G T C G A T C A
ReSeq
G A C G T C G T C C A

Biological Data File Format
• FASTA
• FASTQ
• SAM
• BAM
• VCF
• GFF
• PDB
Biological Data File Format
FASTA
• File extensions: file.fa, file.fasta, file.fna
• FASTA format is a simple text format for representing nucleotide sequences or amino acid (protein) sequences.
• It is a very basic format with at least two lines per record. The first line, called the header (comment) line, starts with '>' and gives basic information about the sequence.
• After the header line, the nucleic acid or protein sequence follows in standard one-letter code, often wrapped over several lines.
• >XR_002086427.1 Candida albicans SC5314 uncharacterized ncRNA (SCR1), ncRNA
TGGCTGTGATGGCTTTTAGCGGAAGCGCGCTGTTCGCGTACCTGCTGTTTGTTGAAAATTTAA
GAGCAAAGTGTCCGGCTCGATCCCTGCGAATTGAATTCTGAACGCTAGAGTAATCAGTGTCTT
TCAAGTTCTGGTAATGTTTAGCATAACCACTGGAGGGAAGCAATTCAGCACAGTAATGCTAAT
CGTGGTGGAGGCGAATCCGGATGGCACCTTGTTTGTTGATAAATAGTGCGGTATCTAGTGTTG
CAACTCTATTTTT
• >7S4F1|Chain A|Tyrosine-protein phosphatase non-receptor type 1|Homo sapiens
MEMEKEFEQIDKSGSWAAIYQDIRHEASDFPCRVAKLPKNKNRNRYRDVSPFDHSRIK
LHQEDNDYINASLIKMEEAQRSYILTQGPLPNTCGHFWEMVWEQKSRGVVMLNRV
MEKGSLKCAQYWPQKEEKE
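A minimal FASTA reader, written as a sketch: each line starting with '>' begins a new record, and the following lines are concatenated into that record's sequence. The function name and example records are hypothetical, and the full header text is used as the key; libraries such as Biopython (`Bio.SeqIO`) handle the many edge cases found in real files.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each header line (without '>') to its full sequence; wrapped
    sequence lines belonging to one record are joined together."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:               # close the previous record
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:                 # ignore text before first '>'
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


example = ">seq1 example\nACGTAC\nGGT\n>seq2\nMKV\n"   # made-up records
print(parse_fasta(example))  # -> {'seq1 example': 'ACGTACGGT', 'seq2': 'MKV'}
```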
Biological Data File Format
FASTQ
1. File extensions: file.fastq, file.sanfastq, file.fq
2. FASTQ format was developed at the Sanger Institute to store a sequence together with its per-base quality scores (Q: Phred quality score). In FASTQ files each entry consists of 4 lines:
@K00188:2:N:0:CTTGTA
ATAATAGGATCCCTTTTCCTGGAGCTGCCTTTAGGTA
+
AAAFFJJJJJJJJJJJJJJJJJFJJFJJJJJFJJJJJ
1. Line 1 begins with a '@' character and holds the sequence identifier and an optional description.
2. Line 2 holds the sequence in standard one-letter code.
3. Line 3 begins with a '+' character and is optionally followed by the same sequence identifier (and any additional description) again.
4. Line 4 encodes the quality values for the sequence in Line 2 and must contain the same number of symbols as there are letters in the sequence.
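The four-line layout can be read in groups of four lines, with each quality character decoded as Q = ASCII code - 33. This assumes the Phred+33 encoding used by Sanger and current Illumina files (some older Illumina files used an offset of 64) and unwrapped records. Q relates to the probability that a base call is wrong by P = 10^(-Q/10), so 'J' (Q = 41) means an error probability below 1 in 10,000. The helper names and the example record are hypothetical.

```python
def phred33(quality: str) -> list[int]:
    """Decode Phred+33 quality characters: Q = ASCII code - 33."""
    return [ord(ch) - 33 for ch in quality]


def error_probability(q: int) -> float:
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def parse_fastq(text: str) -> list[tuple[str, str, list[int]]]:
    """Return (identifier, sequence, quality scores) for each 4-line record."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 4:
        raise ValueError("expected a multiple of 4 non-empty lines")
    records = []
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"{header}: sequence and quality lengths differ")
        records.append((header[1:], seq, phred33(qual)))
    return records


demo = "@read1\nACGT\n+\nAAFJ\n"              # made-up record
print(parse_fastq(demo))       # -> [('read1', 'ACGT', [32, 32, 37, 41])]
print(error_probability(30))   # Q30: about 1 error in 1,000 calls
```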

Biological Data File Format
SAM
• File extensions: file.sam
• SAM (Sequence Alignment/Map) format is a text format that stores sequence alignment data in a series of tab-delimited ASCII columns.
• SAM files are generated by mapping reads to a reference sequence.
• It is a TAB-delimited text format with a header section and an alignment section.
  - The header section is optional; header lines start with '@'.
  - Each alignment line has 11 mandatory fields for essential alignment information, such as the mapping position, followed by a variable number of optional fields.
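Because each alignment line is tab-delimited with 11 fixed columns, a sketch of a SAM reader is little more than a split. The column names below are the standard SAM field names (QNAME, FLAG, RNAME, POS, MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, QUAL); the function name and the example record are hypothetical.

```python
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}


def parse_sam(text: str) -> tuple[list[str], list[dict]]:
    """Split SAM text into header lines ('@...') and alignment records.
    Each record maps the 11 mandatory field names to values; any optional
    fields are kept as raw TAG:TYPE:VALUE strings under 'TAGS'."""
    header, alignments = [], []
    for line in text.splitlines():
        if not line:
            continue
        if line.startswith("@"):
            header.append(line)
            continue
        cols = line.split("\t")
        if len(cols) < 11:
            raise ValueError(f"expected at least 11 columns, got {len(cols)}")
        record = dict(zip(SAM_FIELDS, cols[:11]))
        for name in INT_FIELDS:
            record[name] = int(record[name])
        record["TAGS"] = cols[11:]
        alignments.append(record)
    return header, alignments


sam = ("@HD\tVN:1.6\n"
       "@SQ\tSN:ref1\tLN:100\n"
       "read1\t0\tref1\t7\t60\t8M\t*\t0\t0\tGATCAGGC\tIIIIIIII\tNM:i:0\n")
hdr, alns = parse_sam(sam)
print(len(hdr), alns[0]["RNAME"], alns[0]["POS"], alns[0]["CIGAR"])
# -> 2 ref1 7 8M
```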

Biological Data File Format
BAM
• File extensions: file.bam
• A BAM (Binary Alignment/Map) file is the compressed binary version of a SAM (Sequence Alignment/Map) file: a compact and indexable representation of nucleotide sequence alignments.
• SAM and BAM files hold exactly the same data.
• Because BAM is binary and compressed, BAM files are smaller than SAM files, which makes BAM the preferred format for storing alignments.
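BAM content cannot be read as plain text, but its outer layer is BGZF, a series of gzip-compatible blocks, so Python's standard `gzip` module can decompress it. A minimal format check, sketched under that assumption, is that the decompressed stream begins with the 4-byte magic `BAM\1`. For real work, tools such as `samtools view` or the pysam library are the usual way to read BAM files; the helper below is hypothetical.

```python
import gzip


def looks_like_bam(path: str) -> bool:
    """True if the file decompresses (BGZF is gzip-compatible) to a stream
    that starts with the BAM magic bytes b'BAM\\x01'."""
    try:
        with gzip.open(path, "rb") as fh:
            return fh.read(4) == b"BAM\x01"
    except (OSError, EOFError):  # not gzip data, or a truncated file
        return False


if __name__ == "__main__":
    import sys
    for p in sys.argv[1:]:
        print(p, looks_like_bam(p))
```

A plain-text SAM file, or even a gzipped SAM file, fails this check, because its decompressed content starts with text such as '@HD' rather than the magic bytes.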

Biological Data File Format
VCF (Variant Call Format)
• File extensions: file.vcf
• VCF is a text file format consisting of a header (meta-information such as the VCF version and sample names) followed by data lines that form the body of the file.
• A VCF file stores genetic variants, including SNPs and indels.
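A sketch of reading the VCF body: meta-information lines start with '##', the column header line starts with '#CHROM', and each data line has 8 fixed tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO), optionally followed by genotype columns. Classifying a record as SNP or indel by comparing REF and ALT lengths is a simplification assumed here. VCF writes indels with the preceding reference base included, so a 2-base deletion looks like REF=GAT, ALT=G. The function names and example records are hypothetical.

```python
VCF_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def variant_type(ref: str, alts: list[str]) -> str:
    """Rough classification: single-base REF and ALT -> SNP,
    any length change -> INDEL, otherwise OTHER (e.g. multi-base substitution)."""
    if all(len(ref) == 1 and len(alt) == 1 for alt in alts):
        return "SNP"
    if any(len(alt) != len(ref) for alt in alts):
        return "INDEL"
    return "OTHER"


def parse_vcf(text: str) -> tuple[list[str], list[dict]]:
    """Return (meta-information lines, list of variant records)."""
    meta, variants = [], []
    for line in text.splitlines():
        if line.startswith("##"):
            meta.append(line)          # meta-information (header)
            continue
        if not line.strip() or line.startswith("#"):
            continue                   # blank line or the #CHROM column line
        cols = line.split("\t")
        rec = dict(zip(VCF_COLUMNS, cols[:8]))
        rec["POS"] = int(rec["POS"])
        rec["ALT"] = rec["ALT"].split(",")
        rec["INFO"] = dict(            # KEY=VALUE pairs; bare flags -> True
            item.split("=", 1) if "=" in item else (item, True)
            for item in rec["INFO"].split(";") if item != "."
        )
        rec["TYPE"] = variant_type(rec["REF"], rec["ALT"])
        variants.append(rec)
    return meta, variants


vcf = ("##fileformat=VCFv4.2\n"
       "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
       "ref1\t4\t.\tT\tG\t50\tPASS\tDP=20\n"
       "ref1\t7\t.\tGAT\tG\t40\tPASS\tDP=15\n")
meta, variants = parse_vcf(vcf)
for v in variants:
    print(v["CHROM"], v["POS"], v["REF"], v["ALT"], v["TYPE"])
# -> ref1 4 T ['G'] SNP
# -> ref1 7 GAT ['G'] INDEL
```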

Biological Data File Format
VCF (SNPs)
Biological Data File Format
VCF (Indels)
Biological Data File Format
[Figure: VCF variants on reference NW_008246507.1, positions 1-40: CATTACGTAATTCCGCTTCCGGCATCTGGCTCAGTTCCGC]
Biological Data File Format
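GFF (General Feature Format or Gene Finding Format)
• File extensions: file.gff2, file.gff3, file.gff
• GFF3 has the same first 8 fields as GFF2 but differs in field 9, which holds the attributes (GFF3 writes them as tag=value pairs separated by ';').
• GFF files are used to annotate the variants in VCF files (SNPs and indels), for example to find which gene a variant falls in.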
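To annotate a variant, the GFF features on the same sequence are checked for an interval (columns 4 and 5, 1-based and inclusive) that contains the variant position. A minimal sketch, assuming GFF3 attributes of the form tag=value and ignoring percent-encoding; the function names, IDs and coordinates are hypothetical.

```python
GFF_COLUMNS = ["seqid", "source", "type", "start", "end",
               "score", "strand", "phase", "attributes"]


def parse_gff3(text: str) -> list[dict]:
    """Parse the 9 tab-separated GFF3 columns; column 9 becomes a dict."""
    features = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue                       # blank line, directive or comment
        cols = line.split("\t")
        if len(cols) != 9:
            raise ValueError(f"expected 9 columns, got {len(cols)}")
        feat = dict(zip(GFF_COLUMNS, cols))
        feat["start"], feat["end"] = int(feat["start"]), int(feat["end"])
        feat["attributes"] = dict(
            pair.split("=", 1) for pair in feat["attributes"].split(";") if "=" in pair
        )
        features.append(feat)
    return features


def annotate(chrom: str, pos: int, features: list[dict]) -> list[dict]:
    """Features on `chrom` whose 1-based inclusive [start, end] contains `pos`."""
    return [f for f in features
            if f["seqid"] == chrom and f["start"] <= pos <= f["end"]]


gff = ("##gff-version 3\n"
       "ref1\tdemo\tgene\t1\t20\t.\t+\t.\tID=gene1;Name=geneA\n"
       "ref1\tdemo\tgene\t30\t40\t.\t-\t.\tID=gene2\n")
features = parse_gff3(gff)
for pos in (4, 25, 35):      # e.g. POS values taken from VCF records
    print(pos, [f["attributes"]["ID"] for f in annotate("ref1", pos, features)])
# -> 4 ['gene1']
# -> 25 []
# -> 35 ['gene2']
```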

Biological Data File Format
[Figure: GFF features and VCF variants on reference NW_008246507.1, positions 1-40: CATTACGTAATTCCGCTTCCGGCATCTGGCTCAGTTCCGC]
Biological Data File Format
Protein Data Bank (PDB)
• PDB is a standard format for files containing the atomic coordinates of proteins.
• Structures deposited in the Protein Data Bank at the Research
Collaboratory for Structural Bioinformatics (RCSB) are
written in this standardized format.
• Several methods are currently used to determine the
structure of a protein, including X-ray crystallography, NMR
spectroscopy, and electron microscopy.

Biological Data File Format
Protein Data Bank (PDB)
ATOM records hold the coordinates of the atoms that belong to the protein itself.
Biological Data File Format
Protein Data Bank (PDB)
HETATM records hold the coordinates of hetero atoms that are not part of the protein chain, such as a bound drug (ligand).
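PDB coordinate records use fixed column positions rather than delimiters, so they are parsed by slicing: residue name in columns 18-20, chain ID in column 22, residue number in 23-26, and x, y, z in 31-38, 39-46 and 47-54. The sketch below keeps only ATOM and HETATM records. Water molecules (HOH) are also written as HETATM, so isolating a drug usually means filtering by residue name as well, for example `[a for a in atoms if a["record"] == "HETATM" and a["res_name"] != "HOH"]`. The helper name and the example line are illustrative, not taken from a specific structure.

```python
def parse_pdb_atoms(text: str) -> list[dict]:
    """Extract ATOM/HETATM records using the fixed PDB column layout
    (0-based slices below correspond to the 1-based columns in comments)."""
    atoms = []
    for line in text.splitlines():
        record = line[0:6].strip()               # columns 1-6
        if record not in ("ATOM", "HETATM"):
            continue
        atoms.append({
            "record": record,
            "serial": int(line[6:11]),           # columns 7-11
            "name": line[12:16].strip(),         # columns 13-16
            "res_name": line[17:20].strip(),     # columns 18-20
            "chain": line[21].strip(),           # column 22
            "res_seq": int(line[22:26]),         # columns 23-26
            "x": float(line[30:38]),             # columns 31-38
            "y": float(line[38:46]),             # columns 39-46
            "z": float(line[46:54]),             # columns 47-54
        })
    return atoms


sample_line = "ATOM      1  N   MET A   1      38.198  19.582  28.998  1.00 50.00           N"
atom = parse_pdb_atoms(sample_line)[0]
print(atom["record"], atom["name"], atom["res_name"], atom["chain"], atom["x"])
# -> ATOM N MET A 38.198
```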
Biological Data File Format
Molecular visualization software of PDB
