Sequence Search Algorithms

Uploaded by

carucast

HIGH THROUGHPUT SEQUENCING

Sequence mapping algorithms (1 with many)

1. Read mapping
• Sequence alignment is the next step after sequencing: you must know where the sequenced reads are located
with respect to a reference genome.

• Mapping hundreds of millions of reads back to the reference genome is CPU- and RAM-intensive and slow.
• Most mappers allow approximately 2 mismatches within the first 30 bp (4^28 possible sequences could still uniquely
identify most 30 bp sequences in a 3 Gb genome); mapping is slower when INDELs are allowed.
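
The uniqueness argument can be checked with a little arithmetic. The sketch below assumes that the 4^28 figure comes from 30 bp minus the 2 tolerated mismatches, and it treats the genome as 3 × 10^9 uniformly random bases, which is an idealization (real genomes contain repeats).

```python
# Back-of-the-envelope check; assumes a uniformly random genome (an idealization).
GENOME_SIZE = 3_000_000_000   # ~3 Gb reference, as quoted in the notes
MATCHED_BASES = 28            # assumed: 30 bp minus 2 tolerated mismatches

distinct_kmers = 4 ** MATCHED_BASES                  # number of possible 28-mers
expected_random_hits = GENOME_SIZE / distinct_kmers  # chance hits per 28-mer

print(f"possible 28-mers: {distinct_kmers:.3e}")
print(f"expected chance hits per 28-mer: {expected_random_hits:.3e}")
```

Because the number of possible 28-mers is many orders of magnitude larger than the genome, a given 28 bp pattern is expected to occur by chance far less than once under this model.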

2. Seed

• Break database sequences (FASTQ) into k-mer words (seeds) and hash their locations to speed up later searches.
• K-mer: a substring of length k. For example, a 5-mer index is built from k-mer sequences 5 bp long.
• You can index query sequences against a template or reference genome by recording where these k-mers
appear in the reference genome.

Note: index 0 corresponds to position 1 of the reference genome.
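
A minimal sketch of the seeding idea, assuming a plain Python dictionary as the hash table and a made-up toy reference. The function names `build_kmer_index` and `seed_hits` are illustrative and are not taken from any particular mapper. Positions are 0-based, matching the note above.

```python
from collections import defaultdict

def build_kmer_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer in `reference` to the 0-based positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return dict(index)

def seed_hits(read: str, index: dict[str, list[int]], k: int) -> list[tuple[int, int]]:
    """Return (read_offset, reference_position) pairs for every exact k-mer match."""
    hits = []
    for offset in range(len(read) - k + 1):
        for ref_pos in index.get(read[offset:offset + k], []):
            hits.append((offset, ref_pos))
    return hits

reference = "ACGTACGTGACCA"   # toy reference, invented for illustration
index = build_kmer_index(reference, k=5)
hits = seed_hits("CGTGACC", index, k=5)
print(hits)
```

Hits whose reference position minus read offset is the same lie on one diagonal, which suggests a single candidate mapping location for the read.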

3. Blast algorithm

• Seed-and-extend paradigm.
• For each k-mer in the query, find the database k-mers that match it well.
• Only words scoring at or above the threshold T (the cutoff) are kept.
• Steps:
1. For each database sequence with a high-scoring word, try to extend the match in both directions:
§ HSP (high-scoring segment pair): a region of this longer sequence that matches the query.
2. Keep only statistically significant HSPs (E-value):
§ Significance is based on the scores obtained when aligning two random sequences, which serve as a
reference point for how likely a match like this is to occur by chance in this database.
3. Use the Smith-Waterman (local alignment) algorithm to join the HSPs and produce the optimal alignment.
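
The extension step can be sketched as an ungapped walk away from the seed in both directions. This is a simplified illustration, not BLAST's actual implementation: the match/mismatch scores and the `x_drop` stopping rule are assumed values, whereas real BLAST uses substitution matrices and tuned parameters.

```python
def extend_seed(query, subject, q_pos, s_pos, k, match=1, mismatch=-1, x_drop=2):
    """Extend an exact k-mer seed without gaps in both directions.

    Each direction stops once the running score falls more than `x_drop`
    below the best score seen so far. Returns (score, q_start, q_end),
    where q_end is exclusive and coordinates refer to the query.
    """
    def walk(q_i, s_i, step):
        best = score = 0
        best_len = length = 0
        while 0 <= q_i < len(query) and 0 <= s_i < len(subject):
            score += match if query[q_i] == subject[s_i] else mismatch
            length += 1
            if score > best:
                best, best_len = score, length   # remember the best extension
            elif best - score > x_drop:
                break                            # score dropped too far: stop
            q_i += step
            s_i += step
        return best, best_len

    right_score, right_len = walk(q_pos + k, s_pos + k, +1)
    left_score, left_len = walk(q_pos - 1, s_pos - 1, -1)
    score = k * match + left_score + right_score
    return score, q_pos - left_len, q_pos + k + right_len

query = "CCACGTACGAAA"      # toy sequences, invented for illustration
subject = "GGGACGTACGTTTT"
hsp = extend_seed(query, subject, q_pos=4, s_pos=5, k=4)   # seed "GTAC"
print(hsp, query[hsp[1]:hsp[2]])
```

The returned segment is a candidate HSP. Deciding whether it is statistically significant (the E-value step) is not covered by this sketch.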

Sequence A is the database sequence (reference) and sequence B is the query sequence. The thick black
lines are the HSPs; using the Smith-Waterman algorithm you can try to join these significant HSPs
and generate the final alignment.
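
For reference, here is a textbook Smith-Waterman local alignment with a linear gap penalty. It is a minimal sketch: the scoring values are assumptions, and it fills the full dynamic-programming matrix rather than restricting the search to the neighbourhood of the HSPs, as a production tool would.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if H[i][j] == diag:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

result = smith_waterman("TTTACGTACGTTT", "GGACGTACGGG")   # toy sequences
print(result)
```

The run time is proportional to the product of the two sequence lengths, which is why heuristics such as seeding are applied first to limit the regions that need this exact alignment.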

4. Suffix tree
• A tree of all the suffixes of the reference sequence:
§ “Banana” has the suffixes: BANANA, ANANA, NANA, ANA, NA, A.
• Used in alignment tools such as MUMmer.
• O(n) time to build:
§ n = genome length.
• O(m) time to search:
§ m = query length.
• The genome index is big: for the human genome the tree would be huge (about 50 GB).
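
The O(m) search can be illustrated with an uncompressed suffix trie built from nested dictionaries. Note the assumption: this naive construction takes O(n^2) time and space, so it is not the compressed, linear-time suffix tree the notes refer to. It only shows why lookup cost depends on the query length.

```python
def build_suffix_trie(text):
    """Insert every suffix of `text` into a nested-dict trie (naive, O(n^2))."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})   # create the child if it is missing
    return root

def contains(trie, pattern):
    """Follow one edge per pattern character: O(m) for a pattern of length m."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True

trie = build_suffix_trie("BANANA")
print(contains(trie, "NAN"), contains(trie, "NAB"))
```

Every substring of the text is a prefix of some suffix, so walking the pattern down from the root answers the query without scanning the text.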
5. Suffix array

• The i-th entry holds the starting position of the i-th smallest suffix in lexicographic order. In the BANANA
example, the full string BANANA is the suffix that starts at position 0.
• Used in alignment tools such as STAR.
• O(n) time to build:
§ n = genome length.
• O(m log n) time to search:
§ Binary search.
§ m = query length.
• Index size is moderate, around 15 GB.
• It is used in RNA sequencing (RNA-seq) analysis.
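
A minimal sketch of a suffix array and its binary search, using the BANANA example. The construction below simply sorts the suffixes, which is an assumption made for brevity: it is not the linear-time construction mentioned above and is not how STAR builds its index.

```python
def build_suffix_array(text):
    """Sort suffix start positions lexicographically (simple, not linear time)."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_occurrences(text, sa, pattern):
    """Binary-search the suffix array; return sorted start positions of `pattern`."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:                       # leftmost suffix whose prefix >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo
    hi = len(sa)
    while lo < hi:                       # leftmost suffix whose prefix > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[first:lo])

text = "BANANA"
sa = build_suffix_array(text)
print(sa, find_occurrences(text, sa, "ANA"))
```

Each of the O(log n) binary-search steps compares up to m characters, which gives the O(m log n) search time listed above.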
