
Global Alignment: Ben Langmead

This document discusses global alignment and generalizing edit distance. It introduces the idea of assigning different costs or penalties to different kinds of sequence differences, such as transitions versus transversions or gaps versus substitutions. It presents a dynamic programming implementation of global alignment that accepts a custom penalty function. Filling the matrix takes O(mn) time and space, and traceback returns an optimal alignment in O(m + n) time. Scoring functions aim to reflect expected mutational events and biological interchangeability, though simple gap penalties are often used for computational efficiency.

Uploaded by

mohit mishra


Global Alignment

Ben Langmead

Department of Computer Science

Please sign the guestbook (www.langmead-lab.org/teaching-materials) to tell me briefly how you are using the slides. For the original Keynote files, email me.
Generalizing edit distance

What if it doesn’t make sense for every edit to cost 1?

If you compare two human genomes, you see some kinds of sequence differences more often than others.
Generalizing edit distance

Transitions are A ↔ G and C ↔ T changes

Transversions are A ↔ C, A ↔ T, C ↔ G, G ↔ T

For random mutations, transitions should be half as frequent as transversions, since each base has one transition partner but two transversion partners...

...but if you compare two humans, the transition-to-transversion ratio (ti/tv) is ~2.1
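A minimal sketch of where the "half as frequent" expectation comes from, assuming "random mutation" means each of the 12 possible base-to-base substitutions is equally likely. The helper name `isTransition` is hypothetical, not something from the slides.

```python
from itertools import permutations

PURINES = {'A', 'G'}      # transitions swap A <-> G ...
PYRIMIDINES = {'C', 'T'}  # ... or C <-> T

def isTransition(a, b):
    """ True if substituting base a with base b is a transition. """
    pair = {a, b}
    return a != b and (pair <= PURINES or pair <= PYRIMIDINES)

# All 12 ordered substitutions a -> b with a != b, assumed equally likely
substitutions = list(permutations('ACGT', 2))
ti = sum(1 for a, b in substitutions if isTransition(a, b))
tv = len(substitutions) - ti
print(ti, tv, ti / tv)  # 4 8 0.5
```

Only 4 of the 12 substitutions are transitions, so uniform random substitution gives ti/tv = 0.5. The observed ~2.1 between humans is roughly four times higher, which is one reason a cost function might charge less for a transition than for a transversion.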
Generalizing edit distance

GGGTAGCGGGTTTAAC
||||| ||||||||||
GGGTAACGGGTTTAAC
Human substitution rate ≈ 1 in 1,000

GGGTAGCGGGTTTAAC
|||||  |||||||||
GGGTA--GGGTTTAAC
Small-gap rate is ≈ 1 in 3,000

Wanted: keep basic edit distance idea and algorithm, but give different weights to different events according to likelihood
Penalty function

    A  C  G  T  -
A   0  4  2  4  8
C   4  0  4  2  8
G   2  4  0  4  8
T   4  2  4  0  8
-   8  8  8  8

2: Transitions (A ↔ G, C ↔ T)
4: Transversions
8: Gaps
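If the penalties are easier to maintain as data than as if/else logic, the table can be stored directly and looked up. This is a minimal sketch; the names `TABLE`, `PENALTY` and `tableCost` are hypothetical, and the lookup function takes the same (xc, yc) arguments as the penalty function s used by globalAlignment.

```python
ALPHABET = "ACGT-"
# Rows index the character from x, columns the character from y; '-' is a gap.
TABLE = [
    # A  C  G  T  -
    [0, 4, 2, 4, 8],     # A
    [4, 0, 4, 2, 8],     # C
    [2, 4, 0, 4, 8],     # G
    [4, 2, 4, 0, 8],     # T
    [8, 8, 8, 8, None],  # - (gap vs. gap is never scored)
]

# (a, b) -> penalty, skipping the undefined gap/gap cell
PENALTY = {(a, b): TABLE[i][j]
           for i, a in enumerate(ALPHABET)
           for j, b in enumerate(ALPHABET)
           if TABLE[i][j] is not None}

def tableCost(xc, yc):
    """ Penalty for aligning xc with yc, looked up in the table. """
    return PENALTY[(xc, yc)]
```

For example, `tableCost('A', 'G')` returns 2 (a transition) and `tableCost('G', '-')` returns 8 (a gap). Swapping in a different scoring scheme then means editing numbers rather than code.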
Global alignment

Let $D[0, j] = \sum_{k=0}^{j-1} s(\text{-}, y[k])$, and let $D[i, 0] = \sum_{k=0}^{i-1} s(x[k], \text{-})$.

Otherwise, let

$$
D[i, j] = \min \begin{cases}
D[i-1, j] + s(x[i-1], \text{-}) \\
D[i, j-1] + s(\text{-}, y[j-1]) \\
D[i-1, j-1] + s(x[i-1], y[j-1])
\end{cases}
$$

$s(a, b)$ assigns a cost to a particular gap or substitution:

    A  C  G  T  -
A   0  4  2  4  8
C   4  0  4  2  8
G   2  4  0  4  8
T   4  2  4  0  8
-   8  8  8  8

2: Transitions (A ↔ G, C ↔ T)
4: Transversions (everything else)
8: Gaps
Global alignment: implementation

from numpy import zeros

def globalAlignment(x, y, s):
    """ Calculate global alignment value of sequences x and y using
        dynamic programming.  Return the filled matrix D and the
        global alignment value. """
    D = zeros((len(x)+1, len(y)+1), dtype=int)
    # Use of the new penalty function s instead of a fixed cost of 1
    for j in range(1, len(y)+1):
        D[0, j] = D[0, j-1] + s('-', y[j-1])
    for i in range(1, len(x)+1):
        D[i, 0] = D[i-1, 0] + s(x[i-1], '-')
    for i in range(1, len(x)+1):
        for j in range(1, len(y)+1):
            D[i, j] = min(D[i-1, j-1] + s(x[i-1], y[j-1]),  # diagonal
                          D[i-1, j  ] + s(x[i-1], '-'),     # vertical
                          D[i  , j-1] + s('-', y[j-1]))     # horizontal
    return D, D[len(x), len(y)]

Similar to edit distance

http://bit.ly/CG_DP_Global
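To see the "similar to edit distance" point concretely: with a cost of 0 for a match and 1 for any mismatch or gap, the recurrence becomes the ordinary edit-distance recurrence. The sketch below re-implements the same recurrence with plain Python lists (so it runs without NumPy) and returns only the value; `globalAlignmentValue` and `unitCost` are hypothetical names.

```python
def globalAlignmentValue(x, y, s):
    """ Same recurrence as globalAlignment, using lists instead of NumPy;
        returns only the global alignment value. """
    D = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for j in range(1, len(y) + 1):
        D[0][j] = D[0][j-1] + s('-', y[j-1])
    for i in range(1, len(x) + 1):
        D[i][0] = D[i-1][0] + s(x[i-1], '-')
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            D[i][j] = min(D[i-1][j-1] + s(x[i-1], y[j-1]),  # diagonal
                          D[i-1][j] + s(x[i-1], '-'),       # vertical
                          D[i][j-1] + s('-', y[j-1]))       # horizontal
    return D[len(x)][len(y)]

def unitCost(xc, yc):
    """ Classic edit distance: every substitution or gap costs 1. """
    return 0 if xc == yc else 1

print(globalAlignmentValue("kitten", "sitting", unitCost))  # 3
print(globalAlignmentValue("GGGTAGCGGGTTTAAC", "GGGTAACGGGTTTAAC", unitCost))  # 1
```

Only the cost function changed; the matrix-filling code is identical, which is the sense in which global alignment generalizes edit distance.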
Global alignment: implementation
def exampleCost(xc, yc):
    """ Cost function assigning 0 to match, 2 to transition, 4 to
        transversion, and 8 to a gap """
    if xc == yc: return 0  # match
    if xc == '-' or yc == '-': return 8  # gap
    minc, maxc = min(xc, yc), max(xc, yc)
    if minc == 'A' and maxc == 'G': return 2  # transition
    elif minc == 'C' and maxc == 'T': return 2  # transition
    return 4  # transversion

A C G T -
A 0 4 2 4 8
C 4 0 4 2 8
G 2 4 0 4 8
T 4 2 4 0 8
- 8 8 8 8

http://bit.ly/CG_DP_Global
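A quick way to sanity-check exampleCost is to total it column by column over an alignment that is already written out, such as the two example alignments earlier in the slides. In this sketch `alignmentCost` is a hypothetical helper, and exampleCost is copied from above so the block runs on its own.

```python
def exampleCost(xc, yc):
    """ Cost function assigning 0 to match, 2 to transition, 4 to
        transversion, and 8 to a gap """
    if xc == yc: return 0  # match
    if xc == '-' or yc == '-': return 8  # gap
    minc, maxc = min(xc, yc), max(xc, yc)
    if minc == 'A' and maxc == 'G': return 2  # transition
    elif minc == 'C' and maxc == 'T': return 2  # transition
    return 4  # transversion

def alignmentCost(xa, ya, s):
    """ Sum the penalty s over the columns of an alignment given as two
        equal-length strings, with '-' marking gaps. """
    if len(xa) != len(ya):
        raise ValueError("aligned strings must have equal length")
    return sum(s(a, b) for a, b in zip(xa, ya))

print(alignmentCost("GGGTAGCGGGTTTAAC", "GGGTAACGGGTTTAAC", exampleCost))  # 2
print(alignmentCost("GGGTAGCGGGTTTAAC", "GGGTA--GGGTTTAAC", exampleCost))  # 16
```

The single G/A column is a transition (2), while the gapped version pays 8 for each of its two gap columns. This reflects the slides' point that small gaps are rarer than substitutions.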
Global alignment: dynamic programming

globalAlignment initialization:

    D = zeros((len(x)+1, len(y)+1), dtype=int)
    for j in range(1, len(y)+1):
        D[0, j] = D[0, j-1] + s('-', y[j-1])
    for i in range(1, len(x)+1):
        D[i, 0] = D[i-1, 0] + s(x[i-1], '-')

     ϵ   T   A   T   G   T   C   A   T   G   C
ϵ    0   8  16  24  32  40  48  56  64  72  80
T    8
A   16
C   24
G   32
T   40
C   48
A   56
G   64
C   72

s(a, b):

    A  C  G  T  -
A   0  4  2  4  8
C   4  0  4  2  8
G   2  4  0  4  8
T   4  2  4  0  8
-   8  8  8  8
Global alignment: dynamic programming

globalAlignment loop:

    for i in range(1, len(x)+1):
        for j in range(1, len(y)+1):
            D[i, j] = min(D[i-1, j-1] + s(x[i-1], y[j-1]),  # diagonal
                          D[i-1, j  ] + s(x[i-1], '-'),     # vertical
                          D[i  , j-1] + s('-', y[j-1]))     # horizontal

     ϵ   T   A   T   G   T   C   A   T   G   C
ϵ    0   8  16  24  32  40  48  56  64  72  80
T    8   0   8  16  24  32  40  48  56  64  72
A   16   8   0   8  16  24  32  40  48  56  64
C   24  16   8   ?
G   32
T   40
C   48
A   56
G   64
C   72

(s(a, b) is the same penalty table: 0 match, 2 transition, 4 transversion, 8 gap.)
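Filling in the "?" cell by hand shows the recurrence at work. The cell is $D[3, 3]$, where $x[2] = \text{C}$ and $y[2] = \text{T}$, and the three candidates use the already-filled neighbors above, to the left, and diagonally:

$$
D[3, 3] = \min \begin{cases}
D[2, 3] + s(\text{C}, \text{-}) = 8 + 8 = 16 \\
D[3, 2] + s(\text{-}, \text{T}) = 8 + 8 = 16 \\
D[2, 2] + s(\text{C}, \text{T}) = 0 + 2 = 2
\end{cases} = 2
$$

The diagonal wins because C ↔ T is a transition, which costs far less than a gap.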
Global alignment: dynamic programming

Running the same loop to completion fills the whole matrix:

     ϵ   T   A   T   G   T   C   A   T   G   C
ϵ    0   8  16  24  32  40  48  56  64  72  80
T    8   0   8  16  24  32  40  48  56  64  72
A   16   8   0   8  16  24  32  40  48  56  64
C   24  16   8   2  10  18  24  32  40  48  56
G   32  24  16  10   2  10  18  26  34  40  48
T   40  32  24  16  10   2  10  18  26  34  42
C   48  40  32  24  18  10   2  10  18  26  34
A   56  48  40  32  26  18  10   2  10  18  26
G   64  56  48  40  32  26  18  10   6  10  18
C   72  64  56  48  40  34  26  18  12  10  10

The bottom-right cell, D[9, 10] = 10, is the optimal global alignment value.
Global alignment: getting the alignment

Traceback works just as it did for edit distance

     ϵ   T   A   T   G   T   C   A   T   G   C
ϵ    0   8  16  24  32  40  48  56  64  72  80
T    8   0   8  16  24  32  40  48  56  64  72
A   16   8   0   8  16  24  32  40  48  56  64
C   24  16   8   2  10  18  24  32  40  48  56
G   32  24  16  10   2  10  18  26  34  40  48
T   40  32  24  16  10   2  10  18  26  34  42
C   48  40  32  24  18  10   2  10  18  26  34
A   56  48  40  32  26  18  10   2  10  18  26
G   64  56  48  40  32  26  18  10   6  10  18
C   72  64  56  48  40  34  26  18  12  10  10

Optimal alignment:

TACGTCA-GC
|| |||| ||
TATGTCATGC

Cost: +2 (C/T transition, column 3) + 8 (gap, column 8) = 10
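The traceback step can be sketched as follows: start at the bottom-right cell and repeatedly step to whichever neighbor the minimum came from, emitting one alignment column per step. This is a minimal sketch under our own assumptions: ties are broken by preferring diagonal, then vertical, then horizontal (so other equally good alignments may exist), and the matrix is built with plain lists rather than NumPy. `fillMatrix` and `traceback` are hypothetical names; exampleCost is copied from the slides.

```python
def exampleCost(xc, yc):
    """ 0 for match, 2 for transition, 4 for transversion, 8 for a gap """
    if xc == yc: return 0  # match
    if xc == '-' or yc == '-': return 8  # gap
    minc, maxc = min(xc, yc), max(xc, yc)
    if minc == 'A' and maxc == 'G': return 2  # transition
    elif minc == 'C' and maxc == 'T': return 2  # transition
    return 4  # transversion

def fillMatrix(x, y, s):
    """ Fill the global alignment matrix D (same recurrence as the slides). """
    D = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for j in range(1, len(y) + 1):
        D[0][j] = D[0][j-1] + s('-', y[j-1])
    for i in range(1, len(x) + 1):
        D[i][0] = D[i-1][0] + s(x[i-1], '-')
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            D[i][j] = min(D[i-1][j-1] + s(x[i-1], y[j-1]),
                          D[i-1][j] + s(x[i-1], '-'),
                          D[i][j-1] + s('-', y[j-1]))
    return D

def traceback(D, x, y, s):
    """ Walk back from D[len(x)][len(y)] to D[0][0], rebuilding one optimal
        alignment. Ties prefer diagonal, then vertical, then horizontal. """
    i, j = len(x), len(y)
    xa, ya = [], []  # alignment columns, collected back to front
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i-1][j-1] + s(x[i-1], y[j-1]):
            xa.append(x[i-1])  # diagonal: match or substitution
            ya.append(y[j-1])
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i-1][j] + s(x[i-1], '-'):
            xa.append(x[i-1])  # vertical: gap in y
            ya.append('-')
            i -= 1
        else:
            xa.append('-')     # horizontal: gap in x
            ya.append(y[j-1])
            j -= 1
    return ''.join(reversed(xa)), ''.join(reversed(ya))

x, y = "TACGTCAGC", "TATGTCATGC"
D = fillMatrix(x, y, exampleCost)
print(D[len(x)][len(y)])  # 10
xa, ya = traceback(D, x, y, exampleCost)
print(xa)  # TACGTCA-GC
print(ya)  # TATGTCATGC
```

Each loop iteration decreases i, j, or both, so the walk takes at most len(x) + len(y) steps.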
Global alignment: summary

Filling the matrix with dynamic programming takes O(mn) time and space (m and n are the lengths of x and y) and yields the global alignment value

Traceback takes O(m + n) time and yields an optimal alignment

Global alignment: scoring functions

Where do these penalty functions come from?

    A  C  G  T  -
A   0  4  2  4  8
C   4  0  4  2  8
G   2  4  0  4  8
T   4  2  4  0  8
-   8  8  8  8

They can be based on:

Expected frequency of the different mutational events

How interchangeable the alternatives are from a biological perspective

Whether the substitution changes the shape or function of the molecule

The prevalence of simple (linear, constant, affine) gap penalties is mostly because that's what we can compute efficiently, as discussed in HW4

One occasionally sees more general (e.g. convex) gap penalties
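To make the simple gap-penalty shapes concrete, here is a minimal sketch of each as a function of gap length k. The parameter values are hypothetical. The linear case with 8 per gap position matches the penalty table above, and for the affine case we assume one common convention (an opening cost plus a per-position extension cost); conventions differ between tools.

```python
def constantGap(k, c=8):
    """ Constant: any gap costs c, regardless of its length. """
    return c if k > 0 else 0

def linearGap(k, b=8):
    """ Linear: each gap position costs b (what the table above uses). """
    return b * k

def affineGap(k, a=10, b=2):
    """ Affine (assumed convention): opening cost a plus b per position. """
    return a + b * k if k > 0 else 0

for k in (1, 2, 3):
    print(k, constantGap(k), linearGap(k), affineGap(k))
```

Under the linear penalty one gap of length 2 costs the same as two separate gaps of length 1 (16 either way). With these affine parameters the single length-2 gap costs 14 versus 24 for two separate gaps, so the affine model favors fewer, longer gaps.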

BLOSUM62

Some amino acid substitutions have a smaller impact on structure and function than others. BLOSUM62 elements are, roughly speaking, log-odds of observing these substitutions between two highly related proteins.

[Figure: BLOSUM62 matrix over the amino acids. The matrix is symmetric. Negative entries: rare substitutions with a larger effect on structure/function. Positive entries: common substitutions with a modest effect on structure/function.]
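The log-odds idea can be sketched in a few lines. Compare how often a pair (a, b) is observed aligned in related proteins, q_ab, with how often it would appear by chance, p_a × p_b, then take a scaled, rounded logarithm. This is a simplified sketch, not the full BLOSUM construction (which also clusters sequences and estimates frequencies from aligned blocks). The function name and the toy frequencies are hypothetical. BLOSUM62 is commonly reported in half-bit units (2 × log2), which is the default scale here. Unlike the penalty tables above, a BLOSUM entry is a score to maximize, not a cost to minimize.

```python
import math

def logOddsScore(q_ab, p_a, p_b, scale=2):
    """ Rounded log-odds score for the pair (a, b).
        q_ab: observed frequency of a aligned with b in related proteins
        p_a, p_b: background frequencies of a and b
        scale=2 gives half-bit units. """
    return round(scale * math.log2(q_ab / (p_a * p_b)))

# Hypothetical toy frequencies, for illustration only
print(logOddsScore(0.02, 0.05, 0.10))   # 4  (seen 4x more often than chance)
print(logOddsScore(0.001, 0.05, 0.10))  # -5 (seen less often than chance)
```

Pairs seen more often than chance get positive scores and pairs seen less often get negative ones. This matches the figure's split between common, modest-effect substitutions and rare, larger-effect ones.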
