Need & Emergence of The Field: Speaker Shashi Shekhar Head of Computational Section Biowits Life Sciences

Bioinformatics emerged from the marriage of computer science and molecular biology to analyze the massive amounts of biological data, such as those produced by the Human Genome Project. It uses algorithms and techniques from computer science to solve problems in molecular biology, such as comparing genomic sequences to understand evolution. As publicly available genomic data exploded, bioinformatics became necessary to efficiently store, analyze and make sense of the wealth of biological information hidden in sequences, structures, literature and other data.

 The marriage between computer science and molecular biology
◦ The algorithms and techniques of computer science are used to solve the problems faced by molecular biologists

 ‘Information technology applied to the management and analysis of biological data’
◦ Storage and analysis are two of its most important functions; bioinformaticians build tools for each.
[Figure: Bioinformatics at the intersection of Biology, Chemistry, Computer Science and Statistics]
 The need for bioinformatics has arisen from the recent explosion of publicly available genomic information, such as that resulting from the Human Genome Project.
 To gain a better understanding of gene analysis, taxonomy and evolution.
 To support rational drug design and reduce the time taken to develop drugs manually.
 To uncover the wealth of biological information hidden in the mass of sequence, structure, literature and other biological data.
 It is used now, and will be in the foreseeable future, in molecular medicine.
 It has environmental benefits, e.g. identifying bacteria that can clean up waste.
 In agriculture, it can be used to produce high-yield, low-maintenance crops.
 Molecular Medicine
 Gene Therapy
 Drug Development
 Microbial genome applications
 Crop Improvement
 Forensic Analysis of Microbes
 Biotechnology
 Evolutionary Studies
 Bio-Weapon Creation
 In Experimental Molecular Biology
 In Genetics and Genomics
 In generating Biological Data
 Analysis of gene and protein expression
 Comparison of genomic data
 Understanding evolutionary relationships
 Understanding biological pathways and networks in systems biology
 In Simulation & Modeling of DNA, RNA & Protein
[Figure (from "Bioinformatics lecture", March 5, 2002): organisation of knowledge (sequences, structures, functional data), e.g. homology searches]
 Prediction of structure from sequence
◦ secondary structure
◦ homology modelling, threading
◦ ab initio 3D prediction
 Analysis of 3D structure
◦ structure comparison/ alignment
◦ prediction of function from structure
◦ molecular mechanics/ molecular dynamics
◦ prediction of molecular interactions, docking
 Structure databases (RCSB)
 Sequence Similarity
 Tools used for sequence similarity searching
 Their uses in biology
 Databases
 Different types of databases
 One could align the sequences so that many corresponding residues match.
 Strong similarity between two sequences is a strong argument for their homology.
 Homology: two (or more) sequences have a common ancestor.
 Similarity: two (or more) sequences are similar by some criterion; it does not refer to any historical process.
 To determine whether proteins or genes are related, i.e. whether they have a common ancestor.
 Mutations in the sequences cause changes, i.e. divergence, between the sequences.
 Alignment can also reveal the parts of a sequence that are crucial for the function of a gene or protein.
 Optimal Alignment: The alignment that is the best,
given a defined set of rules and parameter values for
comparing different alignments.
 Global Alignment: An alignment that assumes that the
two proteins are basically similar over the entire length
of one another. The alignment attempts to match them
to each other from end to end.
 Local Alignment: An alignment that searches for
segments of the two sequences that match well. There
is no attempt to force entire sequences into an
alignment, just those parts that appear to have good
similarity.

 Gaps & Insertions: In an alignment, one may achieve much better correspondence between two sequences by allowing a gap to be introduced in one sequence. Equivalently, one could allow an insertion in the other sequence. Biologically, this corresponds to a mutation event (an insertion or deletion).
 Substitution matrix: A substitution matrix describes how likely two residue types are to mutate into each other over evolutionary time. It is used to estimate how well two residues of given types would match if they were aligned in a sequence alignment.
 Gap Penalty: The gap penalty is used to help decide whether
or not to accept a gap or insertion in an alignment when it is
possible to achieve a good alignment residue to residue at
some other neighboring point in the sequence.
 Similarity indicates conserved function
 Human and mouse genes are more than 80% similar at the sequence level
 But these genes are a small fraction of the genome
 Most sequences in the genome are not recognizably similar
 Comparing sequences helps us understand function
◦ Locate a similar gene in another species to understand your new gene
 Match score: +1
 Mismatch score: +0
 Gap penalty: –1
ACGTCTGATACGCCGTATAGTCTATCT
    ||||| |||   || ||||||||
----CTGATTCGC---ATCGTCTATCT
 Matches: 18 × (+1)
 Mismatches: 2 × 0
 Gaps: 7 × (– 1)
Score = +11
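The arithmetic above can be checked with a small Python sketch. It assumes the aligned sequences are given as equal-length strings with '-' for gaps and uses the example's parameters (match +1, mismatch 0, gap -1); the function name `linear_gap_score` is illustrative, not taken from any library.

```python
def linear_gap_score(top: str, bottom: str, match: int = 1,
                     mismatch: int = 0, gap: int = -1) -> int:
    """Score a pairwise alignment column by column; '-' marks a gap."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            score += gap          # every gap position costs the same (linear penalty)
        elif x == y:
            score += match
        else:
            score += mismatch
    return score


top = "ACGTCTGATACGCCGTATAGTCTATCT"
bottom = "----CTGATTCGC---ATCGTCTATCT"
print(linear_gap_score(top, bottom))  # 11
```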
 We want to find alignments that are evolutionarily likely.
 Which of the following alignments seems more likely to
you?
ACGTCTGATACGCCGTATAGTCTATCT
ACGTCTGAT-------ATAGTCTATCT

ACGTCTGATACGCCGTATAGTCTATCT
AC-T-TGA--CG-CGT-TA-TCTATCT
 We can achieve this by penalizing a new gap more than the extension of an existing gap
 Match/mismatch score: +1/+0
 Origination/length penalty: –2/–1
ACGTCTGATACGCCGTATAGTCTATCT
    ||||| |||   || ||||||||
----CTGATTCGC---ATCGTCTATCT
 Matches: 18 × (+1)
 Mismatches: 2 × 0
 Origination: 2 × (–2)
 Length: 7 × (–1)
Score = +7
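The origination/length scheme can be sketched the same way. This is an illustrative Python function (the name and parameter names are assumptions) in which every gap position costs `gap_extend` and every new run of gaps additionally costs `gap_open`; it also scores the two candidate alignments from the question above.

```python
def affine_gap_score(top: str, bottom: str, match: int = 1, mismatch: int = 0,
                     gap_open: int = -2, gap_extend: int = -1) -> int:
    """Affine gap scoring: each gap position costs gap_extend ("length") and
    each new run of gaps also costs gap_open ("origination")."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    previous_gap = None  # which row had a gap in the previous column, if any
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            current_gap = "top" if x == "-" else "bottom"
            if current_gap != previous_gap:
                score += gap_open     # a new gap run starts in this column
            score += gap_extend       # length penalty for this gap position
            previous_gap = current_gap
        else:
            score += match if x == y else mismatch
            previous_gap = None
    return score


ref = "ACGTCTGATACGCCGTATAGTCTATCT"
print(affine_gap_score(ref, "----CTGATTCGC---ATCGTCTATCT"))  # 7

one_long_gap = "ACGTCTGAT-------ATAGTCTATCT"
many_short_gaps = "AC-T-TGA--CG-CGT-TA-TCTATCT"
print(affine_gap_score(ref, one_long_gap),
      affine_gap_score(ref, many_short_gaps))  # 11 1
# gap_open=0 turns this into the linear scheme (match +1, mismatch 0, gap -1)
print(affine_gap_score(ref, one_long_gap, gap_open=0),
      affine_gap_score(ref, many_short_gaps, gap_open=0))  # 13 13
```

Both candidates have 20 matches and 7 gap positions, so the linear scheme cannot tell them apart (13 each); only the origination penalty makes the single long gap (11) preferable to six short gaps (1).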
 Alignment scoring and substitution matrices
 Aligning two sequences
◦ Dotplots
◦ The dynamic programming algorithm
◦ Significance of the results
 Heuristic methods
◦ FASTA
◦ BLAST
◦ Interpreting the output
 Examples:
 Staden: simple text file, lines <= 80 characters
 FASTA: simple text file, lines <= 80 characters, one line
header marked by ">"
 GCG: structured format with header and formatted
sequence

 Sequence format descriptions, e.g. at http://www.infobiogen.fr/doc/tutoriel/formats.html
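Reading and writing the FASTA format described above takes only a few lines. The sketch below is an illustration, not a full-featured parser: it assumes one '>' header line per record, joins the following sequence lines, lets duplicate headers overwrite earlier ones, and wraps output at 80 characters; the function names are hypothetical.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                      # ignore blank lines
        if line.startswith(">"):          # a new record starts
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def format_fasta(header: str, sequence: str, width: int = 80) -> str:
    """Write one record with sequence lines of at most `width` characters."""
    lines = [">" + header]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines)


example = ">seq1 example\nACGTCTGATA\nCGCCGTATAG\n>seq2\nCTGATTCGC\n"
print(parse_fasta(example))
# {'seq1 example': 'ACGTCTGATACGCCGTATAG', 'seq2': 'CTGATTCGC'}
```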
 Local sequence comparison:
 Assumption of evolution by point mutations:
◦ amino acid replacement (by base replacement)
◦ amino acid insertion
◦ amino acid deletion
 Scores:
◦ positive for identical or similar residues
◦ negative for different residues
◦ negative for an insertion in one of the two sequences
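 Dot plot: simple comparison without alignment
 Similarities between sequences show up in a 2D diagram
[Figure: dot plot of a sequence against itself; the main diagonal shows identity (i = j), off-diagonal lines show similarity of the sequence with other parts of itself]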
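A minimal dot plot can be sketched in Python. It assumes identity comparison of residues; the optional sliding `window` and `threshold` (how many identical residues a window needs) are illustrative parameters, commonly used to filter out isolated random dots, and the function names are hypothetical.

```python
def dot_plot(a: str, b: str, window: int = 1,
             threshold: int = 1) -> set[tuple[int, int]]:
    """Cells (i, j) where the windows a[i:i+window] and b[j:j+window]
    share at least `threshold` identical residues at the same positions."""
    points = set()
    for i in range(len(a) - window + 1):
        for j in range(len(b) - window + 1):
            same = sum(a[i + k] == b[j + k] for k in range(window))
            if same >= threshold:
                points.add((i, j))
    return points


def render(a: str, b: str, points: set[tuple[int, int]]) -> str:
    """Text dot plot: rows follow `a`, columns follow `b`, '*' marks a dot."""
    lines = ["  " + b]
    for i, residue in enumerate(a):
        row = "".join("*" if (i, j) in points else "." for j in range(len(b)))
        lines.append(residue + " " + row)
    return "\n".join(lines)


seq = "ACGTACGT"
print(render(seq, seq, dot_plot(seq, seq, window=4, threshold=4)))
```

Compared with itself, the sequence gives the main diagonal (identity), plus the dots at (0, 4) and (4, 0) that reveal the internal repeat ACGT.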
 The 1st alignment: highly significant
 The 2nd: plausible
 The 3rd: spurious

 Distinguish by alignment score:
◦ Similarities increase the score (substitution matrix)
◦ Mismatches decrease the score
◦ Gaps decrease the score (gap penalties)
 Substitution matrix weights replacement of one residue
by another:
◦ Similar -> high score (positive)
◦ Different -> low score (negative)
 Simplest is identity matrix (e.g. for nucleic acids)
A C G T
A 1 0 0 0
C 0 1 0 0
G 0 0 1 0
T 0 0 0 1
 PAM matrix series (PAM1 ... PAM250):
◦ Derived from alignments of very similar sequences
◦ PAM1 = mutation events that change 1% of amino acids
◦ PAM2, PAM3, ... extrapolated by matrix multiplication, e.g. PAM2 = PAM1 × PAM1, PAM3 = PAM2 × PAM1, etc.
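The extrapolation step can be sketched with plain Python matrix multiplication. The 4 × 4 matrix below is a hypothetical, uniform "PAM1-like" mutation-probability matrix (each residue stays unchanged with probability 0.99 and changes to each of the others with equal probability). The real PAM1 is a 20 × 20 matrix estimated from observed amino acid substitutions, and PAM scoring matrices are log-odds values derived from such probabilities.

```python
def mat_mul(x, y):
    """Multiply two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def pam_n(pam1, n):
    """PAMn obtained by repeated multiplication (PAM2 = PAM1 x PAM1, ...)."""
    result = pam1
    for _ in range(n - 1):
        result = mat_mul(result, pam1)
    return result


# Hypothetical PAM1-like matrix over 4 residue types: 1% of positions change.
P_STAY = 0.99
P_CHANGE = 0.01 / 3
PAM1 = [[P_STAY if i == j else P_CHANGE for j in range(4)] for i in range(4)]

for n in (1, 2, 250):
    # probability that a residue is still unchanged after n PAM units
    print(n, round(pam_n(PAM1, n)[0][0], 4))
```

As n grows, the diagonal shrinks towards the background value 0.25 of this toy alphabet, which is why high-numbered PAM matrices describe more distant relationships.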

 Problems with PAM matrices:
◦ Incorrect modelling of long-time substitutions, since the data are dominated by conservative mutations caused by single-nucleotide changes, e.g. L <–> I, L <–> V, Y <–> F, whereas over long times any amino acid change can occur
[Figure: PAM matrix, with positive and negative values; the identity score depends on the residue]
 BLOSUM series (BLOSUM50, BLOSUM62, ...)
 Derived from alignments of distantly related sequences
 BLOCKS database:
◦ ungapped multiple alignments of protein families at a given identity
 BLOSUM50 better for gapped alignments
 BLOSUM62 better for ungapped alignments
[Figure: BLOSUM62 substitution matrix]
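How log-odds scores are obtained from ungapped blocks can be sketched as follows, using a tiny hypothetical block. The sketch counts residue pairs within each column, compares their observed frequency with the frequency expected from the background residue frequencies, and reports rounded, scaled log2 odds (BLOSUM62 is expressed in half-bit units, hence `scale=2`). The clustering of sequences above the identity threshold and the sequence weighting used for the real BLOSUM matrices are omitted, and pairs never observed receive no score.

```python
import math
from collections import Counter
from itertools import combinations


def blosum_like_scores(block: list[str], scale: int = 2) -> dict[tuple[str, str], int]:
    """Rounded log-odds scores from an ungapped block of equal-length sequences."""
    pair_counts = Counter()
    for column in zip(*block):                      # every column of the block
        for x, y in combinations(column, 2):        # every pair of residues in it
            pair_counts[tuple(sorted((x, y)))] += 1
    total = sum(pair_counts.values())
    q = {pair: count / total for pair, count in pair_counts.items()}  # observed

    p = Counter()                                   # background residue frequencies
    for (x, y), freq in q.items():
        p[x] += freq / 2
        p[y] += freq / 2

    scores = {}
    for (x, y), freq in q.items():
        expected = p[x] * p[y] if x == y else 2 * p[x] * p[y]
        scores[(x, y)] = round(scale * math.log2(freq / expected))
    return scores


toy_block = ["LIVF", "LIVY", "LLVF"]   # hypothetical ungapped block
print(blosum_like_scores(toy_block))
```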
 Significance of alignment:
 Depends critically on the gap penalty
 Needs to be adjusted to the given sequences
 Gap penalties influenced by knowledge of structure, etc.
 Simple rules when nothing is known (linear or affine)

 Dynamic programming = build up optimal alignment
using previous solutions for optimal alignments of
subsequences.
 The dynamic programming relies on a principle of
optimality. This principle states that in an optimal
sequence of decisions or choices, each subsequence
must also be optimal.
 The principle can be stated as follows: the optimal solution to a problem is a combination of optimal solutions to some of its sub-problems.
 Construct a two-dimensional matrix whose axes are the two sequences to be compared.
 The scores are calculated one row at a time: the first residue (row) of one sequence is scanned against the entire length of the other sequence, followed by the second row, and so on.
 The scanning of the second row takes into account the scores already obtained in the first row: the best score for each cell is derived from its upper, left and upper-left neighbours and written into that cell (the bottom-right corner of the 2 × 2 sub-matrix they form).
 This process is iterated until values for all the cells are filled.
 The results are traced back through the matrix in
reverse order from the lower right-hand corner of the
matrix toward the origin of the matrix in the upper left-
hand corner.
 The best matching path is the one that has the
maximum total score.
 If two or more paths reach the same highest score, one
is chosen arbitrarily to represent the best alignment.
 The path can also move horizontally or vertically at a
certain point, which corresponds to introduction of a
gap or an insertion or deletion for one of the two
sequences.
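The fill and trace-back steps can be sketched as a global (Needleman–Wunsch style) alignment in Python, assuming a linear gap penalty and the earlier example's scoring (match +1, mismatch 0, gap -1). Ties are broken by preferring the diagonal move, then the vertical one, which is one arbitrary choice among equally good paths; the function name is illustrative.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = 0,
                     gap: int = -1) -> tuple[int, str, str]:
    """Global alignment by dynamic programming with a linear gap penalty."""
    def s(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap                  # a[:i] aligned against gaps only
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):              # fill the matrix one row at a time
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(a[i - 1], b[j - 1]),  # diagonal
                          F[i - 1][j] + gap,                        # gap in b
                          F[i][j - 1] + gap)                        # gap in a

    # Trace back from the bottom-right corner towards the origin.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1])           # vertical move: gap in b
            bottom.append("-")
            i -= 1
        else:
            top.append("-")                # horizontal move: gap in a
            bottom.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = needleman_wunsch("ACGTCTGATACGCCGTATAGTCTATCT",
                                      "CTGATTCGCATCGTCTATCT")
print(score)
print(top)
print(bottom)
```

Because the hand-made alignment shown earlier is one valid global alignment of these two sequences, the optimal score found here can be no lower than its +11.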
 Global alignment (ends aligned)
◦ Needleman & Wunsch, 1970
 Local alignment (subsequences aligned)
◦ Smith & Waterman, 1981
 Searching for repetitions
 Searching for overlap
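For the local case, a Smith–Waterman style sketch differs from the global one in two places: cell scores are floored at 0, and the trace-back starts from the highest-scoring cell and stops at the first cell with score 0. The parameters (match +1, mismatch -1, gap -1) are illustrative assumptions; a negative mismatch score keeps unrelated flanking regions from being absorbed into the local alignment.

```python
def smith_waterman(a: str, b: str, match: int = 1, mismatch: int = -1,
                   gap: int = -1) -> tuple[int, str, str]:
    """Local alignment by dynamic programming with a linear gap penalty."""
    def s(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]   # first row/column stay 0
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0,                                   # start afresh
                          H[i - 1][j - 1] + s(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j

    # Trace back from the best cell until a cell with score 0 is reached.
    top, bottom = [], []
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("NNNACGTNNN", "ACGT"))  # (4, 'ACGT', 'ACGT')
```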

 FASTA: multi-step approach to find high-scoring alignments
◦ Exact short word matches
◦ Maximal scoring ungapped extensions
◦ Identify gapped alignments
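The first FASTA step (exact short word matches) can be sketched by indexing the words of one sequence and grouping the hits by diagonal (offset i - j), since several hits on the same diagonal suggest an ungapped region worth extending. This is a simplified illustration of the idea; the word length `k` and the function name are assumptions, and the later rescoring and gapped-alignment steps are not shown.

```python
from collections import defaultdict


def ktuple_diagonals(query: str, subject: str, k: int = 3) -> dict[int, list[tuple[int, int]]]:
    """Exact k-letter word hits (i in query, j in subject), grouped by diagonal i - j."""
    index = defaultdict(list)                 # word -> start positions in subject
    for j in range(len(subject) - k + 1):
        index[subject[j:j + k]].append(j)

    diagonals = defaultdict(list)
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            diagonals[i - j].append((i, j))
    return dict(diagonals)


hits = ktuple_diagonals("ACGTAC", "TTACGTAC", k=3)
print(hits)  # {-2: [(0, 2), (1, 3), (2, 4), (3, 5)], 2: [(3, 1)]}
```

The four hits on diagonal -2 mark the shared region ACGTAC; the single hit on diagonal 2 is an isolated word match.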

 Like BLAST, FASTA uses E-values and bit scores. The FASTA output provides one more statistical parameter, the Z-score.
 This describes the number of standard deviations from the mean score for the database search.
 Because most of the alignments with the query sequence are with unrelated sequences, the higher the Z-score of a reported match, the further it lies from the mean of the score distribution and, hence, the more significant the match.
 For Z > 15, the match can be considered extremely significant, and a homologous relationship can be considered certain.
 If Z is in the range 5 to 15, the sequence pair can be described as highly probable homologs.
 If Z < 5, their relationship is described as less certain.
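As a sketch of how a Z-score relates a match score to the background distribution, assume the scores of the (mostly unrelated) database hits are available as a plain list. FASTA's own estimate of the mean and standard deviation is more elaborate, so this is an illustration only; the thresholds follow the rules above and the function names are hypothetical.

```python
from statistics import mean, stdev


def z_score(score: float, database_scores: list[float]) -> float:
    """Number of standard deviations `score` lies above the mean database score."""
    return (score - mean(database_scores)) / stdev(database_scores)


def interpret_z(z: float) -> str:
    """Classify a Z-score using the thresholds given above."""
    if z > 15:
        return "extremely significant: homologous relationship"
    if z >= 5:
        return "highly probable homologs"
    return "less certain relationship"


background = [10, 12, 14, 16, 18]   # hypothetical scores of unrelated hits
print(z_score(24, background))      # about 3.16
print(interpret_z(20), "|", interpret_z(8), "|", interpret_z(2))
```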
 BLAST: multi-step approach to find high-scoring alignments
◦ List words of fixed length (3 amino acids) expected to give a score larger than a threshold
◦ For every word, search the database and extend the ungapped alignment in both directions
◦ New versions of BLAST allow gaps
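The extension step can be sketched for a single exact word hit. This is a simplified illustration, not BLAST's implementation: it uses identity scoring (match +1, mismatch -1) instead of a substitution matrix, and stops extending in each direction once the running score drops more than `x_drop` below the best score seen so far (an X-drop rule). All names and parameter values are assumptions.

```python
def extend_seed(query: str, subject: str, q_start: int, s_start: int,
                word_len: int, match: int = 1, mismatch: int = -1,
                x_drop: int = 2) -> tuple[int, str, str]:
    """Ungapped extension of the hit query[q_start:q_start+word_len] /
    subject[s_start:s_start+word_len] in both directions."""
    def s(x: str, y: str) -> int:
        return match if x == y else mismatch

    def best_extension(pairs):
        """Walk outward over (query, subject) residue pairs and return
        (best score gain, number of residues giving that gain)."""
        best, best_len, running = 0, 0, 0
        for length, (x, y) in enumerate(pairs, start=1):
            running += s(x, y)
            if running > best:
                best, best_len = running, length
            elif best - running > x_drop:
                break                    # X-drop: score fell too far, stop here
        return best, best_len

    seed = sum(s(query[q_start + t], subject[s_start + t]) for t in range(word_len))
    q_end, s_end = q_start + word_len, s_start + word_len
    right_gain, right_len = best_extension(zip(query[q_end:], subject[s_end:]))
    left_gain, left_len = best_extension(
        zip(reversed(query[:q_start]), reversed(subject[:s_start])))

    q_from, s_from = q_start - left_len, s_start - left_len
    total_len = left_len + word_len + right_len
    return (seed + left_gain + right_gain,
            query[q_from:q_from + total_len],
            subject[s_from:s_from + total_len])


# word hit "ORD" at query position 3 and subject position 2
print(extend_seed("QQWORDSAA", "PWORDSP", 3, 2, 3))  # (5, 'WORDS', 'WORDS')
```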

 The E-value is the number of matches with an equal or better score expected purely by chance in a search of this size. The lower the E-value, the less likely the database match is the result of random chance and, therefore, the more significant the match.
 If E < 1e-50 (1 × 10^-50), there should be extremely high confidence that the database match results from a homologous relationship.
 If E is between 1e-50 and 0.01, the match can be considered a result of homology.
 If E is between 0.01 and 10, the match is considered not significant, but may hint at a tentative remote homology relationship. Additional evidence is needed.
 If E > 10, the sequences under consideration are either unrelated or related by extremely distant relationships that fall below the limit of detection with the current method.
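For illustration, the rules above can be written as a small classifier, together with the standard relation E = m × n × 2^(-S′) between a bit score S′ and the E-value for a query of m residues searched against a database of n residues. The boundary handling at exactly 0.01 and 10 is an assumption, since the ranges above overlap at their limits; the function names are hypothetical.

```python
def evalue_from_bit_score(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance hits with at least this bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


def interpret_evalue(e: float) -> str:
    """Classify an E-value using the thresholds given above."""
    if e < 1e-50:
        return "extremely high confidence of homology"
    if e <= 0.01:
        return "homology"
    if e <= 10:
        return "not significant; possible remote homology, needs more evidence"
    return "unrelated or beyond the detection limit"


e = evalue_from_bit_score(10, query_len=1, db_len=1024)
print(e, interpret_evalue(e))  # 1.0 not significant; possible remote homology, needs more evidence
```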
 Various versions:
 Blastn: nucleotide query vs nucleotide database
 Blastp: protein query vs protein database
 tBlastn: protein query vs translated nucleotide database
 Blastx: translated nucleotide query vs protein database
 tBlastx: translated nucleotide query vs translated nucleotide database
 Very fast growth of biological data
 Diversity of biological data:
◦ Primary sequences
◦ 3D structures
◦ Functional data
 Database entry usually required for publication
◦ Sequences
◦ Structures
 Database entry may replace primary publication
◦ Genomic approaches
 Nucleic acid databases:
◦ EMBL (Europe)
◦ GenBank (USA)
◦ DDBJ (Japan)
 Protein databases:
◦ PIR (Protein Information Resource)
◦ MIPS
◦ SWISS-PROT (University of Geneva, now with EBI)
◦ TrEMBL (a supplement to SWISS-PROT)
◦ NRL-3D
 The three nucleic acid databanks exchange data on a daily basis
 Data can be submitted and accessed at any of the three locations

 GenBank
◦ www.ncbi.nlm.nih.gov/Genbank/GenbankOverview.html
 EMBL
◦ www.ebi.ac.uk/embl/index.html
 DNA Databank of Japan (DDBJ)
◦ www.nig.ac.jp/home.html
 With so many databases, which one should be searched? Each is strong in some aspects and weak in others.
 Composite databases are the answer: they combine several primary databases as their base data.
 Searches on these databases are indexed and streamlined so that the same stored sequence is not searched twice in different databases.
 OWL uses the following primary databases:
◦ SWISS-PROT (top priority)
◦ PIR
◦ GenBank
◦ NRL-3D
 Secondary databases store secondary structure information or the results of searches of the primary databases.

Database    Primary source
PROSITE     SWISS-PROT
PRINTS      OWL
 Many genes have already been sequenced and identified, so their functions are known.
 These sequences are stored in databases.
 When a new gene is found in the human genome, it is compared with the known genes stored in these databases.
 Because the databases are so large, a full alignment against each and every stored sequence is impractical.
 So heuristics must be used again.
 Applications:-
Bioinformatics combines mathematics, statistics, computer science and information technology to solve complex biological problems.

 Sequence Analysis:-
Sequence analysis uses sequencing information to identify genes that encode peptides or regulatory sequences. Computational tools also detect DNA mutations in an organism and identify related sequences. Special software is used to detect overlapping fragments and assemble them.
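The fragment-overlap idea can be sketched with a greedy merge: repeatedly join the two fragments with the longest suffix–prefix overlap. This is only an illustration under strong assumptions (error-free fragments from one strand, a hypothetical minimum overlap of 3 residues); practical assemblers use overlap or de Bruijn graphs and must handle sequencing errors and repeats. The function names are hypothetical.

```python
def overlap(left: str, right: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `left` equal to a prefix of `right`
    (0 if no such overlap of at least `min_len` exists)."""
    for length in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def greedy_assemble(fragments: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair of fragments until no overlaps remain."""
    frags = list(fragments)
    while len(frags) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            break                       # no overlaps left: return separate contigs
        merged = frags[best_i] + frags[best_j][best_len:]
        frags = [f for idx, f in enumerate(frags) if idx not in (best_i, best_j)]
        frags.append(merged)
    return frags


print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCGAT"]))  # ['ATGGCGTACCGAT']
```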

 Prediction of Protein Structure:-
The primary structure of a protein (its amino acid sequence) is easy to determine from the DNA sequence that encodes it, but its secondary, tertiary or quaternary structure is difficult to determine. Bioinformatics tools can be used to predict these complex protein structures.
 Genome Annotation:-
In genome annotation, genomes are marked up to locate regulatory sequences and protein-coding regions. It is a very important part of the Human Genome Project, as it identifies the regulatory sequences.
 Comparative Genomics:-
Comparative genomics is the branch of bioinformatics
which determines the genomic structure and function
relation between different biological species. For this
purpose, intergenomic maps are constructed which
enable the scientists to trace the processes of evolution
that occur in genomes of different species.

 Health and Drug discovery:-
The tools of bioinformatics also help in drug discovery, diagnosis and disease management. Complete sequencing of human genes has enabled scientists to develop medicines and drugs that can target more than 500 genes.
