Basic Bioinformatics (BE304)
Lab Assignment 9
Total Marks: 20
1. Find nucleotide sequences using a protein query by tblastn
>P01013 PROTEIN X
QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFHVTKQESKP
VQMMCMNNSFNVATLPAEKMKILELPFASGDLSMLVLLPDEVSDLERIEKTI
NFEKLTEWTNPNTMEKRRVKVYLPQMKIEEKYNLTSVLMALGMTDLFIPSA
NLTGISSAESLKISQAVHGAFMELSEDGIEMAGSTGVIEDIKHSPESEQFRADH
PFLFLIKHNPTNTIVYFGRYWSP
a. Open the NCBI Blast suite (https://blast.ncbi.nlm.nih.gov/Blast.cgi)
b. Select tblastn program
c. Use the sequence given above.
d. Identify the gene which codes for the above protein and its source
GENE: ovalbumin-related protein X (SERPINB14C)
SOURCE: Gallus gallus (chicken)
e. Which organism has the second-best hit?
Meleagris gallopavo
f. Download the nucleotide FASTA sequence of the identified gene
Download it using the ‘Download Datasets’ option.
(5)
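The web-form steps in question 1 could also be expressed as a request to the NCBI BLAST URL API. The sketch below only builds the form-encoded request body and never contacts NCBI. The helper name `build_blast_put_params` is hypothetical, the database name `nt` is an assumption (the worksheet does not name a database), and the parameter names (CMD, PROGRAM, DATABASE, QUERY, ENTREZ_QUERY, GENETIC_CODE) should be checked against the current NCBI documentation before use.

```python
from urllib.parse import urlencode

# URL given in the worksheet; the request itself is NOT sent by this sketch.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blast_put_params(program, database, query, organism=None, genetic_code=None):
    """Return a URL-encoded body for submitting a BLAST search.

    Hypothetical helper: parameter names are assumed to follow the
    NCBI BLAST URL API and should be verified before real use.
    """
    params = {
        "CMD": "Put",          # submit a new search
        "PROGRAM": program,    # e.g. tblastn, blastx, tblastx
        "DATABASE": database,  # assumption: worksheet does not specify one
        "QUERY": query,
    }
    if organism is not None:
        # Restrict the search to one organism, as in "Restrict your search to ..."
        params["ENTREZ_QUERY"] = f"{organism}[Organism]"
    if genetic_code is not None:
        # Translation table for the query, e.g. 11 for Bacteria and Archaea
        params["GENETIC_CODE"] = str(genetic_code)
    return urlencode(params)


# First line of the protein query from question 1 (shortened for illustration).
protein_query = "QIKDLLVSSSTDLDTTLVLVNAIYFKGMWKTAFNAEDTREMPFHVTKQESKP"
body = build_blast_put_params("tblastn", "nt", protein_query)
print(body)
```

The same helper would cover the later questions by changing the arguments, for example `build_blast_put_params("blastx", "nr", dna, organism="Escherichia coli", genetic_code=11)`, where `nr` is again an assumed database name.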
2. Find protein sequence using a nucleotide query by blastx
>NC_000913.3:c2823769-2822708 Escherichia coli str. K-12 substr.
MG1655, complete genome
ATGGCTATCGACGAAAACAAACAGAAAGCGTTGGCGGCAGCACTGGGCCAGATTGAGAAACAAT
TTGGTAAAGGCTCCATCATGCGCCTGGGTGAAGACCGTTCCATGGATGTGGAAACCATCTCTAC
CGGTTCGCTTTCACTGGATATCGCGCTTGGGGCAGGTGGTCTGCCGATGGGCCGTATCGTCGAA
ATCTACGGACCGGAATCTTCCGGTAAAACCACGCTGACGCTGCAGGTGATCGCCGCAGCGCAGC
GTGAAGGTAAAACCTGTGCGTTTATCGATGCTGAACACGCGCTGGACCCAATCTACGCACGTAA
ACTGGGCGTCGATATCGACAACCTGCTGTGCTCCCAGCCGGACACCGGCGAGCAGGCACTGGAA
ATCTGTGACGCCCTGGCGCGTTCTGGCGCAGTAGACGTTATCGTCGTTGACTCCGTGGCGGCAC
TGACGCCGAAAGCGGAAATCGAAGGCGAAATCGGCGACTCTCACATGGGCCTTGCGGCACGTAT
GATGAGCCAGGCGATGCGTAAGCTGGCGGGTAACCTGAAGCAGTCCAACACGCTGCTGATCTTC
ATCAACCAGATCCGTATGAAAATTGGTGTGATGTTCGGTAACCCGGAAACCACTACCGGTGGTA
ACGCGCTGAAATTCTACGCCTCTGTTCGTCTCGACATCCGTCGTATCGGCGCGGTGAAAGAGGG
CGAAAACGTGGTGGGTAGCGAAACCCGCGTGAAAGTGGTGAAGAACAAAATCGCTGCGCCGTTT
AAACAGGCTGAATTCCAGATCCTCTACGGCGAAGGTATCAACTTCTACGGCGAACTGGTTGACC
TGGGCGTAAAAGAGAAGCTGATCGAGAAAGCAGGCGCGTGGTACAGCTACAAAGGTGAGAAGAT
CGGTCAGGGTAAAGCGAATGCGACTGCCTGGCTGAAAGATAACCCGGAAACCGCGAAAGAGATC
GAGAAGAAAGTACGTGAGTTGCTGCTGAGCAACCCGAACTCAACGCCGGATTTCTCTGTAGATG
ATAGCGAAGGCGTAGCAGAAACTAACGAAGATTTTTAA
a. Open the NCBI Blast suite (https://blast.ncbi.nlm.nih.gov/Blast.cgi)
b. Select blastx program
c. Copy-paste the given sequence
d. Select Genetic code: Bacteria and Archaea (11)
e. Specify “Escherichia coli” as the target organism
f. Click on the ‘BLAST’ button
g. Identify the corresponding protein
Chain A, Protein RecA [Escherichia coli]
h. Find its putative role
RecA is a bacterial protein that contributes to DNA repair in at least
two ways. First, its DNA strand-exchange activity catalyzes
recombinational DNA repair. Second, its coprotease activity regulates
the expression of SOS genes, which are also involved in DNA repair.
i. Download the FASTA sequence of identified protein
>8TRG_A Chain A, Protein RecA [Escherichia coli]
MGSSHHHHHHHHHHHHSSGENLYFQGMAIDENKQKALAAALGQIEKQ
FGKGSIMRLGEDRSMDVETISTGSLSLDIALGA
GGLPMGRIVEIYGPESSGKTTLTLQVIAAAQREGKTCAFIDAEHALDPIY
ARKLGVDIDNLLCSQPDTGEQALEICDALA
RSGAVDVIVVDSVAALTPKAEIEGEIGDSHMGLAARMMSQAMRKLAGN
LKQSNTLLIFINQIRMKIGVMFGNPETTTGGN
ALKFYASVRLDIRRIGAVKEGENVVGSETRVKVVKNKIAAPFKQAEFQIL
YGEGINFYGELVDLGVKEKLIEKAGAWYSY
KGEKIGQGKANATAWLKDNPETAKEIEKKVRELLLSNPNSTPDFSVDD
SEGVAETNEDF
(5)
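blastx translates the nucleotide query into protein before searching, which is why a genetic code has to be chosen in question 2. The sketch below is a minimal illustration of that translation step for reading frame +1 only (blastx itself considers all six frames). The function name `translate` is hypothetical. The codon-to-amino-acid assignments used here are those of the standard code, which translation table 11 shares; table 11 differs in the start codons it permits.

```python
# Codon table built from the usual compact NCBI layout (base order T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate frame +1 of a DNA string, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")  # X for ambiguous codons
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# First line of the E. coli query sequence from question 2.
query_start = "ATGGCTATCGACGAAAACAAACAGAAAGCGTTGGCGGCAGCACTGGGCCAGATTGAGAAACAAT"
print(translate(query_start))  # MAIDENKQKALAAALGQIEKQ
```

The printed peptide matches the 8TRG_A record from the point where `MAIDENKQK...` begins; the residues before it in that record are not encoded by the query.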
3. Identify distantly related nucleotide sequences using tblastx
>dnaX DNA polymerase III subunit gamma;DNA polymerase III
subunit tau [ Escherichia coli str. K-12 substr. MG1655 ]
ATGAGTTATCAGGTCTTAGCCCGAAAATGGCGCCCACAAACCTTTGCTGACGTCGTCGG
CCAGGAACATGTGCTGACCGCACTGGCGAACGGCTTGTCGTTAGGGCGTATTCATCATG
CTTATCTTTTTTCCGGCACCCGTGGCGTCGGAAAAACCTCTATCGCCCGACTGCTGGCG
AAGGGGCTAAACTGCGAAACCGGCATTACCGCGACGCCGTGCGGCGTGTGCGATAACTG
TCGTGAAATCGAGCAGGGGCGCTTTGTCGATCTGATTGAAATCGACGCCGCCTCGCGCA
CCAAAGTTGAAGATACCCGCGACCTGCTGGATAACGTCCAGTACGCTCCGGCGCGTGGT
CGTTTCAAAGTTTATCTGATCGACGAAGTGCATATGCTGTCGCGCCACAGCTTTAACGC
ACTGTTAAAAACCCTTGAAGAGCCGCCGGAGCACGTTAAGTTTCTGCTGGCGACGACCG
ATCCACAGAAATTGCCGGTGACGATTTTGTCACGCTGTCTGCAATTTCATCTCAAGGCG
CTGGATGTCGAGCAAATTCGCCATCAGCTTGAGCACATCCTCAACGAAGAACATATCGC
TCACGAGCCGCGGGCGCTGCAATTGCTGGCACGCGCCGCTGAAGGCAGCCTGCGAGATG
CCTTAAGTCTGACCGACCAGGCGATTGCCAGCGGTGACGGCCAGGTTTCAACCCAGGCG
GTCAGTGCGATGCTGGGTACGCTTGACGACGATCAGGCGCTGTCGCTGGTTGAAGCGAT
GGTCGAGGCCAACGGCGAGCGCGTAATGGCGCTGATTAATGAAGCCGCTGCCCGTGGTA
TCGAGTGGGAAGCGTTGCTGGTGGAAATGCTCGGCCTGTTGCATCGTATTGCGATGGTA
CAACTTTCGCCTGCTGCACTTGGCAACGACATGGCCGCCATCGAGCTGCGGATGCGTGA
ACTGGCGCGCACCATACCGCCGACGGATATTCAGCTTTACTATCAGACGCTGTTGATTG
GTCGCAAAGAATTACCGTATGCGCCGGACCGTCGCATGGGCGTTGAGATGACGCTGCTG
CGCGCGCTGGCATTCCATCCGCGTATGCCGCTGCCTGAGCCAGAAGTGCCACGACAGTC
CTTTGCACCCGTCGCGCCAACGGCAGTAATGACGCCAACCCAGGTGCCGCCGCAACCGC
AATCAGCGCCGCAGCAGGCACCGACTGTACCGCTCCCGGAAACCACCAGCCAGGTGCTG
GCGGCGCGCCAGCAGTTGCAGCGCGTGCAGGGAGCAACCAAAGCAAAAAAGAGTGAACC
GGCAGCCGCTACCCGCGCGCGGCCGGTGAATAACGCTGCGCTGGAAAGACTGGCTTCGG
TCACCGATCGCGTTCAGGCGCGTCCGGTGCCATCGGCGCTGGAAAAAGCGCCAGCCAAA
AAAGAAGCGTATCGCTGGAAGGCGACCACTCCGGTGATGCAGCAAAAAGAAGTGGTCGC
CACGCCGAAGGCGCTGAAAAAAGCGCTGGAACATGAAAAAACGCCGGAACTGGCGGCGA
AGCTAGCGGCAGAAGCCATTGAGCGCGACCCGTGGGCGGCACAGGTGAGCCAACTTTCG
CTACCAAAACTGGTCGAACAGGTGGCGTTAAATGCCTGGAAAGAGGAGAGCGACAACGC
AGTATGTCTGCATTTGCGCTCCTCTCAGCGGCATTTGAACAACCGCGGTGCACAGCAAA
AACTGGCTGAAGCGTTGAGCATGTTAAAAGGTTCAACGGTTGAACTGACTATCGTTGAA
GATGATAATCCCGCGGTGCGTACGCCGCTGGAGTGGCGTCAGGCGATATACGAAGAAAA
ACTTGCGCAGGCGCGCGAGTCCATTATTGCGGATAATAATATTCAGACCCTGCGTCGGT
TCTTCGATGCGGAGCTGGATGAAGAAAGTATCCGCCCCATTTGA
a. Open the NCBI Blast suite (https://blast.ncbi.nlm.nih.gov/Blast.cgi)
b. Select blastn program
c. Copy-paste the above sequence
d. Restrict your search to ‘Homo sapiens’
e. Did you find any significant results? Why?
No significant similarity found.
blastn compares the query and the database at the nucleotide level,
and the E. coli dnaX sequence has diverged too far from any human
sequence for nucleotide-level similarity to be detected with the
current search parameters.
f. Now select tblastx program instead of blastn from BLAST suite
g. Use Genetic code: Bacteria and Archaea (11)
h. Repeat the exercise
i. Which human gene is homologous to dnaX of Escherichia coli?
RFC5 (replication factor C subunit 5 isoform 4)
(5)
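The contrast between blastn and tblastx in question 3 comes down to codon degeneracy: two genes can differ at many nucleotide positions and still encode similar proteins. The sketch below illustrates this with two invented 12-base sequences (they are not taken from dnaX or any human gene) and a small codon lookup that covers only the codons used. The names `identity`, `seq_a` and `seq_b` are hypothetical.

```python
# Only the codons needed for this toy example (standard code assignments).
TOY_CODONS = {
    "CTG": "L", "TTA": "L",
    "AGC": "S", "TCA": "S",
    "CGT": "R", "AGA": "R",
    "GGC": "G", "GGT": "G",
}


def identity(a, b):
    """Fraction of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def translate_toy(dna):
    """Translate frame +1 using the toy codon lookup above."""
    return "".join(TOY_CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Invented sequences that use different synonymous codons.
seq_a = "CTGAGCCGTGGC"
seq_b = "TTATCAAGAGGT"

nt_identity = identity(seq_a, seq_b)
aa_identity = identity(translate_toy(seq_a), translate_toy(seq_b))
print(f"nucleotide identity: {nt_identity:.2f}")
print(f"protein identity:    {aa_identity:.2f}")
```

Here the nucleotide sequences agree at only 4 of 12 positions while the encoded peptides are identical, which is the kind of signal a translated search can pick up and a nucleotide search cannot.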
4. Identify distantly related protein sequences using PSI-BLAST
a. Open the NCBI Blast suite (https://blast.ncbi.nlm.nih.gov/Blast.cgi)
b. Select ‘Protein BLAST’ or blastp program
c. Paste the accession number for Human PCNA protein ‘NP_002583’ in the
query box
d. Restrict your search to ‘Escherichia coli’
e. Optimize the program for ‘blastp (protein-protein BLAST)’
f. Click on the ‘BLAST’ button
g. Did you find any significant results? Why?
No, we did not find any significant results. blastp relies on direct
pairwise similarity, and human PCNA has diverged too far from any
E. coli protein for a single pairwise search to detect a match with
the current search parameters.
h. Repeat the exercise but change the algorithm from ‘blastp (protein-protein
BLAST)’ to ‘PSI-BLAST (Position-Specific Iterated BLAST)’
i. Which E. coli protein is homologous to human PCNA protein?
(5)
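The switch from blastp to PSI-BLAST in question 4 replaces a general substitution matrix with a position-specific scoring matrix built from the hits of earlier iterations. The sketch below is a toy version of that idea, not the actual PSI-BLAST procedure: it assumes a uniform background frequency and a flat pseudocount, whereas the real program uses sequence weighting and more elaborate pseudocounts. The names `build_pssm` and `score` and the four-residue alignment are invented for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def build_pssm(aligned, pseudocount=1.0):
    """Build a log-odds matrix (one dict per column) from equal-length sequences."""
    background = 1.0 / len(ALPHABET)  # assumption: uniform background
    pssm = []
    for column in zip(*aligned):
        counts = Counter(column)
        total = len(column) + pseudocount * len(ALPHABET)
        pssm.append({
            aa: math.log2(((counts.get(aa, 0) + pseudocount) / total) / background)
            for aa in ALPHABET
        })
    return pssm


def score(pssm, seq):
    """Sum the position-specific scores of seq against the matrix."""
    return sum(column[aa] for column, aa in zip(pssm, seq))


# Invented alignment: position 1 is fully conserved, the others vary.
toy_alignment = ["GKTT", "GKST", "GRTT", "GKTS"]
pssm = build_pssm(toy_alignment)

print(score(pssm, "GKTT"))  # matches the conserved pattern
print(score(pssm, "AAAA"))  # unrelated peptide
```

A residue that is conserved in a column receives a high score at that position only, so a distant relative that keeps the conserved positions can score well even when its overall pairwise identity to the original query is low.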