BS20B015 Bioinfo3

This document summarizes information retrieved from bioinformatics databases and tools for a protein sequence analysis assignment. It includes: 1) the amino acid sequence, function and transmembrane segments of a protein, retrieved from UniProt; 2) the number of manually annotated mouse protein sequences in UniProt, and how many of them are associated with PDB and mapped to STRING; 3) sequence-length statistics for UniProtKB/Swiss-Prot and UniProtKB/TrEMBL; 4) the shortest and longest sequences in each database; 5) the number of transcription factor clusters at 50% sequence identity and the total number of sequences they contain; and 6) the numbers of Homo sapiens results at 100%, 90% and 50% identity cutoffs.

Uploaded by fathimabensha
Copyright © All Rights Reserved

ASSIGNMENT 3

BT3040 - Bioinformatics

FATHIMA BENSHA M
BS20B015
19 February 2023

1. All of the following information was retrieved from UniProt.


Amino acid sequence

Function

Transmembrane segments
There are 19 transmembrane segments in total, as shown below.
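The transmembrane count can also be checked programmatically. The sketch below is only an illustration: it assumes the entry has been saved in UniProtKB text (flat-file) format, where each transmembrane region appears as an `FT   TRANSMEM` line followed by a `start..end` range. The helper name `transmembrane_segments` and the example excerpt are hypothetical, and positions carrying uncertainty markers such as `<` or `?` are not matched by this simple pattern.

```python
import re

# Matches a UniProtKB flat-file transmembrane feature line such as
# "FT   TRANSMEM        31..51" and captures the 1-based start and end.
# Ranges with uncertainty markers ("<1..20", "?..40") are deliberately skipped.
TRANSMEM_RE = re.compile(r"^FT   TRANSMEM\s+(\d+)\.\.(\d+)", re.MULTILINE)


def transmembrane_segments(flat_file_text: str) -> list[tuple[int, int]]:
    """Return the (start, end) positions of every TRANSMEM feature."""
    return [(int(s), int(e)) for s, e in TRANSMEM_RE.findall(flat_file_text)]


# Hypothetical hand-made excerpt in UniProt text format (not a real entry).
example = """\
FT   TOPO_DOM        1..30
FT                   /note="Extracellular"
FT   TRANSMEM        31..51
FT                   /note="Helical"
FT   TRANSMEM        60..80
FT                   /note="Helical"
"""

segments = transmembrane_segments(example)
print(len(segments), segments)  # 2 [(31, 51), (60, 80)]
```

Applied to the full entry text, `len(transmembrane_segments(...))` would give the segment count directly.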

4. A search for manually annotated mouse protein sequences returns 17,317 results.

Of these, 2,217 sequences are associated with PDB entries.
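The same two searches can be written as UniProt REST API queries. This is a minimal sketch, not the procedure used in this assignment (which used the UniProt website). It assumes the `https://rest.uniprot.org/uniprotkb/search` endpoint, the query fields `organism_id`, `reviewed` and `database:pdb`, and that the total hit count is returned in an `X-Total-Results` response header; the helper and constant names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"

# 10090 is the NCBI Taxonomy ID of Mus musculus; reviewed:true keeps only
# manually annotated (Swiss-Prot) entries. database:pdb is ASSUMED to keep
# entries that carry a PDB cross-reference.
MOUSE_REVIEWED = "organism_id:10090 AND reviewed:true"
MOUSE_REVIEWED_PDB = MOUSE_REVIEWED + " AND database:pdb"


def search_url(query: str) -> str:
    """Build a search URL that returns a single accession as TSV."""
    params = {"query": query, "fields": "accession", "format": "tsv", "size": 1}
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"


def total_results(query: str) -> int:
    """Return the number of matching entries (requires network access).

    Assumes the API reports the total in the X-Total-Results header.
    """
    with urlopen(search_url(query)) as response:
        return int(response.headers["X-Total-Results"])


print(search_url(MOUSE_REVIEWED_PDB))
```

Calling `total_results(MOUSE_REVIEWED)` or `total_results(MOUSE_REVIEWED_PDB)` needs network access, and the counts change with every UniProt release, so they need not match the 17,317 and 2217 reported here.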

5. Of the 2,217 sequences obtained in the previous query, 2,099 are mapped to the STRING database. The entry IDs were selected, and ID mapping from UniProtKB_AC-ID to STRING was performed using Retrieve/ID Mapping.
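The Retrieve/ID Mapping step can also be scripted. The sketch assumes UniProt's REST ID-mapping service, which accepts a POST with `from`, `to` and `ids` form fields at `https://rest.uniprot.org/idmapping/run`, returns a `jobId`, and later serves results as JSON objects with `from`/`to` pairs. The helper names, the example accessions and the STRING identifiers are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed endpoint of UniProt's REST ID-mapping service.
IDMAPPING_RUN = "https://rest.uniprot.org/idmapping/run"


def idmapping_form(accessions: list[str]) -> bytes:
    """Encode a UniProtKB_AC-ID -> STRING mapping request as a form body."""
    fields = {"from": "UniProtKB_AC-ID", "to": "STRING", "ids": ",".join(accessions)}
    return urlencode(fields).encode()


def submit_job(accessions: list[str]) -> str:
    """POST the request and return the job ID (requires network access)."""
    request = Request(IDMAPPING_RUN, data=idmapping_form(accessions))
    with urlopen(request) as response:
        return json.load(response)["jobId"]


def mapped_accessions(results_json: str) -> set[str]:
    """Distinct source accessions that received at least one STRING ID."""
    return {hit["from"] for hit in json.loads(results_json).get("results", [])}


# Hypothetical results payload for three submitted accessions.
example = json.dumps({
    "results": [
        {"from": "P00001", "to": "10090.ENSMUSP00000000001"},
        {"from": "P00001", "to": "10090.ENSMUSP00000000002"},
        {"from": "P00002", "to": "10090.ENSMUSP00000000003"},
    ],
    "failedIds": ["P00003"],
})
mapped = mapped_accessions(example)
print(len(mapped), "of 3 accessions mapped")  # 2 of 3 accessions mapped
```

Counting distinct `from` accessions rather than result rows matters because one UniProt accession may map to more than one STRING identifier.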

6. (a)

• Very few sequences are extremely short or extremely long.

• The average sequence length in UniProtKB/Swiss-Prot is 361 amino acids.

• The average sequence length in UniProtKB/TrEMBL is 351 amino acids.
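Length statistics like these can be recomputed from a downloaded FASTA file. The following is a minimal sketch, assuming the sequences are available as FASTA text; the helper names, the 100-residue default bin width and the toy sequences are illustrative choices, not the settings behind the statistics reported in this assignment.

```python
from collections import Counter
from statistics import mean


def sequence_lengths(fasta_text: str) -> dict[str, int]:
    """Map the first token of each FASTA header to its sequence length."""
    lengths: dict[str, int] = {}
    current = None
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else ""
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)  # sequences may wrap over several lines
    return lengths


def length_histogram(lengths, bin_width: int = 100) -> Counter:
    """Count sequences per bin [k * bin_width, (k + 1) * bin_width)."""
    return Counter((n // bin_width) * bin_width for n in lengths)


# Hypothetical toy FASTA with three short sequences.
example = ">seqA first\nMKV\nLL\n>seqB\nMAAAAA\n>seqC\nM\n"
lengths = sequence_lengths(example)
print(lengths)                                # {'seqA': 5, 'seqB': 6, 'seqC': 1}
print(mean(lengths.values()))
print(length_histogram(lengths.values(), 5))  # Counter({5: 2, 0: 1})
```

A histogram of `length_histogram(...)` shows the shape of the distribution, and its sparse outer bins correspond to the observation that very few sequences are extremely short or long.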

(b) In UniProtKB/Swiss-Prot, the shortest sequence is GWA_SEPOF (P83570) and the longest is TITIN_MOUSE (A2ASS6).

In UniProtKB/TrEMBL, the shortest sequence is A0A0U1RQB9_HUMAN and the longest is A0A5A9P0L4_9TELE.
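Finding the extremes amounts to taking a minimum and a maximum over sequence length. The sketch assumes a tab-separated export with `Entry Name` and `Length` columns, as in a UniProt TSV download; the function name and the example rows are hypothetical.

```python
import csv
import io


def shortest_and_longest(tsv_text: str) -> tuple[tuple[str, int], tuple[str, int]]:
    """Return the (entry name, length) pairs with the minimum and maximum Length."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = [(row["Entry Name"], int(row["Length"])) for row in reader]
    shortest = min(rows, key=lambda row: row[1])
    longest = max(rows, key=lambda row: row[1])
    return shortest, longest


# Hypothetical three-row export.
example = "Entry\tEntry Name\tLength\nX1\tENTRY_A\t120\nX2\tENTRY_B\t7\nX3\tENTRY_C\t950\n"
print(shortest_and_longest(example))  # (('ENTRY_B', 7), ('ENTRY_C', 950))
```

On ties, `min` and `max` return the first row encountered, so it is worth checking whether other entries share the shortest or longest length.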

(c) UniProtKB/TrEMBL

UniProtKB/Swiss-Prot

2. A search for transcription factors at 50% sequence identity returns 95 clusters. Summing the Size column of these clusters in an Excel sheet gives a total of 2,059 sequences.
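The spreadsheet step, summing the Size column over all clusters, can be reproduced without Excel. This sketch assumes the cluster list was exported as tab-separated text with a `Size` column holding the number of member sequences per cluster; the function name and example rows are hypothetical.

```python
import csv
import io


def total_cluster_members(tsv_text: str, column: str = "Size") -> int:
    """Sum the member counts in the given column of a tab-separated cluster table."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return sum(int(row[column]) for row in reader)


# Hypothetical export with three clusters.
example = "Cluster ID\tSize\nCLUSTER_1\t12\nCLUSTER_2\t3\nCLUSTER_3\t1\n"
print(total_cluster_members(example))  # 16
```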

The FASTA sequence is given below.

3. For Homo sapiens, a total of 379,233 results are found, of which 223,648, 102,119 and 52,466 have identity cutoffs of 100%, 90% and 50%, respectively.
