Protein Databases 1

The document provides an overview of various protein databases, including PIR, SWISS-PROT, TrEMBL, and structure databases like PDB, SCOP, and CATH, which offer information on protein sequences, structures, and classifications. It also discusses protein pattern databases like InterPro and PROSITE, as well as metabolic pathway databases such as KEGG and Reactome, which are essential for understanding biochemical pathways. Additionally, it highlights protein-protein interaction databases like BIND, DIP, MINT, and STRING, emphasizing their applications in sequence analysis, protein structure prediction, and drug discovery.

Protein databases

PIR

• PIR (Protein Information Resource) is a popular protein sequence
database that provides information on functionally annotated protein
sequences.
• PIR maintains three databases: the Protein Sequence Database (PSD),
the Non-redundant Reference (NREF) sequence database, and the
integrated Protein Classification (iProClass) database. Together they
contain annotated protein sequences, classification information, and
protein family, function, and structure information.
SWISS-PROT

• SWISS-PROT is a protein sequence database that provides a high
level of annotation, including information on the protein’s
function, domain structure, post-translational modifications,
and variants.
• Swiss-Prot is jointly managed by the SIB (Swiss Institute of
Bioinformatics) and the EBI (European Bioinformatics
Institute).
• The database distinguishes itself from other protein sequence
databases by three criteria: (i) annotations, which cover a
broad range of information, (ii) minimal redundancy, which
ensures that each sequence is represented only once, and (iii)
integration with other databases, which enables cross-
referencing and retrieval of information from related databases.
TrEMBL
• TrEMBL is a computer-annotated supplement of Swiss-
Prot. TrEMBL entries follow the Swiss-Prot format.
• It contains all the translations of EMBL (European
Molecular Biology Laboratory) nucleotide sequence
entries that have not yet been integrated into Swiss-
Prot.
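
Because TrEMBL entries follow the Swiss-Prot format, one reader can handle both. The following is a minimal sketch, assuming the flat-file layout in which each line starts with a two-letter line code (ID, AC, DE, SQ) and an entry ends with "//". The function name parse_flat_entry and the toy record, including its accession numbers, are made up for illustration and are not real database content.

```python
def parse_flat_entry(text):
    """Extract a few fields from one Swiss-Prot-style flat-file entry."""
    entry = {"id": None, "accessions": [], "description": [], "sequence": ""}
    in_sequence = False
    for line in text.splitlines():
        if line.startswith("//"):          # end-of-entry marker
            break
        if in_sequence:
            # Sequence lines are split into blocks by spaces; join them.
            entry["sequence"] += "".join(line.split())
            continue
        code, value = line[:2], line[5:].strip()
        if code == "ID":
            entry["id"] = value.split()[0]  # first token is the entry name
        elif code == "AC":
            entry["accessions"] += [a.strip() for a in value.split(";") if a.strip()]
        elif code == "DE":
            entry["description"].append(value)
        elif code == "SQ":
            in_sequence = True              # sequence data follows this line
    return entry


# Toy record (hypothetical, not a real entry).
TOY_RECORD = """\
ID   TOY_HUMAN               Reviewed;          12 AA.
AC   P00000; Q00000;
DE   RecName: Full=Toy example protein;
SQ   SEQUENCE   12 AA;  1300 MW;  0000000000000000 CRC64;
     MKTAYIAKQR QI
//
"""

entry = parse_flat_entry(TOY_RECORD)
print(entry["id"], entry["accessions"], entry["sequence"])
```

For real work, an established parser is a safer choice than this sketch, which ignores most line codes.
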
Protein Structure Databases

Protein structure databases are collections of information
related to the three-dimensional structure and secondary
structure of proteins.
There are several examples of protein structure databases.
Some are:
PDB
• PDB (Protein Data Bank) is a worldwide repository of 3D
structure data on large molecules such as proteins, nucleic
acids, and other biological macromolecules.
• It stores three-dimensional structural models of
macromolecules obtained through three frequently used
experimental methods: X-ray crystallography, nuclear
magnetic resonance spectroscopy (NMR), and electron
microscopy (3DEM).
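
The experimental method behind an entry can be read from the entry itself. The sketch below assumes the text-based PDB file format, where the EXPDTA record names the technique starting at column 11 and separates multiple techniques with ";". The function name experimental_methods and the two-line toy header are illustrative only, and entries distributed in other formats would need a different reader.

```python
def experimental_methods(pdb_text):
    """Return the techniques listed in the EXPDTA record(s) of PDB-format text."""
    methods = []
    for line in pdb_text.splitlines():
        if line.startswith("EXPDTA"):
            # Columns 1-10 hold the record name and continuation field;
            # the technique names follow, separated by ';'.
            methods += [m.strip() for m in line[10:].split(";") if m.strip()]
    return methods


# Toy header lines (hypothetical entry).
TOY_HEADER = """\
HEADER    TOY ENTRY
EXPDTA    X-RAY DIFFRACTION
"""

print(experimental_methods(TOY_HEADER))
```
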
SCOP
• SCOP (Structural Classification of Proteins) is a protein structure database
that organizes proteins based on their structural and evolutionary relationships.
• SCOP categorizes proteins into different levels based on their evolutionary
relationships and structural similarities.
• Proteins with high sequence identity or similar structure and function are
grouped into families, and families with similar structures but low
sequence identity are placed into superfamilies.
• Proteins with the same major secondary structures in the same
arrangement are placed into the same fold category, and folds are further
grouped into five structural classes (all-alpha, all-beta, alpha/beta,
alpha+beta, and multi-domain).
• In addition to these five main classes, SCOP also includes other categories
such as small proteins, membrane and cell surface proteins, coiled-coil
proteins, and those with low-resolution structures.
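
The family, superfamily, fold, and class levels described above can be sketched as a grouping over classification strings. This example assumes the dotted class.fold.superfamily.family notation (for example "a.1.1.2") for a domain's position in the hierarchy. The domain names and the function group_by_level are hypothetical.

```python
from collections import defaultdict


def group_by_level(domains, depth):
    """Group domains by a prefix of their classification string.

    depth: 1 = class, 2 = fold, 3 = superfamily, 4 = family.
    """
    groups = defaultdict(list)
    for name, sccs in domains:
        key = ".".join(sccs.split(".")[:depth])
        groups[key].append(name)
    return dict(groups)


# Hypothetical domains with made-up classification strings.
DOMAINS = [
    ("domA", "a.1.1.1"),
    ("domB", "a.1.1.2"),
    ("domC", "a.2.1.1"),
    ("domD", "b.1.1.1"),
]

# domA and domB sit in different families but share one superfamily.
print(group_by_level(DOMAINS, depth=3))
print(group_by_level(DOMAINS, depth=1))
```
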
CATH
• CATH is a database that categorizes protein domains
into hierarchical levels based on their folding patterns.
• Protein domains are classified into the CATH
hierarchy, which consists of four levels of increasing
specificity: Class, Architecture, Topology, and
Homologous Superfamily. Domains with similar folding
patterns share the same Topology level, and those with
evidence of a common ancestor are grouped into the same
Homologous Superfamily.
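
A four-part dotted identifier maps directly onto these four levels. The sketch below assumes identifiers of the form "3.30.70.270", in which each additional number narrows the classification by one level. The function name cath_levels is an assumption for illustration, and the example code is used only for its shape.

```python
CATH_LEVELS = ("Class", "Architecture", "Topology", "Homologous Superfamily")


def cath_levels(code):
    """Expand a four-part CATH-style code into one identifier per level."""
    parts = code.split(".")
    if len(parts) != len(CATH_LEVELS):
        raise ValueError(f"expected {len(CATH_LEVELS)} dot-separated parts: {code!r}")
    # Each level is identified by the prefix of the code up to that level.
    return {level: ".".join(parts[: i + 1]) for i, level in enumerate(CATH_LEVELS)}


for level, identifier in cath_levels("3.30.70.270").items():
    print(f"{level}: {identifier}")
```
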
Protein Pattern and Profile Databases

Protein pattern and profile databases contain information on motifs found
in sequences. Sequence motifs correspond to structural or functional
features in proteins, so protein sequence patterns and profiles are
valuable tools for determining the function of proteins.
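
How a sequence pattern flags a candidate feature can be sketched by translating a PROSITE-style pattern into a regular expression. This is a minimal sketch that handles only a subset of the syntax: "-" separators, "x" wildcards, "[...]" allowed residues, "{...}" forbidden residues, "(n)" or "(n,m)" repeats, and "<" or ">" anchors at the ends. The function name prosite_to_regex and the test sequences are assumptions; the pattern N-{P}-[ST]-{P} is the commonly cited N-glycosylation site pattern.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern (subset of the syntax) to a regex."""
    regex = ""
    for element in pattern.rstrip(".").split("-"):
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.strip("<>")
        # Split "x(2,4)" into the core "x" and its optional repeat counts.
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":
            token = "."                      # any residue
        elif core.startswith("{"):
            token = "[^" + core[1:-1] + "]"  # any residue except these
        else:
            token = core                     # one residue or an [ABC] set
        if low:
            token += "{" + low + ("," + high if high else "") + "}"
        regex += prefix + token + suffix
    return regex


regex = prosite_to_regex("N-{P}-[ST]-{P}.")
print(regex)
print([m.start() for m in re.finditer(regex, "AANGSAA")])
```

A match only marks a candidate site; short patterns like this one also match by chance, which is why signatures are paired with annotation.
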
InterPro
• InterPro is a database that contains information on protein families,
domains, and functional sites.
• It was created by combining several major protein signature databases,
including PROSITE, Pfam, PRINTS, ProDom, and SMART into a single
comprehensive resource.
PROSITE
• PROSITE is a collection of signatures that identify patterns or profiles in
proteins, which can provide information on their biological functions.
• The signatures in the database are linked to annotation documents that
provide information on the protein family or domain detected, including
Metabolic Pathway Databases
Metabolic pathway databases contain information about enzymes,
biochemical reactions, and metabolic pathways.
ENZYME
• ENZYME is a database that stores information on enzyme
nomenclature.
• It is used as the nomenclature source for enzyme names and
reactions by most metabolic databases as well as by other
biomolecular databases.
KEGG
• KEGG (Kyoto Encyclopedia of Genes and Genomes) is a
comprehensive database that maps out molecular and cellular
pathways involving interactions between genes and molecules.
• It is composed of pathway maps, molecule tables, gene tables, and
genome maps, and is used to build functional maps of metabolic and
regulatory pathways.
Reactome
• Reactome is an open-source, expert-curated, and peer-reviewed database
of biological reactions and pathways with cross-references to major
molecular databases.
• It provides visual representations of classical intermediary metabolism,
signaling, innate and acquired immune function, transcriptional
regulation, apoptosis, and disease processes.
• The Reactome website supports navigation of pathway knowledge and
pathway-based analysis and visualization of experimental or
computational data.
• Interaction, reaction, and pathway data are downloadable as flat files
and are also accessible through RESTful web services.
• Software tools such as Pathway Browser, Analyze Data, Species
Comparison, and Reactome FI Network are provided to support data
mining and analysis of large-scale data sets.
• The Reactome release of September 2015 contained 101,670 proteins,
74,357 complexes, 68,659 reactions, and 20,261 pathways.
Protein-Protein Interaction Databases
Protein-protein interaction databases are collections of information on the interactions between
proteins. These databases provide valuable information on the relationships between different
proteins and their functions in biological systems.
Examples of protein-protein interaction databases include:
BIND
• BIND (Biomolecular Interaction Network Database) is a database that stores detailed descriptions
of interactions, molecular complexes, and pathways between various biomolecules, including
proteins, nucleic acids, and small molecules.
• The database is designed to be used for data mining and can be used to study networks of
interactions and map pathways across different species. The database can also provide
information for kinetic simulations.
DIP
• DIP (Database of Interacting Proteins) is a database that contains protein-protein interaction
information compiled through both manual curation and computational methods.
• It is useful for understanding protein functions and their relationships with other proteins. It can
also be used to study the properties of networks of interacting proteins, evaluate predictions of
protein-protein interactions, and explore the evolution of these interactions.
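
One simple network property is the number of partners each protein has. The sketch below assumes interactions are available as unordered pairs of protein identifiers; the identifiers P1 to P4 and the function names are hypothetical, and no particular download format is implied.

```python
from collections import defaultdict


def build_network(pairs):
    """Build an undirected interaction network as protein -> set of partners."""
    network = defaultdict(set)
    for a, b in pairs:
        network[a].add(b)
        network[b].add(a)
    return network


def degrees(network):
    """Number of distinct interaction partners per protein."""
    return {protein: len(partners) for protein, partners in network.items()}


# Hypothetical interaction pairs.
PAIRS = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3")]

degree = degrees(build_network(PAIRS))
hub = max(degree, key=degree.get)
print(degree, "most connected:", hub)
```
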
MINT
• MINT (Molecular INTeraction database) stores information on functional interactions
between biological molecules such as proteins, RNA, and DNA.
• It also stores information on enzymatic modifications of partner molecules.
• The database primarily focuses on experimentally verified protein-protein interactions and
STRING
• STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) is a biological
database and web resource of known and predicted protein–protein interactions.
• The STRING database contains information from numerous sources, including experimental data,
computational prediction methods, and public text collections. It is freely accessible and
regularly updated.
• The resource also highlights functional enrichments in user-provided lists of proteins,
using a number of functional classification systems such as GO, Pfam, and KEGG.
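
Functional enrichment asks whether an annotation term appears in a protein list more often than chance would predict. The source does not say which statistical procedure STRING uses, so the following is only a generic over-representation sketch based on the hypergeometric distribution. The function name, parameter names, and numbers are all illustrative.

```python
from math import comb


def enrichment_p_value(background_size, annotated_in_background,
                       list_size, annotated_in_list):
    """Probability of seeing at least `annotated_in_list` annotated proteins
    when `list_size` proteins are drawn at random from the background."""
    total = comb(background_size, list_size)
    upper = min(annotated_in_background, list_size)
    tail = sum(
        comb(annotated_in_background, k)
        * comb(background_size - annotated_in_background, list_size - k)
        for k in range(annotated_in_list, upper + 1)
    )
    return tail / total


# Illustrative numbers: 40 of 1000 background proteins carry a term,
# and 6 of the 20 proteins in the user's list carry it.
p = enrichment_p_value(1000, 40, 20, 6)
print(f"p = {p:.3g}")
```

When many terms are tested at once, the resulting p-values would normally be corrected for multiple testing.
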
Applications of protein databases
Protein databases have numerous applications. Some of the
applications are:
• Protein databases can be used in sequence analysis to identify
homologous sequences and predict protein functions based on
sequence similarity.
• Protein databases can also be used for predicting protein
structure by comparing the amino acid sequence of a protein
with known structures in the database.
• Protein databases also include tools to study protein-protein
interactions.
• Protein pattern and profile databases can be used for protein
family identification by identifying conserved motifs.
• Protein databases such as metabolic pathway databases can
be used in drug discovery and disease research by studying the
metabolic pathways involved in diseases.
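
Sequence similarity, the basis of the first application above, is often summarized as percent identity. The sketch below assumes two sequences that have already been aligned to equal length with "-" marking gaps, and it counts identity over gap-free columns only. Other definitions of percent identity exist, and the function name and sequences are illustrative.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of gap-free alignment columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":   # skip columns containing a gap
            continue
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


# Toy alignment: four matches over five gap-free columns.
print(percent_identity("MKT-AY", "MKSQAY"))
```
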
