A Model Context Protocol (MCP) server for the UniProt and EBI Proteins APIs, built on Cloudflare Workers, with data staging backed by Durable Objects and SQLite.
This MCP server provides unified access to:
- UniProtKB: Search and retrieve protein sequence and functional information
- EBI Proteins API: Detailed protein features, variations, and structural data
- 🚀 Unified Interface: Single tool for searching UniProt and fetching detailed protein data
- 📊 Advanced Data Staging: Large datasets automatically staged in SQLite for complex queries
- 🔍 Smart Query Generation: Automatic suggestions for exploring staged data
- 📈 Intelligent Bypassing: Small datasets returned directly for efficiency
- 🏗️ Scalable Architecture: Built on Cloudflare Workers with Durable Objects
- ⚡ Rate Limit Aware: Intelligent handling of API rate limits
Advanced UniProtKB search with comprehensive filtering and pagination:
- Query: Complex search queries with UniProt syntax
- Formats: JSON, TSV, FASTA, XML
- Features: Sorting, facets, compression, isoforms
- Pagination: Up to 500 results per page with automatic staging for large datasets
{
"query": "organism_id:9606 AND reviewed:true",
"format": "json",
"fields": "accession,protein_name,gene_names,organism_name",
"size": 100,
"sort": "score desc",
"compressed": true
}
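As a rough illustration of how these parameters could map onto a UniProt REST request, the sketch below builds a search URL with the standard library. The `build_search_url` helper is hypothetical, and it assumes the tool forwards its parameters more or less unchanged to the `search` endpoint under the UniProt base URL, which this README does not confirm.

```python
from urllib.parse import urlencode

# Assumption: the search endpoint lives under the UniProtKB base URL.
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"

def build_search_url(query, fmt="json", fields=None, size=25, sort=None, compressed=False):
    """Hypothetical helper: encode search parameters as a query string."""
    if not 1 <= size <= 500:  # page-size cap stated above
        raise ValueError("size must be between 1 and 500")
    params = {"query": query, "format": fmt, "size": size}
    if fields:
        params["fields"] = fields
    if sort:
        params["sort"] = sort
    if compressed:
        params["compressed"] = "true"
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"

url = build_search_url(
    "organism_id:9606 AND reviewed:true",
    fields="accession,protein_name,gene_names,organism_name",
    size=100,
    sort="score desc",
    compressed=True,
)
```

Validating `size` on the client side keeps an oversized page request from ever reaching the API.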
Bulk download tool for large datasets with automatic staging:
- Purpose: Stream large datasets efficiently
- Auto-staging: Always stages responses for SQL querying
- Compression: Built-in compression support
- Formats: JSON, TSV, FASTA, XML
{
"query": "organism_id:9606 AND reviewed:true",
"format": "fasta",
"compressed": true
}
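If a compressed FASTA payload reaches the client as raw bytes, it can be unpacked with the standard library. This is a minimal sketch that assumes gzip encoding; `parse_fasta` and `read_compressed_fasta` are hypothetical helpers, and the sample payload is a stand-in rather than real server output.

```python
import gzip

def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def read_compressed_fasta(payload: bytes):
    """Decompress a gzip payload and parse it as FASTA."""
    return list(parse_fasta(gzip.decompress(payload).decode("utf-8")))

# Stand-in payload: a truncated sequence plus a placeholder record.
sample = gzip.compress(
    b">sp|P04637|P53_HUMAN example\nMEEPQSDPSV\nEPPLSQETFS\n"
    b">toy|TOY001|TOY_DEMO placeholder\nMK\n"
)
records = read_compressed_fasta(sample)
```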
Retrieve individual UniProtKB entries by accession:
- Direct Access: Get specific protein entries
- Multiple Formats: JSON, TSV, FASTA, XML
- Isoforms: Include protein isoforms
- Field Selection: Choose specific data fields
{
"accession": "P04637",
"format": "json",
"fields": "accession,protein_name,sequence,organism_name",
"include_isoforms": true
}
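Accessions such as `P04637` are easy to confuse with entry names such as `P53_HUMAN`. A client can check the input before calling the tool; the sketch below uses the accession pattern published in the UniProt help pages, wrapped in a hypothetical `is_accession` helper.

```python
import re

# UniProtKB accession pattern (6 or 10 characters), per the UniProt help pages.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)

def is_accession(value: str) -> bool:
    """Return True if value is shaped like a UniProtKB accession."""
    return ACCESSION_RE.fullmatch(value) is not None
```

The check is purely syntactic: it confirms the shape of the identifier, not that the entry exists.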
Map IDs between different database systems:
- Batch Processing: Up to 100,000 IDs per job
- Cross-Database: Map between UniProt, Ensembl, PDB, etc.
- Job-Based: Asynchronous processing with status tracking
- Filtering: Taxonomy-based filtering
{
"from_db": "Gene_Name",
"to_db": "UniProtKB",
"ids": ["TP53", "BRCA1", "BRCA2"],
"taxon_id": "9606"
}
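Because a job accepts at most 100,000 IDs, longer lists need to be split before submission. The sketch below shows one way to do that on the client side; `batch_ids` and `job_arguments` are hypothetical helpers, and whether the server already splits oversized lists internally is not stated here.

```python
def batch_ids(ids, max_per_job=100_000):
    """Deduplicate IDs (keeping order) and split them into job-sized batches."""
    unique = list(dict.fromkeys(ids))
    return [unique[i:i + max_per_job] for i in range(0, len(unique), max_per_job)]

def job_arguments(from_db, to_db, ids, taxon_id=None):
    """Build one argument object per batch, mirroring the example above."""
    jobs = []
    for batch in batch_ids(ids):
        args = {"from_db": from_db, "to_db": to_db, "ids": batch}
        if taxon_id is not None:
            args["taxon_id"] = taxon_id
        jobs.append(args)
    return jobs

jobs = job_arguments("Gene_Name", "UniProtKB", ["TP53", "BRCA1", "BRCA2", "TP53"], taxon_id="9606")
```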
Perform BLAST searches against UniProtKB:
- Programs: BLASTP, BLASTX, TBLASTN
- Databases: UniProtKB, UniRef90, UniRef50
- Parameters: E-value, matrix, hit limits
- Async Processing: Job-based with polling
{
"program": "blastp",
"sequence": "MTMDKSELVQKAKLAEQAERYDDMAAAMKAVTEQGHELSNEERNLLSVAYKNVVGARRSSWRVISSIEQKTERNEKKQQMGKEYREKIEAELQDICNDVLELLDKYLIPNATQPESKVFYLKMKGDYFRYLSEVASGDNKQTTVSNSQQAYQEAFEISKKEMQPTHPIRLGLALNFSVFYYEILNSPEKACSLAKTAFDEAIAELDTLNEESYKDSTLIMQLLRDNLTLWTSENQGDEGDAGEGEN",
"database": "uniprotkb",
"threshold": 0.001,
"hits": 50
}
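Job-based tools imply a submit-then-poll loop on the caller's side. The following is a minimal sketch of polling with exponential backoff; `get_status` is a caller-supplied function, and the status strings `FINISHED` and `ERROR` are assumptions rather than values documented in this README.

```python
import time

def poll_until_done(get_status, interval=1.0, max_interval=30.0, timeout=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call get_status() until it returns a terminal state or the timeout expires."""
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in ("FINISHED", "ERROR"):  # assumed terminal states
            return status
        if clock() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval)
        interval = min(interval * 2, max_interval)  # back off between polls
```

Doubling the interval keeps short jobs responsive while avoiding a steady stream of status requests for long-running searches.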
Detailed protein information from EBI Proteins API:
- Rich Data: Sequence, functional annotations, isoforms
- Formats: JSON, XML
- Isoforms: Include protein variants
{
"accession": "P04637",
"format": "json",
"include_isoforms": true
}
Protein sequence features and annotations:
- Categories: Domains, sites, regions, PTMs
- Formats: JSON, XML, GFF
- Filtering: Specific feature categories
{
"accession": "P04637",
"categories": ["DOMAINS_AND_SITES", "PTM"],
"format": "json"
}
Protein sequence variations and disease variants:
- Sources: UniProt, large-scale studies
- Consequences: Missense, nonsense, synonymous
- Disease Filter: Disease-associated variants only
- Clinical Data: ClinVar, COSMIC integration
{
"accession": "P04637",
"sources": ["uniprot", "large_scale_studies"],
"consequences": ["missense", "nonsense"],
"disease_filter": true
}
Proteomics data from various studies:
- Studies: PeptideAtlas, MaxQB, ProteomicsDB
- Tissues: Brain, liver, heart, etc.
- Quantitative: Expression levels and modifications
{
"accession": "P04637",
"tissues": ["brain", "liver"],
"format": "json"
}
Genome coordinate mappings:
- Assemblies: GRCh38, GRCh37
- Coordinates: Protein to genomic position mapping
- Exon Structure: Gene structure information
{
"accession": "P04637",
"assembly": "GRCh38",
"format": "json"
}
Query, analyze, and manage staged datasets:
- Operations: Query, schema, cleanup, export
- SQL Interface: Full SQLite support
- Export: JSON, CSV, TSV formats
- Analytics: Built-in query suggestions
{
"operation": "query",
"data_access_id": "uniprot_1234567890_abc123",
"sql": "SELECT * FROM protein WHERE JSON_EXTRACT(data, '$.organism.scientificName') = 'Homo sapiens' LIMIT 10"
}
npm install
npm run dev
The server will be available at:
- MCP Endpoint: http://localhost:8787/mcp
- SSE Endpoint: http://localhost:8787/sse
{
  "method": "tools/call",
  "params": {
    "name": "uniprot_query",
    "arguments": {
      "operation": "search",
      "query": "organism_id:9606 AND reviewed:true",
      "limit": 10
    }
  }
}
{
  "method": "tools/call",
  "params": {
    "name": "uniprot_query",
    "arguments": {
      "operation": "protein_details",
      "accession": "P04637"
    }
  }
}
{
  "method": "tools/call",
  "params": {
    "name": "data_manager",
    "arguments": {
      "operation": "fetch_and_stage",
      "accessions": "P04637,Q92793",
      "fields": "accession,protein_name,gene_names,organism_name"
    }
  }
}
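The requests above show only `method` and `params`. MCP messages are JSON-RPC 2.0, so a hand-rolled client would also send `jsonrpc` and `id` fields. The sketch below wraps a tool call in that envelope; `tool_call` is a hypothetical helper, and a real MCP client library would normally handle this together with the initialization handshake.

```python
import itertools
import json

_request_ids = itertools.count(1)  # simple incrementing request IDs

def tool_call(name, arguments):
    """Wrap a tool invocation in a JSON-RPC 2.0 request envelope."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

request = tool_call("uniprot_query", {"operation": "protein_details", "accession": "P04637"})
body = json.dumps(request)  # string to send as the request body
```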
For large datasets, the server automatically stages data in SQLite tables within Durable Objects, enabling complex analytical queries.
Data is normalized into tables like:
- proteins: Core protein information
- gene_names: Gene names and synonyms
- features: Protein sequence features
- keywords: Functional keywords
- references: Literature references
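To make the idea of normalization concrete, here is a small sketch that splits one UniProtKB-style JSON entry into three of these tables using an in-memory SQLite database. The column names and the `stage_entry` helper are assumptions for illustration; the server's actual schema may differ.

```python
import sqlite3

# Trimmed stand-in shaped like a UniProtKB JSON entry.
entry = {
    "primaryAccession": "P04637",
    "genes": [{"geneName": {"value": "TP53"}, "synonyms": [{"value": "P53"}]}],
    "keywords": [{"name": "Tumor suppressor"}, {"name": "Apoptosis"}],
}

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE proteins   (accession TEXT PRIMARY KEY);
    CREATE TABLE gene_names (accession TEXT, gene_name TEXT, is_synonym INTEGER);
    CREATE TABLE keywords   (accession TEXT, keyword TEXT);
""")

def stage_entry(db, entry):
    """Insert one entry into the (assumed) normalized tables."""
    acc = entry["primaryAccession"]
    db.execute("INSERT OR REPLACE INTO proteins VALUES (?)", (acc,))
    for gene in entry.get("genes", []):
        name = gene.get("geneName", {}).get("value")
        if name:
            db.execute("INSERT INTO gene_names VALUES (?, ?, 0)", (acc, name))
        for synonym in gene.get("synonyms", []):
            db.execute("INSERT INTO gene_names VALUES (?, ?, 1)", (acc, synonym["value"]))
    for keyword in entry.get("keywords", []):
        db.execute("INSERT INTO keywords VALUES (?, ?)", (acc, keyword["name"]))

stage_entry(db, entry)
gene_rows = db.execute(
    "SELECT gene_name FROM gene_names WHERE accession = ? ORDER BY is_synonym",
    ("P04637",),
).fetchall()
```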
-- Query staged JSON using SQLite JSON1
SELECT
  json_extract(data, '$.primaryAccession') AS accession,
  json_extract(data, '$.genes[0].geneName.value') AS gene_name,
  json_extract(data, '$.sequence.length') AS length
FROM protein
WHERE json_extract(data, '$.organism.scientificName') = 'Homo sapiens'
LIMIT 10;
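The same JSON1 query can be tried locally before running it against staged data. The sketch below assumes a `protein` table with a single `data` column holding one JSON document per row, as the query implies, and a Python build whose SQLite includes the JSON1 functions. The second record is a placeholder, not a real entry.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE protein (data TEXT)")

entries = [
    {"primaryAccession": "P04637",
     "organism": {"scientificName": "Homo sapiens"},
     "genes": [{"geneName": {"value": "TP53"}}],
     "sequence": {"length": 393}},
    {"primaryAccession": "TOY001",  # placeholder record for the filter to exclude
     "organism": {"scientificName": "Mus musculus"},
     "genes": [{"geneName": {"value": "Toy1"}}],
     "sequence": {"length": 10}},
]
db.executemany("INSERT INTO protein VALUES (?)", [(json.dumps(e),) for e in entries])

rows = db.execute("""
    SELECT
      json_extract(data, '$.primaryAccession') AS accession,
      json_extract(data, '$.genes[0].geneName.value') AS gene_name,
      json_extract(data, '$.sequence.length') AS length
    FROM protein
    WHERE json_extract(data, '$.organism.scientificName') = 'Homo sapiens'
    LIMIT 10
""").fetchall()
```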
- Base URL: https://rest.uniprot.org/uniprotkb/
- Rate Limits: IP-based, ~3 requests/second recommended
- Formats: JSON, TSV, FASTA, GFF, XML
- Base URL: https://www.ebi.ac.uk/proteins/api/
- Rate Limits: ~10 requests/second per IP
- Authentication: None required for public data
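Clients that call these APIs directly can stay under the suggested rates with a simple spacing throttle. This is a sketch, not the server's own rate limiter; the `Throttle` class is hypothetical and only enforces a minimum gap between calls.

```python
import time

class Throttle:
    """Space calls at least 1/rate_per_sec seconds apart."""

    def __init__(self, rate_per_sec, clock=time.monotonic, sleep=time.sleep):
        self.min_gap = 1.0 / rate_per_sec
        self.clock, self.sleep = clock, sleep
        self.next_allowed = 0.0

    def wait(self):
        """Block until the next call is allowed; return the seconds slept."""
        now = self.clock()
        delay = self.next_allowed - now
        if delay > 0:
            self.sleep(delay)
            now += delay
        self.next_allowed = now + self.min_gap
        return max(delay, 0.0)

uniprot_throttle = Throttle(3)   # ~3 requests/second, per the figure above
ebi_throttle = Throttle(10)      # ~10 requests/second, per the figure above
```

Calling `uniprot_throttle.wait()` before each request is enough for a single-threaded client; concurrent clients would need a lock around `wait`.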
- UniProtMCP: Main MCP agent implementing ToolContext interface
- ToolRegistry: Manages and registers all available tools
- JsonToSqlDO: Durable Object for data staging and SQL operations
- ChunkingEngine: Handles large dataset chunking for efficient processing
- DataInsertionEngine: Optimized bulk data insertion with conflict resolution
- SchemaInferenceEngine: Automatic schema discovery and documentation
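As an illustration of what schema inference over JSON records can look like, the sketch below maps each top-level key to a SQLite column type by widening across observed values. It is a simplified stand-in with a hypothetical `infer_schema` function and does not describe how SchemaInferenceEngine actually works.

```python
def infer_schema(records):
    """Map each top-level key to INTEGER, REAL, or TEXT based on observed values."""
    rank = {"INTEGER": 0, "REAL": 1, "TEXT": 2}  # wider types win

    def type_of(value):
        if isinstance(value, (bool, int)):
            return "INTEGER"
        if isinstance(value, float):
            return "REAL"
        return "TEXT"  # strings, plus nested values that would be stored as JSON text

    schema = {}
    for record in records:
        for key, value in record.items():
            if value is None:
                continue  # nulls carry no type information
            inferred = type_of(value)
            if key not in schema or rank[inferred] > rank[schema[key]]:
                schema[key] = inferred
    return schema
```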
- Request: Tool receives search/fetch request
- API Call: Fetches data from UniProt/Proteins APIs
- Parsing: Normalizes JSON responses into structured entities
- Staging Decision: Determines if staging is beneficial
- Storage: Creates optimized SQLite tables in Durable Objects
- Querying: Enables complex SQL analysis of staged data
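The staging decision in this flow can be pictured as a size check on the parsed response. The thresholds and the `should_stage` function below are purely illustrative assumptions; the README does not state the actual criteria.

```python
import json

def should_stage(records, max_inline_rows=50, max_inline_bytes=64 * 1024):
    """Hypothetical rule: stage when the payload is too large to return inline."""
    if len(records) > max_inline_rows:
        return True
    serialized_size = len(json.dumps(records).encode("utf-8"))
    return serialized_size > max_inline_bytes
```

Checking both row count and byte size covers the two ways a response can get large: many small records or a few very large ones.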
npm run deploy
Ensure wrangler.jsonc includes:
- Durable Object bindings for UniProtMCP and JsonToSqlDO
- Node.js compatibility flags
- Proper migration configuration
No API keys required - both UniProt and EBI Proteins APIs are open access.
You can connect to your remote MCP server from Claude Desktop using the mcp-remote proxy.
Update your Claude Desktop configuration. For a deployed server, replace the local URL with your Workers URL (for example, your-uniprot-server.workers.dev/sse):
{
  "mcpServers": {
    "uniprot": {
      "command": "npx",
      "args": [
        "mcp-remote",
        "http://localhost:8787/sse"
      ]
    }
  }
}
See RATE_LIMITING.md for detailed information about:
- API-specific rate limits and best practices
- Intelligent request throttling and retry logic
- Monitoring and optimization strategies
- Bulk operation handling
- Search for cancer-related proteins:
{
"operation": "search",
"query": "keyword:Cancer AND organism_id:9606",
"limit": 100
}
- Stage for analysis:
{
"operation": "fetch_and_stage",
"accessions": "P04637,P38398,P51587"
}
- Analyze with SQL:
-- COUNT(DISTINCT ...) avoids double-counting features when a protein matches several keywords
SELECT
  p.accession,
  p.protein_name,
  COUNT(DISTINCT f.feature_id) AS feature_count,
  GROUP_CONCAT(DISTINCT k.keyword) AS keywords
FROM proteins p
JOIN keywords k ON p.accession = k.accession
LEFT JOIN features f ON p.accession = f.accession
WHERE k.keyword LIKE '%cancer%'
GROUP BY p.accession, p.protein_name
ORDER BY feature_count DESC;
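One caveat when a query joins both features and keywords: the result has one row per feature-keyword pair, so a plain COUNT can overstate the number of features. The sketch below reproduces the effect on a toy in-memory database; the table and column names follow the query above, and the rows are placeholders.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE proteins (accession TEXT PRIMARY KEY, protein_name TEXT);
    CREATE TABLE features (feature_id INTEGER PRIMARY KEY, accession TEXT);
    CREATE TABLE keywords (accession TEXT, keyword TEXT);
    INSERT INTO proteins VALUES ('P04637', 'Cellular tumor antigen p53');
    INSERT INTO features (accession) VALUES ('P04637'), ('P04637'), ('P04637');
    INSERT INTO keywords VALUES ('P04637', 'kw_a'), ('P04637', 'kw_b');
""")

# Three features joined with two keywords produce six joined rows.
naive_count, distinct_count = db.execute("""
    SELECT COUNT(f.feature_id), COUNT(DISTINCT f.feature_id)
    FROM proteins p
    LEFT JOIN features f ON p.accession = f.accession
    LEFT JOIN keywords k ON p.accession = k.accession
    GROUP BY p.accession
""").fetchone()
```

Here `naive_count` is 6 while `distinct_count` is 3, which matches the number of rows actually stored in `features`.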
- Search protein family:
{
"operation": "search",
"query": "family:\"protein kinase\" AND reviewed:true",
"fields": "accession,protein_name,gene_names,ec"
}
- Get detailed features:
{
"operation": "protein_features",
"accession": "P06493",
"features": "DOMAIN,BINDING,ACT_SITE"
}
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly with both APIs
- Submit a pull request
MIT License - see LICENSE file for details.