How to run AlphaFold2 on the Crop Diversity HPC.
AlphaFold2 is a deep learning system developed by DeepMind that predicts protein structures from amino acid sequences.
This repository contains instructions and scripts for running AlphaFold2 on the Crop Diversity HPC cluster.
The full databases and container are located in the shared database directory `/mnt/shared/datasets/databases/alphafold`.
The official AlphaFold database download script requires `aria2c`, which is not installed on the HPC and cannot be installed without root access.
Instead, use the `download_db.sh` script from the alphafold_non_docker repo, updated with the latest database download locations from the official AlphaFold repo:
```bash
bash ./download_db.sh -d /mnt/shared/datasets/databases/alphafold/db -m full_dbs
```

Pull the prebuilt AlphaFold 2.3.0 container:

```bash
apptainer pull docker://uvarc/alphafold:2.3.0
```
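The container only needs to be pulled once. The following is a minimal sketch of a guard that pulls only when the image file is missing. `pull_if_missing` and the `APPTAINER_CMD` variable (which allows a dry run) are hypothetical names, not part of this repo. The `.sif` path in the usage line is also an assumption; adjust it to wherever the shared container actually lives.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: pull the AlphaFold image only if the .sif file is missing.
# APPTAINER_CMD is an assumption used for dry runs (defaults to apptainer).
pull_if_missing() {
  local sif="${1:?usage: pull_if_missing SIF_PATH IMAGE_URI}"
  local uri="${2:?usage: pull_if_missing SIF_PATH IMAGE_URI}"
  if [ -s "$sif" ]; then
    echo "Using existing image: $sif"
    return 0
  fi
  # 'apptainer pull [output file] <URI>' writes the image to the given path
  ${APPTAINER_CMD:-apptainer} pull "$sif" "$uri"
}
```

Example (assumed file name): `pull_if_missing /mnt/shared/datasets/databases/alphafold/alphafold_2.3.0.sif docker://uvarc/alphafold:2.3.0`.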
Submission scripts for the monomer and multimer presets can be downloaded from this repo or used directly from `/mnt/shared/datasets/databases/alphafold`.
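To predict many single-chain proteins, you can submit one job per FASTA file in a loop. The following is a minimal sketch that assumes every file in the directory ends in `.fasta` and holds one sequence. It also assumes `alphafold_monomer_submit.sh` is in the current directory and takes the same three arguments as in the examples below: the directory containing the FASTA file, the FASTA file name, and the output directory. `submit_all_monomers` and `SBATCH_CMD` are hypothetical names.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: submit one monomer job per *.fasta file in a directory.
# Argument order mirrors the alphafold_monomer_submit.sh examples in this README.
# SBATCH_CMD is an assumption for dry runs (e.g. SBATCH_CMD=echo).
submit_all_monomers() {
  local fasta_dir="${1:?usage: submit_all_monomers FASTA_DIR OUTPUT_ROOT}"
  local out_root="${2:?usage: submit_all_monomers FASTA_DIR OUTPUT_ROOT}"
  local f name
  for f in "$fasta_dir"/*.fasta; do
    [ -e "$f" ] || continue            # the glob matched no files
    name=$(basename "$f" .fasta)       # e.g. protA.fasta -> protA
    ${SBATCH_CMD:-sbatch} alphafold_monomer_submit.sh \
      "$fasta_dir" "$(basename "$f")" "$out_root/$name"
  done
}
```

Each protein gets its own output subdirectory named after its FASTA file, so results from different jobs do not overwrite each other.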
To run the monomer prediction on the example protein:

```bash
sbatch alphafold_monomer_submit.sh \
    /path/to/alphafold_cropdiv \
    query.fasta \
    /path/to/output/monomer_test_out
```

The output should resemble the 6MRR structure.
For predicting the structure of a single protein chain:

```bash
sbatch alphafold_monomer_submit.sh \
    /path/containing/fasta_file \
    single_protein.fasta \
    /path/to/output
```

To run the multimer prediction on the example proteins:
```bash
sbatch alphafold_multimer_submit.sh \
    /path/to/alphafold_cropdiv \
    multimer_query.fasta \
    /path/to/output/multimer_test_out
```

The output should resemble the 1DGC structure.
For predicting the structure of protein complexes:

```bash
sbatch alphafold_multimer_submit.sh \
    /path/containing/fasta_file \
    two_protein.fasta \
    /path/to/output
```

- For monomer predictions, the input FASTA file should contain a single protein sequence.
- For multimer predictions, the input FASTA file should contain one protein sequence per chain (for example, two sequences for a two-protein complex).
- NOTE: Ensure that any stop indicators (`*` or `X`) are removed from the protein sequences before running.
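You can check the FASTA requirements above before submitting. The following is a minimal sketch that is not part of this repo. The hypothetical `clean_fasta` unwraps each record onto one line and strips trailing `*`/`X` characters. The hypothetical `count_records` counts sequences, so a monomer input can be checked for exactly one. Only trailing `X` is removed, because an `X` inside a sequence usually denotes an unknown residue rather than a stop.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print each FASTA record with its sequence on one line,
# removing trailing stop indicators ('*' or 'X') from the end of each sequence.
# Reads the file given as $1, or standard input if no file is given.
clean_fasta() {
  awk '
    function emit() {
      if (in_record) {
        sub(/[*X]+$/, "", seq)   # strip trailing stop characters only
        print seq
      }
      seq = ""
    }
    /^>/ { emit(); print; in_record = 1; next }   # header line
    { gsub(/[ \t\r]/, ""); seq = seq $0 }         # sequence line: drop whitespace/CR
    END { emit() }
  ' "${1:--}"
}

# Hypothetical helper: count sequences (header lines) in a FASTA file.
count_records() {
  grep -c '^>' "${1:?usage: count_records FASTA}" || true   # grep exits 1 on zero matches
}
```

For example, `clean_fasta raw.fasta > single_protein.fasta` followed by `count_records single_protein.fasta` should print `1` for a valid monomer input.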
AlphaFold requires significant computational resources:
- Monomer predictions: 4-8 CPU cores, 16GB RAM, 1 GPU (recommended)
- Multimer predictions: 4-8 CPU cores, 32GB RAM, 1-2 GPUs (recommended)
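Options given to `sbatch` on the command line override the matching `#SBATCH` directives inside a script. This means you can apply the recommendations above without editing the submission scripts. The following is a minimal sketch using a hypothetical `af_resources` helper. It takes the upper CPU bound and the lower GPU bound from the list. The partition for GPU nodes is not specified in this README, so add a `--partition=...` option for your cluster if one is required.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print sbatch resource options for an AlphaFold preset,
# following the recommendations listed in this README.
af_resources() {
  case "${1:?usage: af_resources monomer|multimer}" in
    monomer)  printf '%s\n' "--cpus-per-task=8 --mem=16G --gres=gpu:1" ;;
    multimer) printf '%s\n' "--cpus-per-task=8 --mem=32G --gres=gpu:1" ;;
    *) echo "unknown preset: $1" >&2; return 1 ;;
  esac
}
```

Usage, with the output deliberately left unquoted so it splits into separate options: `sbatch $(af_resources multimer) alphafold_multimer_submit.sh /path/containing/fasta_file two_protein.fasta /path/to/output`.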
Common issues:
- Out of memory errors: Increase the memory allocation in the submission script
- Database errors: Verify that the database path is correct and that the database files are readable
- GPU errors: Make sure the job requests a GPU (in Slurm, e.g. `--gres=gpu:1`) and runs on a node that has one
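Many of these failures can be caught before the prediction starts. The following is a minimal pre-flight sketch. The hypothetical `check_db_dir` fails if the database directory is missing, unreadable, or empty. The hypothetical `show_gpus` lists the GPUs visible to the job via `nvidia-smi -L`. Neither function is part of the submission scripts.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical pre-flight check: fail early if the database directory is
# missing, unreadable, or empty.
check_db_dir() {
  local db="${1:?usage: check_db_dir DB_DIR}"
  if [ ! -d "$db" ] || [ ! -r "$db" ]; then
    echo "Database directory missing or unreadable: $db" >&2
    return 1
  fi
  if [ -z "$(ls -A "$db")" ]; then
    echo "Database directory is empty: $db" >&2
    return 1
  fi
  echo "Database directory OK: $db"
}

# Hypothetical check: list the GPUs visible to the job, if nvidia-smi exists.
show_gpus() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L
  else
    echo "nvidia-smi not found; is this a GPU node?" >&2
    return 1
  fi
}
```

For example, run `check_db_dir /mnt/shared/datasets/databases/alphafold/db && show_gpus` at the top of a job to confirm the database path and GPU allocation first.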
The `--output_dir` directory will have the following structure:

```
<target_name>/
    features.pkl
    ranked_{0,1,2,3,4}.pdb
    ranking_debug.json
    relax_metrics.json
    relaxed_model_{1,2,3,4,5}.pdb
    result_model_{1,2,3,4,5}.pkl
    timings.json
    unrelaxed_model_{1,2,3,4,5}.pdb
    msas/
        bfd_uniref_hits.a3m
        mgnify_hits.sto
        uniref90_hits.sto
```
Please see the official AlphaFold repo for a full description of all output files.
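AlphaFold writes each residue's pLDDT into the B-factor column of its PDB output, and `ranked_0.pdb` is the highest-confidence model. The following is a minimal sketch for a quick confidence summary. It uses a hypothetical `mean_plddt` helper that reads the fixed PDB columns with awk and averages over C-alpha atoms, giving one value per residue.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: mean pLDDT of a predicted model, read from the B-factor
# column (PDB columns 61-66) of its C-alpha atoms, i.e. one value per residue.
mean_plddt() {
  awk '
    /^ATOM/ && substr($0, 13, 4) == " CA " {
      sum += substr($0, 61, 6)   # B-factor field holds pLDDT in AlphaFold output
      n++
    }
    END {
      if (n == 0) exit 1         # no C-alpha atoms found
      printf "%.2f\n", sum / n
    }
  ' "${1:?usage: mean_plddt MODEL_PDB}"
}
```

For example: `mean_plddt /path/to/output/<target_name>/ranked_0.pdb`. A single average hides poorly predicted regions, so use it only as a first check before a per-residue visualisation.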
AlphaFold Analyser is a command-line tool that produces high-quality visualisations of protein structures predicted by AlphaFold2. These visualisations show the pLDDT of each residue and the predicted aligned error (PAE) for the entire protein, so the quality of a predicted structure can be assessed quickly. AlphaFold Analyser can process the results of both monomer and multimer predictions.