This repository contains a suite of Large Language Models (LLMs) designed for high-throughput mining and generation of antimicrobial peptides (AMPs). Each model serves a unique purpose in the discovery and evaluation of potential AMPs:
- ProteoGPT: A pre-trained model for generating and analyzing amino acid sequences.
- AMPGenix: A sequence generator capable of producing potential AMP sequences.
- AMPSorter: A classifier designed to identify AMPs from peptide datasets.
- BioToxiPept: A classifier aimed at determining the cytotoxicity of short peptides.
All models and config files can be downloaded from here.
Clone this repository to your local machine using:
git clone https://github.com/W1V1995/AMP_Project.git
Download AMP_models and place it in the AMP_Project directory.
Navigate into the cloned directory:
cd AMP_Project
Install all dependencies by following the instructions in the requirements.txt file or the dedicated installation guide.
To create a classifier by fine-tuning, execute the following command:
sh Fine-tuning_classifier.sh
To create a generator by fine-tuning, execute the following command:
sh Fine-tuning_generator.sh
Parameters such as batch_size and epochs, as well as the output path, can be customized.
To generate sequences using AMPGenix, run:
sh AMPGenix.sh
Parameters such as ntokens, nsamples, prefix, model_path, and save_samples_path can be adjusted to suit your requirements.
For identifying AMPs from a peptide dataset using AMPSorter, execute:
sh AMPSorter_predictor.sh
Customize parameters such as batch_size, raw_data_path, model_path, classifier_path, output_path, and candidate_amp_path to fit your dataset and paths.
To predict the cytotoxicity of short peptides with BioToxiPept, use:
sh BioToxiPept.sh
Adjustable parameters include batch_size, raw_data_path, model_path, classifier_path, output_path, and candidate_pep_path.
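The generation, AMP identification, and cytotoxicity steps can be chained from one place. The following is a minimal sketch, not part of this repository: the script names come from this README, but the stage order, the `PIPELINE` list, and the `run_pipeline` helper are assumptions made for illustration. It also assumes the paths inside each script have already been edited so that one stage's output is the next stage's input.

```python
import subprocess
from pathlib import Path

# Script names are taken from this README; running them in this order
# (generate -> identify AMPs -> screen cytotoxicity) is an assumption.
PIPELINE = ["AMPGenix.sh", "AMPSorter_predictor.sh", "BioToxiPept.sh"]


def run_pipeline(repo_dir, dry_run=True):
    """Run each stage script from inside repo_dir, stopping at the first failure.

    With dry_run=True the commands are only printed, never executed.
    Returns the list of commands in the order they were (or would be) run.
    """
    repo = Path(repo_dir)
    commands = [["sh", name] for name in PIPELINE]
    for command in commands:
        if dry_run:
            print(" ".join(command))
            continue
        script = repo / command[1]
        if not script.is_file():
            raise FileNotFoundError(f"missing stage script: {script}")
        # check=True raises CalledProcessError if the stage exits non-zero.
        subprocess.run(command, cwd=repo, check=True)
    return commands
```

Calling `run_pipeline("AMP_Project")` only prints the three commands; pass `dry_run=False` once the paths in each script have been checked.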
To predict the antimicrobial activity of short peptides based on charged residues and hydrophobic residues, use:
python QSAR.py
Adjustable parameters are samples_path and output_path.
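To give a feel for what descriptors based on charged and hydrophobic residues look like, here is a small illustrative sketch. It is not the logic of QSAR.py: the residue groupings, the `describe` function, and the returned fields are all assumptions chosen for the example (histidine is ignored when counting charge, and the hydrophobic set is one common convention among several).

```python
# Illustrative only; the residue sets below are assumptions, not QSAR.py's.
POSITIVE = set("KR")           # treated as +1 each; histidine is ignored here
NEGATIVE = set("DE")           # treated as -1 each
HYDROPHOBIC = set("AVILMFWC")  # one common grouping of hydrophobic residues


def describe(sequence):
    """Return simple charge and hydrophobicity descriptors for one peptide."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    positive = sum(1 for aa in seq if aa in POSITIVE)
    negative = sum(1 for aa in seq if aa in NEGATIVE)
    hydrophobic = sum(1 for aa in seq if aa in HYDROPHOBIC)
    return {
        "length": len(seq),
        "net_charge": positive - negative,
        "hydrophobic_fraction": hydrophobic / len(seq),
    }


if __name__ == "__main__":
    # Six lysines and four leucines: net charge 6, hydrophobic fraction 0.4.
    print(describe("KKLLKKLLKK"))
```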
To use AMPSorter or BioToxiPept for predictions, prepare a CSV file containing your sequence data. Ensure the file includes a column named "Sequence". Example format:
Sequence
<sequence_1>
<sequence_2>
...
Save your dataset as /Data/Sequence.csv, or modify the script parameters to point to your custom data path.
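An input file in this format can be produced with Python's standard csv module. The sketch below is one possible way to do it, not a script shipped with this repository: the `write_sequence_csv` helper is hypothetical, and dropping entries that contain characters outside the 20 standard amino acids is an assumption about what the predictors expect.

```python
import csv

STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def write_sequence_csv(sequences, path):
    """Write peptides to a CSV with a single "Sequence" column.

    Sequences are stripped and upper-cased. Empty entries and entries with
    non-standard characters are skipped (an assumption, not a documented
    requirement). Returns the number of sequences written.
    """
    cleaned = []
    for raw in sequences:
        seq = raw.strip().upper()
        if seq and set(seq) <= STANDARD_AMINO_ACIDS:
            cleaned.append(seq)
    # newline="" lets the csv module control line endings on every platform.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Sequence"])
        writer.writerows([seq] for seq in cleaned)
    return len(cleaned)
```

For example, `write_sequence_csv(["kkllkk", "GIGKFLHSAK"], "Sequence.csv")` writes a header row followed by two sequences; point raw_data_path at the resulting file.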