tomoCPT (tomogram Centroid Prediction Tool) is a deep learning-based program for predicting the centroids of objects in 3D cryo-tomograms.
- Clone the repository in a user-writable location
git clone https://github.com/shahpnmlab/tomocpt
cd tomocpt
- Create a virtual environment to install tomocpt into
conda create -n tomocpt python=3.10
conda activate tomocpt
pip install -e .
- Check if things are working by running
tomocpt --help
You should see the following output
Usage: tomocpt [OPTIONS] COMMAND [ARGS]...
╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --help Show this message and exit.
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ init Function to create a template config file for running tomoCPT, only including annotated fields
│ prepare_data Prepares particle picking datasets by processing multiple tomograms and their corresponding coordinate files.
│ train Trains a deep learning model for particle picking or self-supervised learning using PyTorch Lightning.
│ predict Performs parallel inference on tomogram data for particle detection and coordinate extraction.
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
TomoCPT (Tomogram Centroid Prediction Tool) is a deep learning-based solution for detecting and localizing particles in 3D cryo-electron tomograms. This guide walks you through the complete workflow from setup to prediction.
The first step is to create a configuration file that defines parameters for data preparation, training, and inference.
# Create a default configuration file in the current directory
tomocpt init --output-path ./my_config.yaml

A my_config.yaml file will be created. Edit this file to set the paths and parameters according to your needs before proceeding. If working with multiple datasets, use comma-separated values.
A sample with updated training parameters:
# my_config.yaml (sample content)
prepData:
  raw_data_dir: ???
  training_data_dir: ???
  particle_length_ang: ???
  coordinate_files: ???
  # ... other prepData fields
train:
  chunks_dir: ???       # Directory with preprocessed data chunks (REQUIRED)
  model_dir: ???        # Directory to save model weights and logs (REQUIRED)
  resume_from: null     # Optional: Path to a checkpoint to RESUME a run
  fine_tune_from: null  # Optional: Path to a checkpoint to FINE-TUNE from
  n_epochs: 10
  batch_size: 16
  # ... other training fields
infer:
  tomogram_dir: ???
  predictions_dir: ???
  weights: ???
  # ... other inference fields

This step processes your raw tomograms and coordinate files into a format suitable for training.
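Before running a step, it can help to confirm that no required field in the config is still set to the `???` placeholder. The following is a minimal sketch, not part of tomoCPT: the helper name `find_unfilled_fields` is hypothetical, and it assumes a simple two-level layout (top-level sections such as `train:` with `key: value` lines beneath them) rather than parsing YAML in full.

```python
def find_unfilled_fields(config_text: str) -> list[str]:
    """Return dotted names of fields whose value is still the `???` placeholder.

    Hypothetical helper: a line-based scan, not a real YAML parser.
    """
    unfilled = []
    section = None
    for raw_line in config_text.splitlines():
        # Drop trailing comments (simplification: assumes values contain no '#').
        line = raw_line.split("#", 1)[0].rstrip()
        if not line.strip() or ":" not in line:
            continue
        key, value = line.split(":", 1)
        key, value = key.strip(), value.strip()
        if not raw_line[0].isspace() and value == "":
            section = key  # top-level section header such as `train:`
        elif value == "???":
            unfilled.append(f"{section}.{key}" if section else key)
    return unfilled


sample = """\
prepData:
  raw_data_dir: /data/tomograms
  particle_length_ang: ???
train:
  chunks_dir: ???  # Directory with preprocessed data chunks (REQUIRED)
  n_epochs: 10
"""
print(find_unfilled_fields(sample))
# → ['prepData.particle_length_ang', 'train.chunks_dir']
```

Only the section you are about to run needs to be complete, so in practice you might filter the result by prefix (for example `prepData.` before data preparation).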
# After filling out the prepData section in my_config.yaml
tomocpt prepare_data --config-file my_config.yaml

# Use config file but override specific parameters
tomocpt prepare_data --config-file my_config.yaml --particle-length-ang 200

This section covers the main training workflows. All parameters can be set either via the command line or in your .yaml configuration file.
This is the most basic scenario, where you train a new model from randomly initialized weights.
# Using a configuration file is recommended
tomocpt train --config-file my_config.yaml

If your training run is aborted, you can resume it from the last saved checkpoint. This restores the model, optimizer, and learning rate scheduler to their previous states.
# The command is the same as your original run, just add --resume-from
tomocpt train --config-file my_config.yaml --resume-from /path/to/model_dir/weights.ckpt

Alternatively, set resume_from: /path/to/model_dir/weights.ckpt in your my_config.yaml.
Use this to adapt a trained model to a new dataset. This loads the weights but resets the optimizer. It also automatically activates differential learning rates, using a much smaller learning rate for the model's core to avoid catastrophic forgetting.
You can download a pre-trained model from here.
# Use --fine-tune-from to load weights and start a new training run
tomocpt train --config-file my_config.yaml --fine-tune-from /path/to/previous_model/weights.ckpt

Alternatively, set fine_tune_from: /path/to/previous_model/weights.ckpt in your my_config.yaml.
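The differential learning rates used during fine-tuning can be pictured as optimizer parameter groups, where the model's core gets a scaled-down rate and the remaining layers keep the base rate. The sketch below only illustrates that idea. The `encoder.` prefix, the `0.1` scale, and the function name are assumptions made for the example; how tomoCPT actually splits its layers and which rates it uses are not documented here.

```python
def build_param_groups(param_names, base_lr, core_lr_scale=0.1, core_prefix="encoder."):
    """Split parameter names into a low-LR 'core' group and a full-LR 'head' group.

    All defaults are illustrative assumptions, not tomoCPT's real settings.
    """
    core = [name for name in param_names if name.startswith(core_prefix)]
    head = [name for name in param_names if not name.startswith(core_prefix)]
    return [
        {"name": "core", "params": core, "lr": base_lr * core_lr_scale},
        {"name": "head", "params": head, "lr": base_lr},
    ]


groups = build_param_groups(
    ["encoder.conv1.weight", "encoder.conv2.weight", "head.out.weight"],
    base_lr=1e-3,
)
for group in groups:
    print(group["name"], group["lr"], group["params"])
```

In a real PyTorch setup the `params` entries would hold tensors rather than names, and the list of dictionaries would be passed to the optimizer constructor, which accepts per-group `lr` values.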
This is an advanced fine-tuning technique where a "student" model learns from a pre-trained "teacher" model.
tomocpt train \
--config-file my_config.yaml \
--fine-tune-from /path/to/teacher/weights.ckpt \
--use-distillation True \
--distill-weight 0.5

- --fine-tune-from: Path to the teacher model's weights.
- --use-distillation: Must be set to True.
- --distill-weight: Balances learning from new labels vs. learning from the teacher.
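One common way to balance the two objectives is a convex combination of the label loss and the teacher-matching loss. The sketch below assumes that form purely for illustration. The exact formula tomoCPT applies for `--distill-weight` may differ (for example, it could add the weighted distillation term on top of an unscaled label loss), and `combined_loss` is a hypothetical name.

```python
def combined_loss(label_loss: float, distill_loss: float, distill_weight: float) -> float:
    """Blend the supervised loss with the teacher-matching loss.

    Assumed form: (1 - w) * label_loss + w * distill_loss, with w in [0, 1].
    """
    if not 0.0 <= distill_weight <= 1.0:
        raise ValueError("distill_weight must lie in [0, 1]")
    return (1.0 - distill_weight) * label_loss + distill_weight * distill_loss


# With a weight of 0.5 both terms contribute equally.
print(combined_loss(label_loss=0.8, distill_loss=0.4, distill_weight=0.5))
```

Under this assumed form, a weight of 0 ignores the teacher entirely and a weight of 1 ignores the new labels.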
You can always override any parameter from your config file directly on the command line using dot notation:
# Override learning rate and precision
tomocpt train --config-file my_config.yaml optimizer.lr=0.0001 network.TORCH_FLOAT_PRECISION=16

Finally, apply your trained model to new tomograms for particle detection.
# After filling out the infer section in my_config.yaml
tomocpt predict --config-file my_config.yaml

The resulting star file contains particle coordinates, ready for subsequent processing in tools like Relion.
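To inspect the predicted coordinates outside Relion, a STAR file can be read with a few lines of Python. This is a minimal sketch under stated assumptions: it expects a single `loop_` table whose columns include the Relion labels `_rlnCoordinateX`, `_rlnCoordinateY`, and `_rlnCoordinateZ`. The column set that tomoCPT actually writes is not specified here, so check a real output file first. The function name is hypothetical.

```python
def read_star_coordinates(star_text: str) -> list[tuple[float, ...]]:
    """Parse X/Y/Z coordinates from the first loop_ table of a STAR file."""
    columns = []  # column labels in file order
    coords = []
    in_loop = False
    for line in star_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("data_"):
            if columns:  # a second data block begins; stop after the first table
                break
            continue
        if line == "loop_":
            in_loop = True
        elif in_loop and line.startswith("_"):
            columns.append(line.split()[0])  # drop the optional "#1" index
        elif in_loop and columns:
            row = dict(zip(columns, line.split()))
            coords.append(tuple(float(row[f"_rlnCoordinate{axis}"]) for axis in "XYZ"))
    return coords


example = """\
data_

loop_
_rlnCoordinateX #1
_rlnCoordinateY #2
_rlnCoordinateZ #3
120.0 88.5 40.0
301.2 15.0 77.3
"""
print(read_star_coordinates(example))
# → [(120.0, 88.5, 40.0), (301.2, 15.0, 77.3)]
```

For files with several data blocks or quoted values, a dedicated STAR parsing library is a safer choice than this hand-rolled reader.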
- GPU Acceleration: Enable CUDA for both training and inference for best performance.
- Batch Size: Adjust the batch size based on your GPU memory.
- Precision: Using lower precision (e.g., network.TORCH_FLOAT_PRECISION=16) can accelerate training.
- Gradient Accumulation: If you have limited GPU memory, use gradient_accumulation_steps in your config to train with a larger effective batch size.
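As a quick illustration of the gradient accumulation tip, the effective batch size is the per-step batch size multiplied by the number of accumulation steps. The helpers below are hypothetical and only do this arithmetic. They assume single-device training; with several GPUs the device count would typically multiply in as well.

```python
import math


def effective_batch_size(batch_size: int, gradient_accumulation_steps: int) -> int:
    """Number of samples contributing to each optimizer update."""
    return batch_size * gradient_accumulation_steps


def accumulation_steps_for(target_batch_size: int, batch_size: int) -> int:
    """Smallest number of accumulation steps that reaches the target batch size."""
    return math.ceil(target_batch_size / batch_size)


print(effective_batch_size(batch_size=16, gradient_accumulation_steps=4))  # → 64
print(accumulation_steps_for(target_batch_size=64, batch_size=6))          # → 11
```

So if a batch of 16 is the most that fits in GPU memory, setting `gradient_accumulation_steps` to 4 trains as if the batch held 64 samples, at the cost of four forward and backward passes per update.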
- Added gradient accumulation for training on machines with lower memory.
- Added a shared section in config.yaml to specify shared variables only once.
tomoCPT is jointly developed by Ruben Sanchez-Garcia and Pranav NM Shah at the University of Oxford.
TomoCPT: a generalizable model for 3D particle detection and localization in cryo-electron tomograms
Shah PNM, Sanchez-Garcia R, Stuart DI. Acta Crystallographica Section D: Structural Biology, 81(2):63-76, 2025.