tomoCPT (tomogram Centroid Prediction Tool) is a deep-learning-based program that predicts the centroids of objects in 3D cryo-tomograms.

Installation

Clone the repository into a user-writable location:

git clone https://github.com/shahpnmlab/tomocpt
cd tomocpt

Create a virtual environment to install tomocpt into:

conda create -n tomocpt python=3.10
conda activate tomocpt
pip install -e .

Check that the installation works by running:

tomocpt --help

You should see the following output:

 Usage: tomocpt [OPTIONS] COMMAND [ARGS]...

╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --help          Show this message and exit.
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ init           Function to create a template config file for running tomoCPT, only including annotated fields
│ prepare_data   Prepares particle picking datasets by processing multiple tomograms and their corresponding coordinate files.
│ train          Trains a deep learning model for particle picking or self-supervised learning using PyTorch Lightning.
│ predict        Performs parallel inference on tomogram data for particle detection and coordinate extraction.
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Usage

Outline of tomocpt workflow

Usage instructions

tomoCPT detects and localizes particles in 3D cryo-electron tomograms using deep learning. This guide walks through the complete workflow, from setup to prediction.

1. Initialize a Configuration File

The first step is to create a configuration file that defines parameters for data preparation, training, and inference.

Command-line approach:

# Create a default configuration file in the current directory
tomocpt init --output-path ./my_config.yaml

Expected output:

This creates a my_config.yaml file. Edit it to set the paths and parameters for your data before proceeding. If you are working with multiple datasets, provide comma-separated values.

A sample with updated training parameters:

# my_config.yaml (sample content)
prepData:
  raw_data_dir: ???
  training_data_dir: ???
  particle_length_ang: ???
  coordinate_files: ???
  # ... other prepData fields
train:
  chunks_dir: ???      # Directory with preprocessed data chunks (REQUIRED)
  model_dir: ???       # Directory to save model weights and logs (REQUIRED)
  resume_from: null    # Optional: Path to a checkpoint to RESUME a run
  fine_tune_from: null # Optional: Path to a checkpoint to FINE-TUNE from
  n_epochs: 10
  batch_size: 16
  # ... other training fields
infer:
  tomogram_dir: ???
  predictions_dir: ???
  weights: ???
  # ... other inference fields
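
The ??? entries in the sample look like placeholders for values you must supply (??? is also the marker OmegaConf uses for mandatory values). Before running a command, it can help to check which fields are still unset. The following is a minimal sketch, not part of tomoCPT: find_unset_fields is a hypothetical helper, and it assumes the simple two-level layout shown in the sample rather than full YAML.

```python
def find_unset_fields(config_text):
    """Return dotted names of fields whose value is still the '???' placeholder."""
    unset = []
    section = None
    for raw_line in config_text.splitlines():
        line = raw_line.split("#", 1)[0].rstrip()  # drop trailing comments
        if not line.strip() or ":" not in line:
            continue
        key, _, value = line.strip().partition(":")
        indented = line[0] in " \t"
        if not indented:
            section = key  # top-level key such as prepData, train, infer
        if value.strip() == "???":
            unset.append(f"{section}.{key}" if indented and section else key)
    return unset


# Illustrative config text; the paths are made up.
sample = """\
train:
  chunks_dir: ???      # Directory with preprocessed data chunks (REQUIRED)
  model_dir: /data/models
  n_epochs: 10
infer:
  weights: ???
"""

print(find_unset_fields(sample))  # ['train.chunks_dir', 'infer.weights']
```

Running such a check on the section you are about to use (for example, only the prepData fields before prepare_data) shows exactly which entries still need values.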

2. Generate Volume-Label Pairs

This step processes your raw tomograms and coordinate files into a format suitable for training.

Config file approach:

# After filling out the prepData section in my_config.yaml
tomocpt prepare_data --config-file my_config.yaml

Combined approach:

# Use config file but override specific parameters
tomocpt prepare_data --config-file my_config.yaml --particle-length-ang 200
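
In the commands above, the flag --particle-length-ang corresponds to the config key particle_length_ang, and the command-line value takes precedence over the file. The sketch below illustrates that precedence rule only. apply_cli_overrides is a hypothetical helper, and the flag-to-key mapping (dashes to underscores) is inferred from the names in this guide, not taken from tomoCPT's source.

```python
def apply_cli_overrides(section_values, cli_flags):
    """Return a copy of a config section with command-line values applied on top."""
    merged = dict(section_values)
    for flag, value in cli_flags.items():
        # "--particle-length-ang" -> "particle_length_ang" (assumed naming rule)
        key = flag.lstrip("-").replace("-", "_")
        merged[key] = value
    return merged


# Values as they might appear in the prepData section (illustrative only).
prep_data = {"raw_data_dir": "/data/tomograms", "particle_length_ang": 150}

merged = apply_cli_overrides(prep_data, {"--particle-length-ang": 200})
print(merged)  # {'raw_data_dir': '/data/tomograms', 'particle_length_ang': 200}
```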

3. Train a Model

This section covers the main training workflows. All parameters can be set either via the command line or in your .yaml configuration file.

Scenario 1: Training from Scratch

This is the most basic scenario, where you train a new model from randomly initialized weights.

# Using a configuration file is recommended
tomocpt train --config-file my_config.yaml

Scenario 2: Resuming an Interrupted Run

If your training run is aborted, you can resume it seamlessly from the last saved checkpoint. This restores the model, optimizer, and learning rate scheduler to their exact previous states.

# The command is the same as your original run, just add --resume-from
tomocpt train --config-file my_config.yaml --resume-from /path/to/model_dir/weights.ckpt

Alternatively, set resume_from: /path/to/model_dir/weights.ckpt in your my_config.yaml.
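
If model_dir ends up holding more than one checkpoint, you may want to pick the most recent one to pass to --resume-from. The following is a small sketch under that assumption. latest_checkpoint is a hypothetical helper, and the only file name confirmed by this guide is weights.ckpt.

```python
from pathlib import Path


def latest_checkpoint(model_dir):
    """Return the most recently modified .ckpt file in model_dir, or None."""
    checkpoints = sorted(
        Path(model_dir).glob("*.ckpt"),
        key=lambda path: path.stat().st_mtime,
    )
    return checkpoints[-1] if checkpoints else None


if __name__ == "__main__":
    checkpoint = latest_checkpoint("/path/to/model_dir")
    if checkpoint is None:
        print("No checkpoint found; start a fresh run instead.")
    else:
        print(f"tomocpt train --config-file my_config.yaml --resume-from {checkpoint}")
```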

Scenario 3: Fine-Tuning from a Pre-trained Model

Use this to adapt a trained model to a new dataset. This loads the weights but resets the optimizer. It also automatically activates differential learning rates, using a much smaller learning rate for the model's core to avoid catastrophic forgetting.

You can download a pre-trained model from here.

# Use --fine-tune-from to load weights and start a new training run
tomocpt train --config-file my_config.yaml --fine-tune-from /path/to/previous_model/weights.ckpt

Alternatively, set fine_tune_from: /path/to/previous_model/weights.ckpt in your my_config.yaml.
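
To make the idea of differential learning rates concrete, the sketch below splits parameters into two groups and gives the model's core a smaller learning rate. This is an illustration, not tomoCPT's implementation: the "encoder." prefix, the parameter names, and the 0.1 scale factor are all assumptions. The returned structure mirrors the per-parameter-group layout that PyTorch optimizers accept, with names standing in for tensors.

```python
def build_param_groups(param_names, base_lr, core_prefix="encoder.", core_lr_scale=0.1):
    """Split parameters into a slow-learning core group and a normal-rate head group."""
    core = [name for name in param_names if name.startswith(core_prefix)]
    head = [name for name in param_names if not name.startswith(core_prefix)]
    return [
        {"params": core, "lr": base_lr * core_lr_scale},  # pre-trained core: small steps
        {"params": head, "lr": base_lr},                  # remaining layers: full rate
    ]


# Hypothetical parameter names for illustration.
names = ["encoder.conv1.weight", "encoder.conv2.weight", "head.out.weight"]
groups = build_param_groups(names, base_lr=1e-3)
for group in groups:
    print(group["lr"], group["params"])
```

Keeping the core's learning rate small limits how far the pre-trained weights can drift while the rest of the model adapts to the new dataset.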

Scenario 4: Fine-Tuning with Knowledge Distillation

This is an advanced fine-tuning technique where a "student" model learns from a pre-trained "teacher" model.

tomocpt train \
  --config-file my_config.yaml \
  --fine-tune-from /path/to/teacher/weights.ckpt \
  --use-distillation True \
  --distill-weight 0.5

--fine-tune-from: Path to the teacher model's weights.
--use-distillation: Must be set to True.
--distill-weight: Balances learning from new labels vs. learning from the teacher.
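
One common way to balance the two objectives is a weighted sum of the loss against the new labels and the loss against the teacher's predictions. The sketch below shows that form. It is an assumption about how --distill-weight might be applied, not a description of tomoCPT's actual loss; combined_loss is a hypothetical name.

```python
def combined_loss(label_loss, distill_loss, distill_weight):
    """Blend the supervised loss with the distillation loss (assumed convex mix)."""
    if not 0.0 <= distill_weight <= 1.0:
        raise ValueError("distill_weight is expected to lie in [0, 1]")
    return (1.0 - distill_weight) * label_loss + distill_weight * distill_loss


# With a weight of 0.5, both terms contribute equally.
print(combined_loss(label_loss=1.0, distill_loss=0.5, distill_weight=0.5))  # 0.75
```

Under this form, a weight of 0 ignores the teacher entirely and a weight of 1 ignores the new labels.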

Hydra-style Parameter Overrides

You can always override any parameter from your config file directly on the command line using dot notation:

# Override learning rate and precision
tomocpt train --config-file my_config.yaml optimizer.lr=0.0001 network.TORCH_FLOAT_PRECISION=16
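
Dot notation addresses a nested field: optimizer.lr refers to the lr key inside the optimizer section. The following sketch shows how such overrides map onto a nested config. It is a simplified illustration, not Hydra's or tomoCPT's parser; apply_overrides and parse_value are hypothetical helpers.

```python
def parse_value(raw):
    """Convert an override value to int or float when possible, else keep the string."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def apply_overrides(config, overrides):
    """Apply 'section.key=value' overrides to a nested dict in place and return it."""
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            node = node.setdefault(name, {})  # create missing sections on the way
        node[leaf] = parse_value(raw_value)
    return config


# Starting values are illustrative.
config = {"optimizer": {"lr": 0.001}, "network": {"TORCH_FLOAT_PRECISION": 32}}
apply_overrides(config, ["optimizer.lr=0.0001", "network.TORCH_FLOAT_PRECISION=16"])
print(config)  # {'optimizer': {'lr': 0.0001}, 'network': {'TORCH_FLOAT_PRECISION': 16}}
```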

4. Run Prediction

Finally, apply your trained model to new tomograms for particle detection.

Config file approach:

# After filling out the infer section in my_config.yaml
tomocpt predict --config-file my_config.yaml

The resulting STAR file contains particle coordinates, ready for subsequent processing in tools such as RELION.
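
To inspect the predicted coordinates outside RELION, a STAR file can be read with a few lines of Python. The sketch below handles a single loop_ block and assumes the columns follow the RELION naming convention (_rlnCoordinateX, _rlnCoordinateY, _rlnCoordinateZ). Whether tomoCPT's output uses exactly these column names is an assumption; check the header of your own file.

```python
def read_star_coordinates(star_text):
    """Return (x, y, z) tuples from the first loop_ block of a STAR file."""
    columns = []
    coords = []
    in_loop = False
    for line in star_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line == "loop_":
            columns, in_loop = [], True
        elif in_loop and line.startswith("_"):
            columns.append(line.split()[0])  # "_rlnCoordinateX #1" -> "_rlnCoordinateX"
        elif in_loop and columns:
            row = dict(zip(columns, line.split()))
            coords.append(tuple(float(row[f"_rlnCoordinate{axis}"]) for axis in "XYZ"))
    return coords


# Made-up example content, not real tomoCPT output.
example = """\
data_particles

loop_
_rlnCoordinateX #1
_rlnCoordinateY #2
_rlnCoordinateZ #3
120.0 340.5 56.0
88.0 12.0 200.0
"""

print(read_star_coordinates(example))  # [(120.0, 340.5, 56.0), (88.0, 12.0, 200.0)]
```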

Performance Tips

GPU Acceleration: Enable CUDA for both training and inference for best performance.
Batch Size: Adjust the batch size based on your GPU memory.
Precision: Using lower precision (e.g., network.TORCH_FLOAT_PRECISION=16) can accelerate training.
Gradient Accumulation: If you have limited GPU memory, use gradient_accumulation_steps in your config to train with a larger effective batch size.
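
Gradient accumulation works because averaging the gradients of several equally sized micro-batches gives the same result as one gradient over the combined batch. The sketch below checks this on a toy one-parameter model in plain Python. It is illustrative only: the function names are hypothetical and the real training loop is handled by PyTorch Lightning.

```python
def effective_batch_size(batch_size, gradient_accumulation_steps):
    """Number of samples that contribute to each optimizer step."""
    return batch_size * gradient_accumulation_steps


def mse_gradient(w, batch):
    """Gradient with respect to w of mean((w * x - y) ** 2) over the batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_gradient(w, batch, micro_batch_size):
    """Average the gradients of equally sized micro-batches."""
    micro_batches = [
        batch[i:i + micro_batch_size] for i in range(0, len(batch), micro_batch_size)
    ]
    return sum(mse_gradient(w, mb) for mb in micro_batches) / len(micro_batches)


data = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 9.0)]
print(mse_gradient(1.0, data))             # -14.5 (one batch of 4)
print(accumulated_gradient(1.0, data, 2))  # -14.5 (two micro-batches of 2)
print(effective_batch_size(16, 4))         # 64
```

With batch_size: 16 and a gradient_accumulation_steps of 4, for example, each optimizer step would reflect 64 samples while only 16 need to fit in GPU memory at once.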

Changelog

0.5.2

Added gradient accumulation for training on machines with lower memory.
Added a shared section in config.yaml to specify shared variables only once.

Development

tomoCPT is jointly developed by Ruben Sanchez-Garcia and Pranav NM Shah at the University of Oxford.

CITE

TomoCPT: a generalizable model for 3D particle detection and localization in cryo-electron tomograms

Shah PNM, Sanchez-Garcia R, Stuart DI. Acta Crystallographica Section D: Structural Biology, 81(2):63-76, 2025.
