PACCS

This is the code repository for the paper "Accurate and Rational Collision Cross Section Prediction using Voxel Projected Area and Deep Learning". PACCS is a projected area-based method that predicts collision cross section (CCS) values directly from molecular conformers. With the open-source Jupyter notebooks provided here, users can generate large-scale, searchable CCS databases.

Package required:

We recommend installing with conda or pip.

Creating the environment from requirements/conda/environment.yml, or installing from requirements/pip/requirements.txt, installs all the required packages.

git clone https://github.com/yuxuanliao/PACCS.git
cd PACCS
conda env create -f requirements/conda/environment.yml
conda activate PACCS

Data pre-processing

PACCS calculates the projected area with a voxel-based approach, computes the m/z, and constructs the molecular graph. The corresponding code is in VoxelProjectedArea.py, MZ.py, and MolecularRepresentations.py.
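
As a rough illustration of the m/z step, the sketch below adds an adduct-specific mass shift to a neutral monoisotopic mass. It is not the code in MZ.py: the adduct table, the function name adduct_mz, and the restriction to singly charged ions are assumptions made for this example.

```python
# Mass shifts in daltons for singly charged adducts (standard values, rounded).
PROTON = 1.007276       # H+
SODIUM_ION = 22.989218  # Na+ (sodium atom minus one electron)

# Hypothetical adduct table; the adducts PACCS actually supports may differ.
ADDUCT_SHIFT = {
    "[M+H]+": PROTON,
    "[M+Na]+": SODIUM_ION,
    "[M-H]-": -PROTON,
}


def adduct_mz(monoisotopic_mass, adduct):
    """Return the m/z of a singly charged adduct of a neutral molecule."""
    if adduct not in ADDUCT_SHIFT:
        raise ValueError(f"unsupported adduct: {adduct}")
    return monoisotopic_mass + ADDUCT_SHIFT[adduct]


# Caffeine (C8H10N4O2) has a monoisotopic mass of about 194.080376 Da.
caffeine_mz = adduct_mz(194.080376, "[M+H]+")
```

For multiply charged adducts the shifted mass would also have to be divided by the charge, which this sketch leaves out.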

1. Generate 3D conformers of molecules.

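from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles(smiles)  # smiles: input SMILES string
mol = Chem.AddHs(mol)             # add explicit hydrogens before embedding
ps = AllChem.ETKDGv3()
ps.randomSeed = -1                # -1: no fixed seed, results vary between runs
ps.maxAttempts = 1
ps.numThreads = 0                 # 0: use all available threads
ps.useRandomCoords = True         # start embedding from random coordinates
conf_ids = AllChem.EmbedMultipleConfs(mol, numConfs = 1, params = ps)
opt_results = AllChem.MMFFOptimizeMoleculeConfs(mol, numThreads = 0)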

ETKDGv3 returns an EmbedParameters object for the ETKDG method - version 3 (macrocycles).
EmbedMultipleConfs generates the 3D conformers of molecules.
MMFFOptimizeMoleculeConfs optimizes the 3D conformers of molecules.

2. Calculate the voxel projected area. For details, see VoxelProjectedArea.py.

Distribute points evenly over the surfaces of the 3D atomic spheres using the Fibonacci grid approach.
Project the points onto the three coordinate planes (xy, xz, yz).
Average the three projected areas.
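
The three steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the implementation in VoxelProjectedArea.py: the function names, the default point count, the voxel size, and the input format (a list of x, y, z, radius tuples) are all hypothetical.

```python
import math


def fibonacci_sphere(n):
    """Return n roughly evenly spaced points on the unit sphere."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - (2.0 * i + 1.0) / n   # evenly spaced heights in (-1, 1)
        r = math.sqrt(1.0 - z * z)      # radius of the circle at height z
        theta = golden_angle * i        # rotate by the golden angle each step
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points


def voxel_projected_area(atoms, n_points=2000, voxel=0.2):
    """Mean projected area over the xy, xz, and yz planes.

    atoms: list of (x, y, z, radius) tuples (assumed input format).
    """
    unit = fibonacci_sphere(n_points)
    # One set of occupied grid cells per coordinate plane.
    planes = {(0, 1): set(), (0, 2): set(), (1, 2): set()}
    for x, y, z, radius in atoms:
        for ux, uy, uz in unit:
            p = (x + radius * ux, y + radius * uy, z + radius * uz)
            for (a, b), cells in planes.items():
                # Dropping the third coordinate projects the point onto the plane.
                cells.add((math.floor(p[a] / voxel), math.floor(p[b] / voxel)))
    areas = [len(cells) * voxel * voxel for cells in planes.values()]
    return sum(areas) / len(areas)


# A single sphere of radius 1 has a true projected area of pi in every plane.
area = voxel_projected_area([(0.0, 0.0, 0.0, 1.0)])
```

Counting whole grid cells overestimates the area along the outline, and the surface points must be dense relative to the voxel size, or interior cells are missed. Larger atomic radii therefore need more points per sphere or a coarser grid.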

Model training

Train the model on your own training dataset with the PACCS_train function in Training.py.

PACCS_train(input_path, epochs, batchsize, output_model_path)

Optional args

input_path : Path to the file containing the SMILES and adduct data.
epochs, batchsize : Optimized hyperparameters (number of training epochs and batch size).
output_model_path : Path where the trained model is saved.

Predicting CCS

Predicted CCS values are obtained by feeding the voxel projected area, molecular graph, one-hot encoding of the adduct type, and m/z into the trained PACCS model with Prediction.py.

PACCS_predict(input_path, model_path, output_path)

Optional args

input_path : Path to the file containing the SMILES and adduct data.
model_path : Path to the trained model file.
output_path : Path where the predicted CCS values are saved.
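
One of the model inputs listed above is a one-hot encoding of the adduct type. The sketch below shows what such an encoding looks like; the adduct list, its order, and the function name one_hot_adduct are assumptions for illustration and may not match the encoding used in the PACCS code.

```python
# Hypothetical adduct vocabulary; the list and order used by PACCS may differ.
ADDUCTS = ["[M+H]+", "[M+Na]+", "[M-H]-"]


def one_hot_adduct(adduct, adducts=ADDUCTS):
    """Return a list with a 1 at the adduct's position and 0 elsewhere."""
    if adduct not in adducts:
        raise ValueError(f"unknown adduct: {adduct}")
    return [1 if a == adduct else 0 for a in adducts]


encoded = one_hot_adduct("[M+Na]+")
```

Rejecting unknown adducts with an error, rather than returning an all-zero vector, keeps unsupported inputs from silently reaching the model.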

Data

The curated dataset is randomly split into the training, validation, and test sets in a ratio of 8:1:1.
The external test set is used to compare the performance of different methods.
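
A random 8:1:1 split like the one described above can be sketched as follows. The function name split_811 and the fixed seed are assumptions for this example; the split actually used for the paper is not reproduced here.

```python
import random


def split_811(items, seed=0):
    """Shuffle items and split them into train/validation/test at 8:1:1."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded so the split is repeatable
    n_train = len(items) * 8 // 10
    n_val = len(items) // 10
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]      # remainder goes to the test set
    return train, val, test


train, val, test = split_811(range(100))
```

Integer arithmetic is used for the split sizes so that rounding never drops or duplicates an item; any remainder goes to the test set.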

Usage

Example code for model training is in Model training.ipynb. Running the notebook directly trains the model on your own training dataset.

Example code for CCS prediction is in CCS prediction.ipynb. Running the notebook directly predicts CCS values with PACCS.

CCS values can also be predicted via the Colab link prediction.ipynb, which lets users predict CCS values directly, without downloading anything.

Example code for generating large-scale CCS databases with PACCS is in database generation.ipynb. Running the notebook directly lets users customize and generate large-scale CCS databases to suit their practical needs.
