Accurate and Rational Collision Cross Section Prediction using Voxel Projected Area and Deep Learning

yuxuanliao/PACCS

PACCS

This is the code repository for the paper Accurate and Rational Collision Cross Section Prediction using Voxel Projected Area and Deep Learning. We developed a Projected Area-based CCS prediction method (PACCS) that works directly from molecular conformers. PACCS lets users generate large-scale, searchable CCS databases using the open-source Jupyter Notebook.

Required packages:

We recommend using conda and pip.

Installing from requirements/conda/environment.yml or requirements/pip/requirements.txt installs all the required packages.

git clone https://github.com/yuxuanliao/PACCS.git
cd PACCS
conda env create -f requirements/conda/environment.yml
conda activate PACCS

Data pre-processing

PACCS calculates the projected area with a voxel-based approach, computes the m/z, and constructs the molecular graph. The corresponding methods are implemented in VoxelProjectedArea.py, MZ.py, and MolecularRepresentations.py.
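
To illustrate the m/z step, here is a minimal sketch of how an adduct m/z can be derived from a neutral monoisotopic mass. It is not the code in MZ.py: the function name `adduct_mz`, the `ADDUCT_SHIFT` table, and the choice of three singly charged adducts are assumptions made for this example.

```python
# Hypothetical helper, not taken from MZ.py.
# Mass shifts in Da for a few common singly charged adducts (assumed set).
ADDUCT_SHIFT = {
    "[M+H]+": 1.007276,    # add a proton
    "[M+Na]+": 22.989218,  # add Na minus one electron
    "[M-H]-": -1.007276,   # remove a proton
}


def adduct_mz(monoisotopic_mass, adduct):
    """Return the m/z of a singly charged adduct of a neutral molecule."""
    if adduct not in ADDUCT_SHIFT:
        raise ValueError(f"Unsupported adduct: {adduct}")
    # All adducts in the table carry a charge of magnitude 1,
    # so the ion mass equals the m/z.
    return monoisotopic_mass + ADDUCT_SHIFT[adduct]


# Example: glucose, C6H12O6, monoisotopic mass 180.063388 Da
mz_protonated = adduct_mz(180.063388, "[M+H]+")
```

Multiply charged adducts would additionally need a division by the charge magnitude, which this sketch leaves out.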

1. Generate 3D conformers of molecules.

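from rdkit import Chem
from rdkit.Chem import AllChem

# Parse the SMILES string and add explicit hydrogens before embedding
mol = Chem.MolFromSmiles(smiles)
mol = Chem.AddHs(mol)
# ETKDGv3 embedding parameters
ps = AllChem.ETKDGv3()
ps.randomSeed = -1         # -1: no fixed seed
ps.maxAttempts = 1
ps.numThreads = 0          # 0: use all available threads
ps.useRandomCoords = True  # start from random coordinates
# Embed one conformer, then optimize it with the MMFF force field
conf_ids = AllChem.EmbedMultipleConfs(mol, numConfs = 1, params = ps)
opt_results = AllChem.MMFFOptimizeMoleculeConfs(mol, numThreads = 0)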

2. Calculate voxel projected area. For details, see VoxelProjectedArea.py.

  • Use the Fibonacci grid approach to distribute points evenly over the surfaces of the 3D atomic spheres.
  • Project the points onto the three coordinate planes (xy, xz, yz).
  • Average the resulting projected areas.
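
The three steps above can be sketched in plain Python as follows. This is an illustrative approximation, not the implementation in VoxelProjectedArea.py: the function names, the number of surface points, the voxel size, and the `(x, y, z, radius)` atom format are all assumptions.

```python
import math


def fibonacci_sphere(n_points):
    """Return n_points roughly evenly spaced points on the unit sphere."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n_points):
        z = 1.0 - 2.0 * (i + 0.5) / n_points  # evenly spaced in z
        r = math.sqrt(1.0 - z * z)            # radius of the circle at height z
        theta = golden_angle * i
        points.append((r * math.cos(theta), r * math.sin(theta), z))
    return points


def voxel_projected_area(atoms, n_points=2000, voxel=0.25):
    """Average projected area over the xy, xz and yz planes.

    atoms: list of (x, y, z, radius) tuples (assumed format).
    voxel: edge length of the square grid cells (assumed value).
    """
    unit = fibonacci_sphere(n_points)
    # One set of occupied grid cells per coordinate plane
    planes = {(0, 1): set(), (0, 2): set(), (1, 2): set()}
    for x, y, z, radius in atoms:
        for ux, uy, uz in unit:
            p = (x + radius * ux, y + radius * uy, z + radius * uz)
            for (a, b), cells in planes.items():
                cells.add((math.floor(p[a] / voxel), math.floor(p[b] / voxel)))
    # Area of each projection = occupied cells times the area of one cell
    areas = [len(cells) * voxel * voxel for cells in planes.values()]
    return sum(areas) / len(areas)


# Example: a single sphere of radius 1.5 at the origin
single_atom_area = voxel_projected_area([(0.0, 0.0, 0.0, 1.5)])
```

Counting occupied cells means overlapping atoms are not double-counted in a projection. The result depends on the voxel size and point density, so partially covered boundary cells make it a slight overestimate of the true silhouette area.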

Model training

Train the model on your own training dataset with the PACCS_train function in Training.py.

PACCS_train(input_path, epochs, batchsize, output_model_path)

Optional args

  • input_path : Path to the file containing the SMILES and adduct data.
  • epochs, batchsize : Optimized hyperparameters (number of training epochs and batch size).
  • output_model_path : Path where the trained model is saved.

Predicting CCS

The predicted CCS values of molecules are obtained by feeding the voxel projected area, molecular graph, one-hot encoding of adduct type, and m/z into the already trained PACCS model with Prediction.py.

PACCS_predict(input_path, model_path, output_path)

Optional args

  • input_path : Path to the file containing the SMILES and adduct data.
  • model_path : Path to the trained model file.
  • output_path : Path where the predicted CCS values are saved.
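
As an illustration of one of the model inputs mentioned above, the one-hot encoding of the adduct type can be sketched as below. The adduct list and the function name `one_hot_adduct` are assumptions for this example; the adducts actually supported by the trained PACCS model may differ.

```python
# Hypothetical adduct vocabulary, not taken from the PACCS code.
ADDUCTS = ["[M+H]+", "[M+Na]+", "[M-H]-"]


def one_hot_adduct(adduct, adducts=ADDUCTS):
    """Return a list with 1 at the position of `adduct` and 0 elsewhere."""
    if adduct not in adducts:
        raise ValueError(f"Unknown adduct type: {adduct}")
    return [1 if a == adduct else 0 for a in adducts]


encoded = one_hot_adduct("[M+Na]+")
```

A vector like this can then be combined with the voxel projected area and m/z as input features alongside the molecular graph.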

Data

Usage

The example code for model training is included in Model training.ipynb. By running Model training.ipynb directly, users can train the model on their own training dataset.

The example code for CCS prediction is included in CCS prediction.ipynb. By running CCS prediction.ipynb directly, users can predict CCS values with PACCS.

CCS values can also be predicted through the Colab notebook prediction.ipynb, which lets users predict CCS values directly without downloading anything.

The example code for generating large-scale CCS databases with PACCS is included in database generation.ipynb. By running database generation.ipynb directly, users can customize and generate large-scale CCS databases to suit their practical needs.

Information of maintainers
