
Deep learning implementation of quantmap method

pharmbio/dl_quantmap


Here, we present a method to predict the clusters that new chemicals belong to based on network topology. The network topology analyses from "Assessing relative bioactivity of chemical substances using quantitative molecular network topology analysis" and "Automated QuantMap for rapid quantitative molecular network topology analysis" were extended using deep learning to enable the assignment of new/unknown chemicals to predefined clusters.

Steps



Setting up the environment

Create and activate the conda environment:

conda env create -f environment.yaml
conda activate qmpred

Register the conda environment as a kernel for Jupyter Notebook:

python -m ipykernel install --user --name=qmpred


Data preprocessing

Data collection and preprocessing are done with the .ipynb files in preprocessing_scripts.

  1. 1_create_stitch_string_sql_db.ipynb
    • Download the necessary data from STITCH and STRING and convert it to an SQL database.

  2. 2_data_generation.ipynb
    • QuantMap is run on the interaction data from the database. The data are then assigned to clusters by similarity using k-means clustering over a range of distance parameters.

  3. 3_data_preprocessing.ipynb
    • Of all the clusters obtained in the previous step, those with low support are rejected.

  4. 4_get_protein_function_of_clusters.ipynb
    • For the selected clusters, chemical-protein information from STITCH is used to determine the main functions of the proteins in each cluster.

  5. 5_data_splits.ipynb
    • Split the dataset for cross-validation and for final training of the model.
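
Steps 3 and 5 can be illustrated with a minimal sketch. It assumes the cluster assignments are available as a mapping from CID to cluster label. The function names, the `min_support` threshold, and the toy data are hypothetical and are not taken from the notebooks, which may implement these steps differently.

```python
import random
from collections import Counter, defaultdict


def drop_low_support(assignments, min_support):
    """Keep only chemicals whose cluster has at least `min_support` members."""
    counts = Counter(assignments.values())
    return {cid: c for cid, c in assignments.items() if counts[c] >= min_support}


def stratified_folds(assignments, n_folds, seed=0):
    """Split CIDs into `n_folds` folds, spreading each cluster across the folds."""
    by_cluster = defaultdict(list)
    for cid, cluster in sorted(assignments.items()):
        by_cluster[cluster].append(cid)

    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cluster in sorted(by_cluster):
        members = by_cluster[cluster]
        rng.shuffle(members)
        # Deal the shuffled members round-robin so every fold sees the cluster.
        for i, cid in enumerate(members):
            folds[i % n_folds].append(cid)
    return folds


# Toy data: hypothetical CIDs mapped to hypothetical cluster labels.
assignments = {
    101: "A", 102: "A", 103: "A", 104: "A",
    201: "B", 202: "B", 203: "B", 204: "B",
    301: "C",  # a cluster with a single member (low support)
}
kept = drop_low_support(assignments, min_support=2)
folds = stratified_folds(kept, n_folds=2)
```

Dealing the members of each cluster round-robin keeps the cluster proportions roughly equal across folds, which matters when some clusters are much smaller than others.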


Evaluation

Different architectures were first evaluated by cross-validation on a subset of the data. The architectures explored are in the directory cross_validation. The parameters for each architecture can be passed through its respective JSON file (the values provided there are the defaults).
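
As a rough illustration of passing parameters through a JSON file, the sketch below loads a file and merges it over a set of defaults. The parameter names (`learning_rate`, `batch_size`, `epochs`) and the `load_parameters` helper are assumptions made for this example; the actual keys are defined by the JSON files in the repository.

```python
import json
import os
import tempfile

# Hypothetical defaults; the real keys come from the repository's JSON files.
DEFAULTS = {"learning_rate": 1e-3, "batch_size": 64, "epochs": 10}


def load_parameters(path, defaults):
    """Read a JSON file and overlay its values on `defaults`."""
    with open(path, encoding="utf-8") as handle:
        overrides = json.load(handle)
    unknown = set(overrides) - set(defaults)
    if unknown:
        # Fail early on misspelled keys instead of silently ignoring them.
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    return {**defaults, **overrides}


# Demonstration with a temporary file standing in for a parameters file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "parameters.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump({"epochs": 3}, handle)
    params = load_parameters(path, DEFAULTS)
```

Rejecting unknown keys is a design choice of this sketch: a typo in a parameter name then raises an error rather than leaving the default in place unnoticed.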


Training

For the final training of the MolPMoFiT architecture, the entire dataset is used. The parameters can be passed using the parameters.json file. Before the final MolPMoFiT model can be trained, pretraining has to be carried out using pretraining_molpmofit.ipynb. Once the final model is trained, it can be used to make predictions for new chemicals with predict_new_chem.ipynb. The input for the prediction is given in the text file "test_cids.txt", which lists the CIDs of the chemicals.
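
The layout of "test_cids.txt" is not specified here. The sketch below assumes one numeric CID per line, skips blank lines, and drops repeated entries; the `parse_cids` helper and the example values are hypothetical, and the notebook may read the file differently.

```python
def parse_cids(text):
    """Parse CIDs from text, assuming one numeric CID per line."""
    cids = []
    seen = set()
    for line_number, line in enumerate(text.splitlines(), start=1):
        entry = line.strip()
        if not entry:
            continue  # ignore blank lines
        if not (entry.isascii() and entry.isdigit()):
            raise ValueError(f"line {line_number}: {entry!r} is not a numeric CID")
        cid = int(entry)
        if cid not in seen:  # keep the first occurrence, preserve order
            seen.add(cid)
            cids.append(cid)
    return cids


# Example content standing in for test_cids.txt.
example = "2244\n3672\n\n2244\n"
cids = parse_cids(example)
```

To read an actual file, the same function can be applied to its contents, for example `parse_cids(open("test_cids.txt", encoding="utf-8").read())`.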


Citation

Please cite:

Predicting protein network topology clusters from chemical structure using deep learning.
Akshai Parakkal Sreenivasan, Philip J Harrison, Wesley Schaal, Damian J Matuszewski, Kim Kultima, and Ola Spjuth.
Status: Published.
