Predicting protein network topology clusters from chemical structure using deep learning

Here, we present a method to predict which network topology clusters new chemicals belong to. The network topology analyses from "Assessing relative bioactivity of chemical substances using quantitative molecular network topology analysis" and "Automated QuantMap for rapid quantitative molecular network topology analysis" were extended with deep learning so that new or unknown chemicals can be assigned to predefined clusters.

Steps

Setting up the environment

Create and activate the environment:

conda env create -f environment.yaml
conda activate qmpred

Register the conda environment as a kernel for Jupyter Notebook:

python -m ipykernel install --user --name=qmpred

Data preprocessing

Data collection and preprocessing are done with the .ipynb notebooks in preprocessing_scripts.

1_create_stitch_string_sql_db.ipynb
- Download the necessary data from STITCH and STRING and convert them to an SQL database.
2_data_generation.ipynb
- QuantMap is run using the interaction data from the databases. The data are then assigned to clusters based on their similarity using k-means clustering for a range of distance parameters.
3_data_preprocessing.ipynb
- Of all the clusters obtained in the previous step, those with low support are rejected.
4_get_protein_function_of_clusters.ipynb
- For the selected clusters, chemical-protein information from STITCH is used to determine the main functions of the proteins in each cluster.
5_data_splits.ipynb
- Split the dataset for cross-validation and for the final training of the model.
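
The filtering and splitting steps above can be pictured with a minimal sketch. This is not the code from the notebooks: it assumes that "support" means the number of chemicals in a cluster and that the folds are stratified by cluster, and the function names, the min_support threshold, and the toy CIDs are all hypothetical.

```python
from collections import Counter, defaultdict
import random


def drop_low_support_clusters(labels, min_support):
    """Keep only items whose cluster has at least `min_support` members.

    `labels` maps an item identifier (for example a CID) to a cluster label.
    Treating support as cluster size is an assumption of this sketch.
    """
    sizes = Counter(labels.values())
    return {item: c for item, c in labels.items() if sizes[c] >= min_support}


def stratified_folds(labels, n_folds, seed=0):
    """Assign each item to a fold so that every cluster is spread across folds."""
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for item, cluster in sorted(labels.items()):
        by_cluster[cluster].append(item)

    folds = {}
    for cluster in sorted(by_cluster):
        members = by_cluster[cluster]
        rng.shuffle(members)
        # Deal members round-robin: per cluster, fold sizes differ by at most one.
        for position, item in enumerate(members):
            folds[item] = position % n_folds
    return folds


# Toy example: cluster 2 has a single member and is rejected.
toy_labels = {101: 0, 102: 0, 103: 0, 104: 0, 105: 1, 106: 1, 107: 1, 108: 1, 109: 2}
kept = drop_low_support_clusters(toy_labels, min_support=2)
fold_of = stratified_folds(kept, n_folds=2)
```

Stratifying by cluster keeps every retained cluster represented in each validation fold, which matters when cluster sizes are unbalanced.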

Evaluation

Initially, different architectures were evaluated using cross-validation on a subset of the data. The architectures explored are in the cross_validation directory. The parameters for each architecture can be passed through its respective JSON file (the values given in these files are the defaults).
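
As an illustration of passing parameters through a JSON file, the sketch below reads a file and applies optional overrides on top of the stored defaults. It is not taken from the repository: the function name load_parameters and the example keys are hypothetical, and the actual notebooks may read their JSON files differently.

```python
import json
from pathlib import Path


def load_parameters(json_path, overrides=None):
    """Read a parameter file and apply optional overrides on top of it."""
    with Path(json_path).open(encoding="utf-8") as handle:
        params = json.load(handle)
    if not isinstance(params, dict):
        raise ValueError(f"{json_path} must contain a JSON object")

    overrides = overrides or {}
    unknown = set(overrides) - set(params)
    if unknown:
        # Reject misspelled keys instead of silently ignoring them.
        raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
    return {**params, **overrides}
```

For example, load_parameters("parameters.json", {"epochs": 3}) would return the file's values with epochs replaced, provided the file already defines an "epochs" key (a hypothetical name used here only for illustration).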

Training

For the final training of the MolPMoFiT architecture, the entire dataset is used. The parameters can be passed using the parameters.json file. Before the final training of the MolPMoFiT model can be run, pretraining must first be carried out using pretraining_molpmofit.ipynb. Once the final model is trained, it can be used to make predictions for new chemicals with predict_new_chem.ipynb. The input for prediction is given in the text file "test_cids.txt", which contains the CIDs of the chemicals.
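
The exact layout of "test_cids.txt" is not described here. The sketch below assumes one numeric CID per line, skips blank lines, and drops duplicates; the helper read_cids is hypothetical and is not part of predict_new_chem.ipynb.

```python
from pathlib import Path


def read_cids(path):
    """Return the unique CIDs listed in `path`, in file order.

    Assumes one positive integer CID per line (an assumption of this sketch).
    """
    cids = []
    seen = set()
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    for line_number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # ignore blank lines
        if not (text.isascii() and text.isdigit()) or int(text) <= 0:
            raise ValueError(f"line {line_number}: {text!r} is not a valid CID")
        cid = int(text)
        if cid not in seen:
            seen.add(cid)
            cids.append(cid)
    return cids
```

Validating the file before running the notebook surfaces malformed entries early, rather than partway through prediction.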

Citation

Please cite:

Predicting protein network topology clusters from chemical structure using deep learning.
Akshai Parakkal Sreenivasan, Philip J Harrison, Wesley Schaal, Damian J Matuszewski, Kim Kultima, and Ola Spjuth.
Status: Published.

