Post Drug Repurposing Analysis (PDRA)

This repository develops a Post Drug Repurposing Analysis workflow for evaluating candidate drugs identified through repurposing efforts. The workflow analyzes each candidate's Absorption, Distribution, Metabolism, and Excretion (ADME) profile, predicts toxicities using the Tox21 dataset, and assesses molecular activity through Quantitative Structure-Activity Relationship (QSAR) modeling. The aim is to speed up drug discovery by screening candidates for both safety and efficacy early on.

Project Overview

Drug repurposing is an efficient strategy for identifying new therapeutic uses for existing drugs. This project provides a framework for the post-analysis of repurposing candidates that combines ADME screening, toxicity prediction, and QSAR modeling into a single evaluation of each candidate. The goal is to produce actionable results that shorten the drug discovery pipeline.

Current Progress

At this preliminary stage, the project prepares molecular data for the pharmacokinetic and toxicity evaluations that follow. This includes:

Filtering the initial drug repurposing results by Type and Score.
Retrieving chemical identifiers (PubChem CIDs) and structural representations (SMILES) from the PubChem database.
Formatting the data for external ADME prediction tools such as SwissADME.
Initial work on clustering the results and identifying significant drug groups with unsupervised learning methods.

The code and explanations for the remaining, more advanced steps are under development and will be added later.

Implemented Steps

Step 1: Filtering Drug Repurposing Results

This step processes raw drug repurposing data to refine the list of candidates based on predefined criteria.

Input: export.csv, a CSV file of initial drug repurposing results with the columns Rank, Score, Type, ID, Name, and Description.
Process:
- Reads the export.csv file.
- Selects the rows of each Type ('cp', 'kd', 'oe', 'cc') whose Score is greater than 90.
Output: Four CSV files, one per Type, each containing the rows of that Type with a Score above 90:
- filtered_cp_above_90.csv
- filtered_kd_above_90.csv
- filtered_oe_above_90.csv
- filtered_cc_above_90.csv
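The filtering rule above fits in a few lines of pandas. This is a minimal sketch, not the notebook's code: it assumes the Type column holds exactly the four codes listed, and the function names filter_by_type and write_filtered are hypothetical.

```python
import os

import pandas as pd

# Type codes and score cut-off as described in Step 1.
TYPES = ("cp", "kd", "oe", "cc")
SCORE_THRESHOLD = 90


def filter_by_type(df: pd.DataFrame, threshold: float = SCORE_THRESHOLD) -> dict:
    """Split results by Type, keeping only rows whose Score is strictly above threshold."""
    # Non-numeric scores become NaN, and NaN > threshold is False, so those rows are dropped.
    scores = pd.to_numeric(df["Score"], errors="coerce")
    return {t: df[(df["Type"] == t) & (scores > threshold)] for t in TYPES}


def write_filtered(filtered: dict, out_dir: str = ".", threshold: int = SCORE_THRESHOLD) -> list:
    """Write one CSV per Type, e.g. filtered_cp_above_90.csv, and return the paths written."""
    paths = []
    for t, subset in filtered.items():
        path = os.path.join(out_dir, f"filtered_{t}_above_{threshold}.csv")
        subset.to_csv(path, index=False)  # an empty subset still gets a header row
        paths.append(path)
    return paths
```

Calling write_filtered(filter_by_type(pd.read_csv("export.csv"))) writes the four files listed above into the current directory.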

Step 2: Fetching CID and SMILES for SwissADME

This step queries the PubChem API to add chemical identifiers and structural information, both needed for ADME analysis, to the filtered drug candidates.

Input: filtered_cp_above_90.csv (or any of the filtered files from Step 1).
Process:
- Reads the filtered data and extracts drug names.
- Uses the PubChem PUG REST API to fetch the Compound ID (CID) for each drug name.
- Uses the obtained CID to retrieve the SMILES (Simplified Molecular-Input Line-Entry System) notation, a standard text representation of chemical structure.
- Pauses with time.sleep between API requests to stay within PubChem's request-rate limits.
Output:
- compounds.csv: A CSV file listing the Compound Name, CID, and SMILES Notation for compounds successfully processed.
- molecules_for_adme.txt: A text file containing SMILES notations, formatted specifically for direct input into SwissADME (SMILES followed by compound name on each line).
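The name-to-CID-to-SMILES lookup can be sketched against the public PUG REST endpoints. This is a minimal sketch, not the notebook's code. It uses the standard-library urllib instead of Requests so that it has no dependencies, and the 0.3 s delay is an assumed value chosen to stay under PubChem's published limit of five requests per second. The helper names (get_json, parse_cid, parse_smiles, fetch_cid_and_smiles) are hypothetical. Recent PubChem responses may label the SMILES field differently, so the parser checks several field names.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, Optional, Tuple

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def get_json(url: str, delay: float = 0.3, timeout: float = 30.0) -> dict:
    """GET a PUG REST URL and decode the JSON body, pausing first to space out requests."""
    time.sleep(delay)  # assumed pacing; PubChem asks for at most 5 requests per second
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def parse_cid(payload: dict) -> Optional[int]:
    """Take the first CID from a /compound/name/<name>/cids/JSON response."""
    cids = payload.get("IdentifierList", {}).get("CID", [])
    return cids[0] if cids else None


def parse_smiles(payload: dict) -> Optional[str]:
    """Read the SMILES string from a /compound/cid/<cid>/property/.../JSON response."""
    props = payload.get("PropertyTable", {}).get("Properties", [])
    if not props:
        return None
    # Assumption: newer responses may use "ConnectivitySMILES" or "SMILES" as the key.
    for key in ("CanonicalSMILES", "ConnectivitySMILES", "SMILES"):
        if key in props[0]:
            return props[0][key]
    return None


def fetch_cid_and_smiles(
    name: str, fetch: Callable[[str], dict] = get_json
) -> Optional[Tuple[int, str]]:
    """Resolve a drug name to (CID, SMILES); return None if PubChem has no match."""
    try:
        quoted = urllib.parse.quote(name, safe="")  # escape spaces and slashes in names
        cid = parse_cid(fetch(f"{PUG_REST}/compound/name/{quoted}/cids/JSON"))
        if cid is None:
            return None
        smiles = parse_smiles(
            fetch(f"{PUG_REST}/compound/cid/{cid}/property/CanonicalSMILES/JSON")
        )
    except urllib.error.HTTPError:
        # PUG REST answers with an HTTP error status (e.g. 404) for unresolvable names.
        return None
    return (cid, smiles) if smiles else None
```

For example, calling fetch_cid_and_smiles on each entry of pd.read_csv("filtered_cp_above_90.csv")["Name"] yields (CID, SMILES) pairs, or None for names PubChem cannot resolve. The fetch parameter exists so the parsing logic can be exercised without network access.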

Future Enhancements

The upcoming phases of this project will include:

Pharmacokinetic Evaluation: Detailed ADME screening using SwissADME (or similar tools) to assess absorption, distribution, metabolism, and excretion profiles.
Toxicity Assessment: Toxicity prediction using the Tox21 dataset and relevant models.
Clustering Analysis: Unsupervised learning methods such as KMeans and Hierarchical Clustering to identify significant drug groups and patterns within the repurposing results.
Quantitative Structure-Activity Relationship (QSAR) Modeling: QSAR models built with machine learning techniques including:
- Random Forest
- Logistic Regression
- Support Vector Machine (SVM)
- Gradient Boosting
The results of these models will be compared with Deep Neural Network (DNN) approaches.

Stay tuned for these comprehensive updates!

Technology Stack

Python
Jupyter Notebook
Pandas: For data manipulation and CSV file processing.
Requests: For making HTTP requests to external APIs (e.g., PubChem).
csv (standard library): For writing CSV files.
time (standard library): For pacing API requests.
(Future) Scikit-learn: For various machine learning algorithms (QSAR, clustering, etc.).
(Future) Matplotlib & Seaborn: For data visualization.
(Future) TensorFlow/Keras or PyTorch: For Deep Neural Network implementations.

Example Outputs

Below are examples of the input file and of the intermediate and final files generated by the current notebook:

export.csv (Input Example):

Rank,Score,Type,ID,Name,Description
1,99.98,oe,ccsbBroad304_01966,RUVBL1,ATPases / AAA-type
2,99.98,kd,CGS001-8848,TSC22D1,-
3,99.96,oe,ccsbBroad304_00841,IKBKB,IKK family
4,99.94,kd,CGS001-1196,CLK2,CDC-like kinases
5,99.88,cp,BRD-A02333338,cyclopamine,Smoothened receptor antagonist
...

filtered_cp_above_90.csv (Example Output from Step 1):

Rank,Score,Type,ID,Name,Description
5,99.88,cp,BRD-A02333338,cyclopamine,Smoothened receptor antagonist
20,99.25,cp,BRD-K90543092,levonorgestrel,Estrogen receptor agonist
21,99.21,cp,BRD-K59456551,methotrexate,Dihydrofolate reductase inhibitor
...

compounds.csv (Example Output from Step 2):

Compound Name,CID,SMILES Notation
cyclopamine,442972,CC1CC2C(C(C3(O2)CCC4C5CC=C6CC(CCC6(C5CC4=C3C)C)O)C)NC1
levonorgestrel,13109,CCC12CCC3C(C1CCC2(C#C)O)CCC4=CC(=O)CCC34
methotrexate,126941,CN(CC1=CN=C2C(=N1)C(=NC(=N2)N)N)C3=CC=C(C=C3)C(=O)NC(CCC(=O)O)C(=O)O
...

molecules_for_adme.txt (Example Output from Step 2):

CC1CC2C(C(C3(O2)CCC4C5CC=C6CC(CCC6(C5CC4=C3C)C)O)C)NC1 cyclopamine
CCC12CCC3C(C1CCC2(C#C)O)CCC4=CC(=O)CCC34 levonorgestrel
CN(CC1=CN=C2C(=N1)C(=NC(=N2)N)N)C3=CC=C(C=C3)C(=O)NC(CCC(=O)O)C(=O)O methotrexate
...
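The two Step 2 output formats shown above can be produced with the standard-library csv module. This is a minimal sketch, assuming the fetched results are held as (name, CID, SMILES) tuples; the function names are hypothetical and not taken from the notebook.

```python
import csv
from typing import Iterable, List, Tuple

# Hypothetical record layout: (compound name, PubChem CID, SMILES).
Record = Tuple[str, int, str]


def write_compounds_csv(records: Iterable[Record], path: str = "compounds.csv") -> None:
    """Write the compounds.csv table with the header used in the examples."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["Compound Name", "CID", "SMILES Notation"])
        writer.writerows(records)


def swissadme_lines(records: Iterable[Record]) -> List[str]:
    """Format each record as 'SMILES name', one molecule per line."""
    lines = []
    for name, _cid, smiles in records:
        # Assumption: joining multi-word names with underscores keeps each line
        # to two space-separated fields; this is a precaution, not a SwissADME rule.
        lines.append(f"{smiles} {'_'.join(name.split())}")
    return lines


def write_swissadme_input(records: Iterable[Record], path: str = "molecules_for_adme.txt") -> None:
    """Write molecules_for_adme.txt for pasting into the SwissADME input box."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(swissadme_lines(records)) + "\n")
```

Pass a list rather than a generator if the same records go to both writers, since each function iterates over them once.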

Repository Contents

Repository: hossein-noorollahi/PDRA
- Drug Repurposing Data Processing.ipynb
- README.md
