Semi-Supervised GNN for Drug Discovery

A PyTorch Lightning framework for molecular property prediction. This project implements a Mean Teacher semi-supervised learning approach using an Attentive GINE backbone, designed to improve performance on drug discovery datasets with sparse labels.
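
For orientation, the standard Mean Teacher formulation (Tarvainen and Valpola, 2017) can be summarized as below. This is the generic textbook form, not a description of this repository's code: the EMA decay $\alpha$, the consistency weight $\lambda$, the distance $d$, and the noise terms $\eta, \eta'$ are generic symbols, and the values and loss choices used in this project are not stated in this README.

```latex
% Teacher weights \theta' track an exponential moving average of student weights \theta
\theta'_t = \alpha \, \theta'_{t-1} + (1 - \alpha) \, \theta_t

% Total loss: supervised term on labeled data plus a consistency term
% between student and teacher predictions on all inputs x
\mathcal{L} = \mathcal{L}_{\text{sup}}
  + \lambda \, \mathbb{E}_{x, \eta, \eta'}
    \left[ d\big( f(x; \theta, \eta), \, f(x; \theta', \eta') \big) \right]
```

Only the student is updated by gradient descent; the teacher changes solely through the moving average, which is why it can use unlabeled molecules.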

Quick Start

1. Installation

Clone the repo and set up the environment.

```
# Clone repository
git clone https://github.com/pablorocg/semi-supervised-gnn-drug-discovery
cd semi-supervised-gnn-drug-discovery

# Create environment (recommended)
conda create -n gnn_env python=3.11 -y
conda activate gnn_env

# Install dependencies
pip install -r requirements.txt
```

2. Configuration

The project needs to know where to save data and logs.

Create a .env file from the template:
```
cp .env.template .env
```

Open .env and set your absolute paths:

```
SOURCE_DATA_DIR=/abs/path/to/data    # Datasets will be downloaded here
CONFIGS_DIR=/abs/path/to/config      # Path to the 'config' folder in this repo
LOGS_DIR=/abs/path/to/logs           # Where to save training logs
```
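
Since the three paths must be absolute, it can help to check the file before launching a run. The following is a minimal sketch of such a check, not part of the project: `parse_env` and `check_env` are hypothetical helpers, the parser only handles plain `KEY=VALUE` lines with optional trailing `#` comments, and POSIX-style paths are assumed.

```python
from pathlib import PurePosixPath

# The three variables named in the .env template above.
REQUIRED_KEYS = ("SOURCE_DATA_DIR", "CONFIGS_DIR", "LOGS_DIR")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comment lines."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "   # Where to save training logs".
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def check_env(values: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means all required paths look fine."""
    problems = []
    for key in REQUIRED_KEYS:
        if not values.get(key):
            problems.append(f"{key} is missing or empty")
        elif not PurePosixPath(values[key]).is_absolute():
            # Assumption: POSIX paths. On Windows, PureWindowsPath would be the analogue.
            problems.append(f"{key} is not an absolute path: {values[key]}")
    return problems


# Illustrative input with one relative path and one missing variable.
example = """
SOURCE_DATA_DIR=/abs/path/to/data    # Datasets will be downloaded here
CONFIGS_DIR=config
"""
print(check_env(parse_env(example)))
```

To check a real file, pass its contents instead of `example`, for instance `check_env(parse_env(open(".env").read()))`.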

Usage

Run experiments using the scripts in src/trainers/. You can override parameters (like dataset or model) directly from the command line.
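
Each override is a dotted key followed by `=` and a value, where the dots walk down the nested configuration. The sketch below illustrates that addressing idea in plain Python. It is not Hydra's parser: `apply_override` is a hypothetical helper, values are kept as strings rather than typed, and the starting `config` contents are placeholders.

```python
def apply_override(config: dict, override: str) -> None:
    """Set a nested key in `config` from an 'a.b.c=value' string (value kept as text)."""
    dotted_key, _, value = override.partition("=")
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        # Descend one level per dotted segment, creating levels that do not exist yet.
        node = node.setdefault(name, {})
    node[leaf] = value


# Placeholder defaults, standing in for what the files under config/ would provide.
config = {"dataset": {"init": {"name": "placeholder", "batch_size_train": "8"}}}

for item in ["dataset.init.name=SIDER", "dataset.init.batch_size_train=16"]:
    apply_override(config, item)

print(config)
```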

Train Supervised Baseline

Standard training using only labeled data.

```
python -m src.trainers.baseline_trainer \
    dataset.init.name=SIDER \
    "dataset.init.splits=[0.67, 0.03, 0.1, 0.2]" \
    dataset.init.batch_size_train=16 \
    dataset.init.mu=5
```

Train Semi-Supervised (Mean Teacher)

Training using both labeled and unlabeled data. Ideal for low-data regimes.

```
python -m src.trainers.mean_teacher_trainer \
    dataset.init.name=SIDER \
    "dataset.init.splits=[0.35, 0.35, 0.1, 0.2]" \
    dataset.init.batch_size_train=32 \
    dataset.init.mu=1
```
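
Both commands pass four split fractions that add up to 1. The sketch below shows one way to sanity-check a custom `splits` value and preview the resulting partition sizes. It is an illustration only: `split_counts` is a hypothetical helper, the dataset size `N` is made up, and the reading of the four entries as labeled train, unlabeled train, validation, and test is an assumption inferred from the two commands, not something this README states.

```python
import math


def split_counts(n_samples: int, fractions: list[float]) -> list[int]:
    """Turn split fractions into integer counts that add up to n_samples."""
    if not math.isclose(sum(fractions), 1.0, abs_tol=1e-9):
        raise ValueError(f"fractions sum to {sum(fractions)}, expected 1.0")
    counts = [int(n_samples * fraction) for fraction in fractions]
    # Assign any rounding remainder to the first split so no sample is dropped.
    counts[0] += n_samples - sum(counts)
    return counts


N = 1000  # hypothetical dataset size, chosen only to make the numbers easy to read

for label, fractions in [
    ("baseline", [0.67, 0.03, 0.1, 0.2]),
    ("mean teacher", [0.35, 0.35, 0.1, 0.2]),
]:
    print(label, split_counts(N, fractions))
```

A list that does not sum to 1, such as `[0.5, 0.5, 0.1, 0.2]`, raises a `ValueError` instead of silently producing overlapping or missing partitions.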

Project Structure

config/: Hydra configuration files (datasets, models, training params).
src/data/: DataModules for MoleculeNet and OGB datasets.
src/models/: GNN implementations (GINE, Attentive Encoders).
src/lightning_modules/: PyTorch Lightning modules for Baseline and Mean Teacher logic.
src/trainers/: Entry points for training scripts.
