This repository contains code and models for predicting spatial gene expression from H&E histology images. I combined insights from key research, applied a robust preprocessing pipeline, and trained multi-scale deep learning models, ensembled via stacking, to improve prediction accuracy.
Dataset available here: Kaggle Competition - EL Hackathon 2025
I reviewed the following papers to build background and inspire architectural decisions:
- Benchmarking the translational potential of spatial gene expression prediction from histology
  This paper reviews multiple models and preprocessing approaches for gene expression prediction. It helped me understand common practices in stain normalization and spot realignment, as well as the prevailing network structures.
- DeepSpot: Leveraging Spatial Context for Enhanced Spatial Transcriptomics Prediction from H&E Images
  DeepSpot introduced the critical concept of combining local and global structural context, which directly motivated the multi-scale input branches in my model.
- https://www.kaggle.com/code/tarundirector/histology-eda-spotnet-visual-spatial-dl
  This notebook's identification of low-activity spatial spots informed my spot realignment.
- https://www.kaggle.com/code/dalloliogm/eda-exploring-cell-type-abundance
  This notebook introduced the idea of smoothing rank values using neighboring spots. Although the approach wasn't successful in my case, it provided useful experimentation.
- Stain normalization - Normalize histological color variation between images.
- Background masking - Remove non-tissue regions so the model focuses on relevant areas.
- Spot realignment - Adjust spot positions to align with image coordinates.
- Invalid-spot removal - Discard spots that fall outside tissue regions, identified using a grayscale threshold-based tissue mask (see the sketch after this list).
- Expression ranking - Replace raw expression counts with per-spot rank values.
- Spot distance calculation - Compute average distances between spots.
- Image tiling - Extract tiles around each spot for model input.
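For illustration, here is a minimal sketch of the tissue-mask, ranking, and tiling steps. The threshold value, window sizes, and helper names are assumptions for this example, not the exact values used in the repository's notebooks:

```python
import numpy as np
from scipy.stats import rankdata

def tissue_mask(rgb: np.ndarray, threshold: int = 200) -> np.ndarray:
    """Grayscale-threshold tissue mask: bright (near-white) pixels are background."""
    gray = rgb.mean(axis=-1)
    return gray < threshold  # True where tissue is present

def spot_is_valid(mask: np.ndarray, x: int, y: int, half: int = 8) -> bool:
    """Keep a spot only if most pixels in its local window fall on tissue."""
    window = mask[y - half:y + half, x - half:x + half]
    return window.size > 0 and window.mean() > 0.5

def rank_expression(counts: np.ndarray) -> np.ndarray:
    """Replace raw expression counts with rank values, computed per spot (row-wise)."""
    return np.apply_along_axis(rankdata, 1, counts)

def extract_tile(image: np.ndarray, x: int, y: int, size: int = 128) -> np.ndarray:
    """Crop a size-by-size tile centered on a spot for model input."""
    half = size // 2
    return image[y - half:y + half, x - half:x + half]
```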
This multi-branch model integrates both global and local features:
- Tile Encoder: Deep encoder with residual blocks and multi-scale pooling.
- Subtile Encoder: Uses several subtile patches and aggregates via mean pooling.
- Center Subtile Encoder: Focuses on the central subtile using the same structure.
Each branch outputs features, which are concatenated and passed through a decoder MLP for expression prediction.
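The following PyTorch sketch shows how the three branches could be wired together. Layer widths, the pooling depth, and the number of output genes are illustrative assumptions rather than the trained configuration, and the plain CNN stand-in below omits the residual blocks and multi-scale pooling of the actual tile encoder:

```python
import torch
import torch.nn as nn

class MultiBranchModel(nn.Module):
    """Three-branch encoder (tile / subtiles / center subtile) + MLP decoder.
    All dimensions are illustrative assumptions, not the trained configuration."""

    def __init__(self, feat_dim: int = 256, n_genes: int = 460):
        super().__init__()

        def make_encoder() -> nn.Sequential:
            # Simplified CNN stand-in; the real encoder uses residual blocks
            # and multi-scale pooling.
            return nn.Sequential(
                nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, feat_dim),
            )

        self.tile_enc = make_encoder()     # whole tile: global context
        self.subtile_enc = make_encoder()  # applied to every subtile patch
        self.center_enc = make_encoder()   # central subtile only
        self.decoder = nn.Sequential(
            nn.Linear(3 * feat_dim, 512), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(512, n_genes),
        )

    def forward(self, tile, subtiles, center):
        # subtiles: (B, N, 3, H, W) -> encode each patch, then mean-pool over N
        b, n = subtiles.shape[:2]
        sub = self.subtile_enc(subtiles.flatten(0, 1)).view(b, n, -1).mean(dim=1)
        feats = torch.cat([self.tile_enc(tile), sub, self.center_enc(center)], dim=1)
        return self.decoder(feats)
```

Each branch keeps its own encoder instance, so every spatial scale can specialize while the concatenated features feed a single decoder.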
After training 6 individual models using Leave-One-Out Cross-Validation on the 6 training images (S_1 to S_6), I ensemble the predictions using a meta model:
- Input: Concatenated predictions from base models
- Model: MLP with BatchNorm, Dropout, and LeakyReLU
- Output: Final expression value predictions
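A minimal sketch of such a stacking head, assuming each base model predicts the same gene panel (the model count matches the six LOOCV folds, but the gene count, hidden width, and dropout rate here are illustrative):

```python
import torch.nn as nn

class MetaModel(nn.Module):
    """Stacking head: an MLP over the concatenated base-model predictions."""

    def __init__(self, n_models: int = 6, n_genes: int = 460, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_models * n_genes, hidden),
            nn.BatchNorm1d(hidden),
            nn.LeakyReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden, n_genes),
        )

    def forward(self, stacked_preds):
        # stacked_preds: (B, n_models * n_genes), base predictions concatenated
        return self.net(stacked_preds)
```

At inference time, the six base models' predictions for each spot are concatenated into a single vector and passed through this head to produce the final expression values.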
I provide a CPU version of the Docker image, which includes all dependencies and automatically launches JupyterLab. The Docker image is used for the preprocessing steps; if you want to train the model on a GPU, install the packages locally instead (see the local setup below).
Pull the image:

```bash
docker pull deweywang/spatialhackathon:latest
```

Run it (macOS/Linux):

```bash
docker run -it --rm -p 8888:8888 -v "$PWD":/workspace \
  deweywang/spatialhackathon:latest \
  jupyter lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root
```

Run it (Windows CMD):

```bash
docker run -it --rm -p 8888:8888 -v %cd%:/workspace \
  deweywang/spatialhackathon:latest \
  jupyter lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root
```

Jupyter will launch with no token and open to all local users for convenience in local setups.
Navigate to http://localhost:8888 in your browser.
- Port already in use: Change `-p 8888:8888` to `-p 8889:8888` and open http://localhost:8889
- Volume mounting fails on Windows: Use `-v %cd%:/workspace` in CMD, or `-v "$PWD":/workspace` in bash.
All installations are isolated within the `.venv` virtual environment and will not affect your global Python environment or system-wide packages.
```bash
make init                  # Step 1. Set up virtual environment and install dependencies
source .venv/bin/activate  # Step 2 (macOS/Linux)
# .venv\Scripts\activate   # Step 2 (Windows)
make lab                   # Step 3. Launch JupyterLab
```

Then select the kernel:

- Click the kernel selector at the top-right of the notebook interface
- Choose: `Python (.venv) spatialhackathon`

To clean up or start over:

```bash
make clean  # Remove the virtual environment and Jupyter kernel spec
make reset  # Clean everything and reinitialize from scratch
```

Tested on: Apple M1 Pro, macOS Sequoia 15.3.1 (24D70)
Feel free to open issues!
This project is released for research and educational purposes only.
- The original dataset is provided by the EL Hackathon 2025 and subject to its own license terms and usage restrictions.
- The code and models in this repository are developed by Ding Yang Wang and shared under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
- Key architectural inspirations are derived from published academic work cited above. Please acknowledge the original authors when appropriate.
For full terms, see: https://creativecommons.org/licenses/by-nc/4.0/
If you find this project helpful in your research or work, please consider citing it:
```bibtex
@misc{wang2025hevisum,
  author       = {Ding Yang Wang},
  title        = {HEVisum: Spatial Gene Expression Prediction from H\&E Histology},
  year         = {2025},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/Dewey-Wang/HEVisum}}
}
```

or simply use this format:
Wang, D. Y. (2025). HEVisum: Spatial Gene Expression Prediction from H&E Histology. GitHub repository: https://github.com/Dewey-Wang/HEVisum
While I completed this project entirely on my own, I welcome feedback and discussion.
If you'd like to adapt this work:
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a pull request