This repository contains code and models for predicting spatial gene expression from H&E histology images. I combined insights from key research, applied a robust preprocessing pipeline, and trained multi-scale deep learning models, ensembled via stacking, to improve prediction accuracy.
Dataset available here: Kaggle Competition - EL Hackathon 2025
I reviewed the following papers to build background and inspire architectural decisions:
- Benchmarking the translational potential of spatial gene expression prediction from histology
  This paper reviews multiple models and preprocessing approaches for gene expression prediction. It helped me understand common practices in stain normalization and spot realignment, as well as the prevailing network structures.
- DeepSpot: Leveraging Spatial Context for Enhanced Spatial Transcriptomics Prediction from H&E Images
  DeepSpot introduced the critical concept of combining local and global structural context, which directly motivated the multi-scale input branches in my model.
- https://www.kaggle.com/code/tarundirector/histology-eda-spotnet-visual-spatial-dl
  This notebook's identification of low-activity spatial spots informed my spot realignment.
- https://www.kaggle.com/code/dalloliogm/eda-exploring-cell-type-abundance
  This notebook introduced the idea of smoothing rank values using neighboring spots. Although the approach wasn't successful in my case, it provided useful experimentation.
- Stain normalization - Normalize histological color variation between images.
- Background masking - Remove non-tissue regions so the model focuses on relevant areas.
- Spot realignment - Adjust spot positions to align with image coordinates.
- Invalid-spot removal - Discard spots that fall outside tissue regions, identified using a grayscale threshold-based tissue mask (see the sketch after this list).
- Expression ranking - Replace raw expression counts with per-spot rank values.
- Spot distance calculation - Compute average distances between spots.
- Image tiling - Extract tiles around each spot for model input.
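For illustration, here is a minimal sketch of the tissue-mask, ranking, and tiling steps. The threshold value, window sizes, and helper names are assumptions for this example, not the exact values used in the repository's notebooks:

```python
import numpy as np
from scipy.stats import rankdata

def tissue_mask(rgb: np.ndarray, threshold: int = 200) -> np.ndarray:
    """Grayscale-threshold tissue mask: bright (near-white) pixels are background."""
    gray = rgb.mean(axis=-1)
    return gray < threshold  # True where tissue is present

def spot_is_valid(mask: np.ndarray, x: int, y: int, half: int = 8) -> bool:
    """Keep a spot only if most pixels in its local window fall on tissue."""
    window = mask[y - half:y + half, x - half:x + half]
    return window.size > 0 and window.mean() > 0.5

def rank_expression(counts: np.ndarray) -> np.ndarray:
    """Replace raw expression counts with rank values, computed per spot (row-wise)."""
    return np.apply_along_axis(rankdata, 1, counts)

def extract_tile(image: np.ndarray, x: int, y: int, size: int = 128) -> np.ndarray:
    """Crop a size-by-size tile centered on a spot for model input."""
    half = size // 2
    return image[y - half:y + half, x - half:x + half]
```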
This multi-branch model integrates both global and local features:
- Tile Encoder: Deep encoder with residual blocks and multi-scale pooling.
- Subtile Encoder: Uses several subtile patches and aggregates via mean pooling.
- Center Subtile Encoder: Focuses on the central subtile using the same structure.
Each branch outputs features, which are concatenated and passed through a decoder MLP for expression prediction.
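The following PyTorch sketch shows how the three branches could be wired together. Layer widths, the pooling depth, and the number of output genes are illustrative assumptions rather than the trained configuration, and the plain CNN stand-in below omits the residual blocks and multi-scale pooling of the actual tile encoder:

```python
import torch
import torch.nn as nn

class MultiBranchModel(nn.Module):
    """Three-branch encoder (tile / subtiles / center subtile) + MLP decoder.
    All dimensions are illustrative assumptions, not the trained configuration."""

    def __init__(self, feat_dim: int = 256, n_genes: int = 460):
        super().__init__()

        def make_encoder() -> nn.Sequential:
            # Simplified CNN stand-in; the real encoder uses residual blocks
            # and multi-scale pooling.
            return nn.Sequential(
                nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(64, feat_dim),
            )

        self.tile_enc = make_encoder()     # whole tile: global context
        self.subtile_enc = make_encoder()  # applied to every subtile patch
        self.center_enc = make_encoder()   # central subtile only
        self.decoder = nn.Sequential(
            nn.Linear(3 * feat_dim, 512), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(512, n_genes),
        )

    def forward(self, tile, subtiles, center):
        # subtiles: (B, N, 3, H, W) -> encode each patch, then mean-pool over N
        b, n = subtiles.shape[:2]
        sub = self.subtile_enc(subtiles.flatten(0, 1)).view(b, n, -1).mean(dim=1)
        feats = torch.cat([self.tile_enc(tile), sub, self.center_enc(center)], dim=1)
        return self.decoder(feats)
```

Each branch keeps its own encoder instance, so every spatial scale can specialize while the concatenated features feed a single decoder.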
After training 6 individual models using Leave-One-Out Cross-Validation on the 6 training images (S_1 to S_6), I ensemble the predictions using a meta model:
- Input: Concatenated predictions from base models
- Model: MLP with BatchNorm, Dropout, and LeakyReLU
- Output: Final expression value predictions
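A minimal sketch of such a stacking head, assuming each base model predicts the same gene panel (the model count matches the six LOOCV folds, but the gene count, hidden width, and dropout rate here are illustrative):

```python
import torch.nn as nn

class MetaModel(nn.Module):
    """Stacking head: an MLP over the concatenated base-model predictions."""

    def __init__(self, n_models: int = 6, n_genes: int = 460, hidden: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_models * n_genes, hidden),
            nn.BatchNorm1d(hidden),
            nn.LeakyReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden, n_genes),
        )

    def forward(self, stacked_preds):
        # stacked_preds: (B, n_models * n_genes), base predictions concatenated
        return self.net(stacked_preds)
```

At inference time, the six base models' predictions for each spot are concatenated into a single vector and passed through this head to produce the final expression values.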
I provide a CPU version of the Docker image, which includes all dependencies and automatically launches JupyterLab. The Docker image is used for the preprocessing steps; if you want to train the model on a GPU, install the packages locally instead (see the local setup below).
Pull the image:

```bash
docker pull deweywang/spatialhackathon:latest
```

Run it (macOS/Linux):

```bash
docker run -it --rm -p 8888:8888 -v "$PWD":/workspace \
  deweywang/spatialhackathon:latest \
  jupyter lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root
```

Run it (Windows CMD):

```bash
docker run -it --rm -p 8888:8888 -v %cd%:/workspace \
  deweywang/spatialhackathon:latest \
  jupyter lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root
```

Jupyter will launch with no token and open to all local users for convenience in local setups.
Navigate to http://localhost:8888 in your browser.
- Port already in use: Change `-p 8888:8888` to `-p 8889:8888` and open http://localhost:8889
- Volume mounting fails on Windows: Use `-v %cd%:/workspace` in CMD, or `-v "$PWD":/workspace` in bash.
All installations are isolated within the `.venv` virtual environment and will not affect your global Python environment or system-wide packages.
```bash
make init                  # Step 1. Set up virtual environment and install dependencies
source .venv/bin/activate  # Step 2 (macOS/Linux)
# .venv\Scripts\activate   # Step 2 (Windows)
make lab                   # Step 3. Launch JupyterLab
```

Then select the kernel:

- Click the kernel selector at the top-right of the notebook interface
- Choose: `Python (.venv) spatialhackathon`

To clean up or start over:

```bash
make clean  # Remove the virtual environment and Jupyter kernel spec
make reset  # Clean everything and reinitialize from scratch
```

Tested on: Apple M1 Pro, macOS Sequoia 15.3.1 (24D70)
Feel free to open issues!
This project is released for research and educational purposes only.
- The original dataset is provided by the EL Hackathon 2025 and subject to its own license terms and usage restrictions.
- The code and models in this repository are developed by Ding Yang Wang and shared under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.
- Key architectural inspirations are derived from published academic work cited above. Please acknowledge the original authors when appropriate.
For full terms, see: https://creativecommons.org/licenses/by-nc/4.0/
If you find this project helpful in your research or work, please consider citing it:
```bibtex
@misc{wang2025hevisum,
  author       = {Ding Yang Wang},
  title        = {HEVisum: Spatial Gene Expression Prediction from H\&E Histology},
  year         = {2025},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/Dewey-Wang/HEVisum}}
}
```

or simply use this format:
Wang, D. Y. (2025). HEVisum: Spatial Gene Expression Prediction from H&E Histology. GitHub repository: https://github.com/Dewey-Wang/HEVisum
While I completed this project entirely on my own, I welcome feedback and discussion.
If you'd like to adapt this work:
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a pull request