This branch contains the 2025 materials. Previous years are in other branches.
In this year’s practical, you will learn how to design and run a complete image segmentation pipeline using both graphical tools (ilastik) and a simple Python script. You will work with real fluorescence microscopy data from the COVID immunofluorescence assay, and the goal is to segment individual cells from 2D microscopy images.
The challenge introduces the key concepts of supervised learning in bioimage analysis, probabilistic predictions, and classical image segmentation algorithms such as the seeded watershed. The focus is on interpreting what each stage does, not on writing code from scratch.
Depending on your background and confidence with Python and image analysis, you can run the practical in two main ways:
- Option 1: On EMBL’s virtual machines (recommended). This avoids any installation issues; most of the required tools are pre-installed.
- Option 2: Locally on your own computer. You can install ilastik and conda yourself if you prefer, though this may require some setup.
If you’re working on EMBL infrastructure, the dataset is already stored in the shared directory. To copy it to your home directory, first change into your home directory (/home/USERNAME) and then run:
cp -r /scratch/almpanak/predoc-course ./
If you’re on your own system, you can instead download the data from the shared drive or from the Google Drive link that will be provided during the session.
- What you will build
- Dataset
- Software setup
- Step 1 — Nuclei probabilities with ilastik (Pixel Classification)
- Step 2 — Foreground & Boundary maps (ilastik Neural Network)
- Step 3 — Seeded Watershed (instance segmentation)
- (Optional) Compare to ground truth
You will create a 3-stage segmentation pipeline:
- Nuclei probabilities from DAPI using ilastik Pixel Classification.
- Cell foreground + boundary probabilities from the serum channel using ilastik Neural Network Classification with a pre-trained model (powerful-chipmunk).
- Seeded watershed segmentation combining (1) and (2) to produce final labeled cell instances.
This challenge helps you understand:
- How supervised models (like ilastik) generate probability maps
- How classical image analysis can be combined with ML outputs
- How to visualize, threshold, and evaluate segmentations
Output: a labeled instance image + quick color visualizations.
Each HDF5 sample contains:
- /raw — shape (3, 1024, 1024), with channels: 0: serum, 1: infection (ignore), 2: nuclei (DAPI)
- /cells — instance ground truth, shape (1024, 1024)
- /infected — per-nucleus infection labels, shape (1024, 1024) (not used here)

We also provide separated single-channel HDF5 files in hdf5/nuclei/ and hdf5/serum/, each with /raw of shape (1, 1024, 1024), for convenience.
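If you want to sanity-check a sample before starting, you can inspect it with h5py from the course environment. A minimal sketch, where the file name is only a placeholder for one of the provided samples:

```python
import h5py

# Placeholder path: point this at one of the provided sample files.
path = "predoc-course/hdf5/WellXX_sample.h5"

def show(name, obj):
    # Print every dataset together with its shape and dtype.
    if isinstance(obj, h5py.Dataset):
        print(name, obj.shape, obj.dtype)

with h5py.File(path, "r") as f:
    f.visititems(show)
    raw = f["raw"][:]      # (3, 1024, 1024): 0 = serum, 1 = infection, 2 = nuclei (DAPI)
    cells = f["cells"][:]  # (1024, 1024): instance ground truth, 0 = background
    print("highest instance label:", cells.max())
```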
Download the latest version of ilastik (GUI app): https://www.ilastik.org/download.html
If you are working on the EMBL cluster, you don’t need to install conda or anything else. The shared environment is already prepared.
Run the following commands once to create a local copy of the course environment in your home folder:
mkdir -p ~/envs/predoc-challenge
tar -xzf /scratch/almpanak/envs/predoc-challenge-conda-pack.tar.gz -C ~/envs/predoc-challenge
~/envs/predoc-challenge/bin/conda-unpack
To check that it works, run:
~/envs/predoc-challenge/bin/python -c "import numpy, h5py, skimage, matplotlib, scipy; print('env ok')"
If you see env ok, the environment is correctly installed.
From now on, you can use this Python environment to run the seeded watershed script without activating anything:
~/envs/predoc-challenge/bin/python ~/predoc-course/seeded_watershed.py \
--nuc /path/to/WellXX_..._nucProb.h5 \
--nn /path/to/WellXX_..._nnseg.h5 \
--out ./
If you’re not on the cluster, you can create your own conda environment from the provided environment.yml, or install the listed dependencies manually (numpy, h5py, scikit-image, matplotlib, scipy).
- Open Pixel Classification in ilastik.
- Load nuclei images (DAPI channel). Make sure to set the correct axis order (CYX) under Raw Data -> Edit Properties.
- Create 2 labels: Nucleus and Background.
- Enable Live Update and refine annotations until the prediction looks clean.
- Export the result as HDF5, with dataset name /exported_data and two channels [Nucleus, Background].
We will later use channel 0 (Nucleus) as our seed probability map.
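To verify the export, you can load it with h5py and look at the Nucleus channel. A minimal sketch, assuming an example filename; the channel axis may sit first or last depending on your export settings:

```python
import h5py
import matplotlib.pyplot as plt

# Example filename: use whatever name you gave your ilastik export.
with h5py.File("WellXX_nucProb.h5", "r") as f:
    probs = f["exported_data"][:]

# Channel 0 = Nucleus; pick it regardless of where the channel axis ended up.
nucleus = probs[0] if probs.shape[0] == 2 else probs[..., 0]

plt.imshow(nucleus, cmap="magma")
plt.title("Nucleus probability (used later as watershed seeds)")
plt.colorbar()
plt.show()
```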
- Open Neural Network Classification (Local) in ilastik (already installed on JupyterHub).
- Load the serum channel as input.
- In NN Prediction, load the pre-trained model powerful-chipmunk from bioimage.io.
- Click Live Predict to run the model.
- Export the result to HDF5 under /exported_data with 2 channels:
  - channel 0 → Foreground (cell interior)
  - channel 1 → Boundary (cell borders)
This file will be used together with the nuclei probabilities in the final step.
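As in Step 1, a quick check of the export helps catch axis-order surprises. A minimal sketch with an example filename:

```python
import h5py
import matplotlib.pyplot as plt

# Example filename: use the name of your neural network export.
with h5py.File("WellXX_nnseg.h5", "r") as f:
    pred = f["exported_data"][:]

# Channel 0 = Foreground, channel 1 = Boundary; handle channel-first or channel-last.
fg, boundary = (pred[0], pred[1]) if pred.shape[0] == 2 else (pred[..., 0], pred[..., 1])

fig, axes = plt.subplots(1, 2, figsize=(10, 5))
axes[0].imshow(fg, cmap="viridis")
axes[0].set_title("Foreground")
axes[1].imshow(boundary, cmap="viridis")
axes[1].set_title("Boundary")
plt.show()
```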
Given your two HDF5 exports (nuclei + neural network), the script seeded_watershed.py will generate a labeled instance segmentation.
Assumptions:
- Nuclei H5 /exported_data has 2 channels [Nucleus, Background] → use channel 0.
- NN H5 /exported_data has 2 channels [Foreground, Boundary] → use channels 0 and 1.
- Default thresholds: NUC_THR=0.60, FG_THR=0.50.
Run:
~/envs/predoc-challenge/bin/python ~/predoc-course/seeded_watershed.py \
--nuc /path/to/WellXX_..._nucProb.h5 \
--nn /path/to/WellXX_..._nnseg.h5 \
--out ./
Outputs:
- *_instances.tif — labeled instance segmentation (uint16)
- *_instances_color.png — random color per cell
- *_instances_overlay.png — color overlay on the foreground channel
You’ll also see some useful printouts for debugging: image shapes, seed counts, mask coverage, and final instance count.
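To see what the script does conceptually, the core of a seeded watershed fits in a few lines. This is a simplified illustration, not the actual contents of seeded_watershed.py; it just follows the assumptions and default thresholds listed above:

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.segmentation import watershed

NUC_THR, FG_THR = 0.60, 0.50  # default thresholds from the assumptions above

def simple_seeded_watershed(nucleus_prob, foreground_prob, boundary_prob):
    """Simplified sketch: seeds from nuclei, mask from foreground, flooding on boundaries."""
    seeds, n_seeds = ndi.label(nucleus_prob > NUC_THR)  # each connected nucleus blob becomes a seed
    mask = foreground_prob > FG_THR                     # restrict labels to the cell foreground
    # Flood the boundary map (low inside cells, high at borders) outward from the seeds.
    instances = watershed(boundary_prob, markers=seeds, mask=mask)
    print(f"{n_seeds} seeds -> {instances.max()} instances")
    return instances.astype(np.uint16)
```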
For quantitative evaluation, compare your segmentation (*_instances.tif) to the ground truth (the /cells dataset) using standard instance segmentation metrics.
However, the goal of this session is mainly conceptual — focus on understanding the steps and visually assessing your results.
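If you do want a number, one possible choice (not prescribed by the course materials) is the adapted Rand error from scikit-image. A minimal sketch with placeholder file names:

```python
import h5py
from skimage.io import imread
from skimage.metrics import adapted_rand_error

# Placeholder names: your watershed output and the corresponding original sample.
prediction = imread("WellXX_instances.tif")
with h5py.File("predoc-course/hdf5/WellXX_sample.h5", "r") as f:
    ground_truth = f["cells"][:]

# adapted_rand_error returns (error, precision, recall); lower error is better.
error, precision, recall = adapted_rand_error(ground_truth, prediction)
print(f"adapted Rand error: {error:.3f}  precision: {precision:.3f}  recall: {recall:.3f}")
```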
End of challenge. You have now combined a learned probability map with classical image processing to achieve biologically meaningful cell segmentation!