This branch contains the 2025 materials. Previous years are in other branches.
In this year’s practical, you will learn how to design and run a complete image segmentation pipeline using both graphical tools (ilastik) and a simple Python script. You will work with real fluorescence microscopy data from the COVID immunofluorescence assay, and the goal is to segment individual cells from 2D microscopy images.
The challenge introduces the key concepts of supervised learning in bioimage analysis, probabilistic predictions, and classical image segmentation algorithms such as the seeded watershed. The focus is on interpreting what each stage does, not on writing code from scratch.
Depending on your background and confidence with Python and image analysis, you can run the practical in two main ways:
- Option 1: On EMBL’s virtual machines (recommended). This avoids any installation issues; most of the required tools are pre-installed.
- Option 2: Locally on your own computer. You can install ilastik and conda yourself if you prefer, though this may require some setup.
If you’re working on EMBL infrastructure, the dataset is already stored in the shared directory. To copy it to your home directory, first change into your home directory (/home/USERNAME) and then run:
cp -r /scratch/almpanak/predoc-course ./
If you’re on your own system, you can instead download the data from the shared drive or from the Google Drive link that will be provided during the session.
- What you will build
- Dataset
- Software setup
- Step 1 — Nuclei probabilities with ilastik (Pixel Classification)
- Step 2 — Foreground & Boundary maps (ilastik Neural Network)
- Step 3 — Seeded Watershed (instance segmentation)
- (Optional) Compare to ground truth
You will create a 3-stage segmentation pipeline:
- Nuclei probabilities from DAPI using ilastik Pixel Classification.
- Cell foreground + boundary probabilities from the serum channel using ilastik Neural Network Classification with a pre-trained model (powerful-chipmunk).
- Seeded watershed segmentation combining (1) and (2) to produce final labeled cell instances.
This challenge helps you understand:
- How supervised models (like ilastik) generate probability maps
- How classical image analysis can be combined with ML outputs
- How to visualize, threshold, and evaluate segmentations
Output: a labeled instance image + quick color visualizations.
Each HDF5 sample contains:
- /raw — shape (3, 1024, 1024), with channels: 0: serum, 1: infection (ignore), 2: nuclei (DAPI)
- /cells — instance ground truth, shape (1024, 1024)
- /infected — per-nucleus infection labels, shape (1024, 1024) (not used here)

We also provide separated single-channel HDF5 files in hdf5/nuclei/ and hdf5/serum/, each with /raw of shape (1, 1024, 1024), for convenience.
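If you want to sanity-check a sample before starting, you can inspect it with h5py from the course environment. A minimal sketch, where the file name is only a placeholder for one of the provided samples:

```python
import h5py

# Placeholder path: point this at one of the provided sample files.
path = "predoc-course/hdf5/WellXX_sample.h5"

def show(name, obj):
    # Print every dataset together with its shape and dtype.
    if isinstance(obj, h5py.Dataset):
        print(name, obj.shape, obj.dtype)

with h5py.File(path, "r") as f:
    f.visititems(show)
    raw = f["raw"][:]      # (3, 1024, 1024): 0 = serum, 1 = infection, 2 = nuclei (DAPI)
    cells = f["cells"][:]  # (1024, 1024): instance ground truth, 0 = background
    print("highest instance label:", cells.max())
```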
Download the latest version of ilastik (GUI app): https://www.ilastik.org/download.html
If you are working on the EMBL cluster, you don’t need to install conda or anything else. The shared environment is already prepared.
Run the following commands once to create a local copy of the course environment in your home folder:
mkdir -p ~/envs/predoc-challenge
tar -xzf /scratch/almpanak/envs/predoc-challenge-conda-pack.tar.gz -C ~/envs/predoc-challenge
~/envs/predoc-challenge/bin/conda-unpack
To check that it works, run:
~/envs/predoc-challenge/bin/python -c "import numpy, h5py, skimage, matplotlib, scipy; print('env ok')"
If you see env ok, the environment is correctly installed.
From now on, you can use this Python environment to run the seeded watershed script without activating anything:
~/envs/predoc-challenge/bin/python ~/predoc-course/seeded_watershed.py \
--nuc /path/to/WellXX_..._nucProb.h5 \
--nn /path/to/WellXX_..._nnseg.h5 \
--out ./
If you’re not on the cluster, you can create your own conda environment from the provided environment.yml, or install the listed dependencies manually (numpy, h5py, scikit-image, matplotlib, scipy).
- Open Pixel Classification in ilastik.
- Load nuclei images (DAPI channel). Make sure to set the correct axis order (CYX) under Raw Data -> Edit Properties.
- Create 2 labels: Nucleus and Background.
- Enable Live Update and refine annotations until the prediction looks clean.
- Export the result as HDF5, with dataset name /exported_data and two channels [Nucleus, Background].
We will later use channel 0 (Nucleus) as our seed probability map.
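To verify the export, you can load it with h5py and look at the Nucleus channel. A minimal sketch, assuming an example filename; the channel axis may sit first or last depending on your export settings:

```python
import h5py
import matplotlib.pyplot as plt

# Example filename: use whatever name you gave your ilastik export.
with h5py.File("WellXX_nucProb.h5", "r") as f:
    probs = f["exported_data"][:]

# Channel 0 = Nucleus; pick it regardless of where the channel axis ended up.
nucleus = probs[0] if probs.shape[0] == 2 else probs[..., 0]

plt.imshow(nucleus, cmap="magma")
plt.title("Nucleus probability (used later as watershed seeds)")
plt.colorbar()
plt.show()
```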
- Open Neural Network Classification (Local) in ilastik (already installed on JupyterHub).
- Load the serum channel as input.
- In NN Prediction, load the pre-trained model powerful-chipmunk from bioimage.io.
- Click Live Predict to run the model.
- Export the result to HDF5 under /exported_data with 2 channels:
  - channel 0 → Foreground (cell interior)
  - channel 1 → Boundary (cell borders)
This file will be used together with the nuclei probabilities in the final step.
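As in Step 1, a quick check of the export helps catch axis-order surprises. A minimal sketch with an example filename:

```python
import h5py
import matplotlib.pyplot as plt

# Example filename: use the name of your neural network export.
with h5py.File("WellXX_nnseg.h5", "r") as f:
    pred = f["exported_data"][:]

# Channel 0 = Foreground, channel 1 = Boundary; handle channel-first or channel-last.
fg, boundary = (pred[0], pred[1]) if pred.shape[0] == 2 else (pred[..., 0], pred[..., 1])

fig, axes = plt.subplots(1, 2, figsize=(10, 5))
axes[0].imshow(fg, cmap="viridis")
axes[0].set_title("Foreground")
axes[1].imshow(boundary, cmap="viridis")
axes[1].set_title("Boundary")
plt.show()
```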
Given your two HDF5 exports (nuclei + neural network), the script seeded_watershed.py will generate a labeled instance segmentation.
Assumptions:
- Nuclei H5 /exported_data has 2 channels [Nucleus, Background] → use channel 0.
- NN H5 /exported_data has 2 channels [Foreground, Boundary] → use channels 0 and 1.
- Default thresholds: NUC_THR=0.60, FG_THR=0.50.
Run:
~/envs/predoc-challenge/bin/python ~/predoc-course/seeded_watershed.py \
--nuc /path/to/WellXX_..._nucProb.h5 \
--nn /path/to/WellXX_..._nnseg.h5 \
--out ./
Outputs:
- *_instances.tif — labeled instance segmentation (uint16)
- *_instances_color.png — random color per cell
- *_instances_overlay.png — color overlay on the foreground channel
You’ll also see some useful printouts for debugging: image shapes, seed counts, mask coverage, and final instance count.
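To see what the script does conceptually, the core of a seeded watershed fits in a few lines. This is a simplified illustration, not the actual contents of seeded_watershed.py; it just follows the assumptions and default thresholds listed above:

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.segmentation import watershed

NUC_THR, FG_THR = 0.60, 0.50  # default thresholds from the assumptions above

def simple_seeded_watershed(nucleus_prob, foreground_prob, boundary_prob):
    """Simplified sketch: seeds from nuclei, mask from foreground, flooding on boundaries."""
    seeds, n_seeds = ndi.label(nucleus_prob > NUC_THR)  # each connected nucleus blob becomes a seed
    mask = foreground_prob > FG_THR                     # restrict labels to the cell foreground
    # Flood the boundary map (low inside cells, high at borders) outward from the seeds.
    instances = watershed(boundary_prob, markers=seeds, mask=mask)
    print(f"{n_seeds} seeds -> {instances.max()} instances")
    return instances.astype(np.uint16)
```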
For quantitative evaluation, compare your segmentation (*_instances.tif) to the ground truth (the /cells dataset) using standard instance segmentation metrics.
However, the goal of this session is mainly conceptual — focus on understanding the steps and visually assessing your results.
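If you do want a number, one possible choice (not prescribed by the course materials) is the adapted Rand error from scikit-image. A minimal sketch with placeholder file names:

```python
import h5py
from skimage.io import imread
from skimage.metrics import adapted_rand_error

# Placeholder names: your watershed output and the corresponding original sample.
prediction = imread("WellXX_instances.tif")
with h5py.File("predoc-course/hdf5/WellXX_sample.h5", "r") as f:
    ground_truth = f["cells"][:]

# adapted_rand_error returns (error, precision, recall); lower error is better.
error, precision, recall = adapted_rand_error(ground_truth, prediction)
print(f"adapted Rand error: {error:.3f}  precision: {precision:.3f}  recall: {recall:.3f}")
```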
End of challenge. You have now combined a learned probability map with classical image processing to achieve biologically meaningful cell segmentation!