This repository accompanies the paper cited below. The paper presents a self-supervised model for classifying white blood cells in peripheral blood smears, achieving an F1 score of 96.2% while generalizing to diverse label sets. The lightweight EfficientNetV2-B0-based approach uses active learning to improve label efficiency and is available as the HemoSight web app to streamline clinical workflows.
General flow of the data pipeline:
- `curate.py`: Curate raw data folders to create the data reference CSV.
- `loader.py`: Load the data reference CSV and perform the train, validation, and test split.
- `train.py`, `generator.py`: Train the model; `generator.py` generates the data augmentation.
- `val.py`, `predictor.py`: Validate the trained model; `predictor.py` performs classification on embeddings.
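The split step performed by `loader.py` can be illustrated with a minimal sketch. This is not the repository's implementation: the column names (`path`, `label`), the function name `split_reference`, the split fractions, and the per-label stratification are all assumptions chosen for illustration.

```python
import csv
import io
import random
from collections import defaultdict


def split_reference(rows, val_frac=0.1, test_frac=0.1, seed=0):
    """Tag each row with 'train', 'val', or 'test', stratified by label.

    Hypothetical helper: the real loader.py may split differently.
    """
    by_label = defaultdict(list)
    for row in rows:
        by_label[row["label"]].append(row)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    out = []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        n_test = round(len(group) * test_frac)
        n_val = round(len(group) * val_frac)
        for i, row in enumerate(group):
            if i < n_test:
                split = "test"
            elif i < n_test + n_val:
                split = "val"
            else:
                split = "train"
            out.append({**row, "split": split})
    return out


# In-memory stand-in for the data reference CSV (columns are assumed).
reference_csv = "path,label\n" + "".join(
    f"/data/{label}/{i}.png,{label}\n"
    for label in ("lymphocyte", "neutrophil")
    for i in range(10)
)
rows = list(csv.DictReader(io.StringIO(reference_csv)))
assigned = split_reference(rows, val_frac=0.2, test_frac=0.2, seed=42)

# Count images per (label, split) pair.
counts = defaultdict(int)
for row in assigned:
    counts[(row["label"], row["split"])] += 1
```

Splitting within each label keeps rare cell types represented in all three subsets, which matters when class frequencies are uneven.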
Folders:
- `src` stores Python source code.
- `frontend` stores frontend source code.
- `data` is used to store raw data.
- `derived` is used to store results.
- `mongodb` is used by the MongoDB database.

Three folders (`data`, `derived`, `src`) are needed for file I/O and should be mounted to the container.
Configuration files:
- Path configuration file (`/src/core/settings.json`): This file is loaded by `util.GlobalSettings`. `settings.json` is used by default but can be overridden by `settings_{systemname}.json`, where `{systemname}` is the computer name, such as `ThinkPad`. This enables the same code base to be cloned and run in multiple environments.
- Job configuration file: Some examples are located at `/src/config_*.json`. See below for their usage.
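The per-machine override of the path configuration can be sketched as follows. This is an illustration only, not the code in `util.GlobalSettings`: the function names, the `data_dir` key, and the choice to replace the default file outright (rather than merge the two) are assumptions.

```python
import json
import platform
import tempfile
from pathlib import Path


def resolve_settings_path(core_dir, system_name=None):
    """Return settings_{systemname}.json if it exists, else settings.json."""
    core_dir = Path(core_dir)
    # platform.node() returns the computer's network name (may be empty).
    name = system_name if system_name is not None else platform.node()
    override = core_dir / f"settings_{name}.json"
    return override if override.is_file() else core_dir / "settings.json"


def load_settings(core_dir, system_name=None):
    """Load whichever settings file applies to this machine."""
    path = resolve_settings_path(core_dir, system_name)
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Demonstration in a temporary folder; the "data_dir" key is hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    core = Path(tmp)
    (core / "settings.json").write_text(
        json.dumps({"data_dir": "/data"}), encoding="utf-8"
    )
    (core / "settings_ThinkPad.json").write_text(
        json.dumps({"data_dir": "D:/Drive/Data/Hematology"}), encoding="utf-8"
    )
    default_cfg = load_settings(core, system_name="OtherHost")
    thinkpad_cfg = load_settings(core, system_name="ThinkPad")
```

In this sketch a machine named `ThinkPad` picks up its own file, while any other machine falls back to `settings.json`.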
- Create the environment with `conda create -n HemoSight python=3.10` and initialize the environment.
- Install the packages listed in the Dockerfile.
- You may also need to install `ipykernel` for editing ipynb files.
Dockerfile.gpu is the GPU version. Dockerfile.cpu is the CPU version.
- Build:
  `docker build -t hematology:v1 -f Dockerfile.cpu .`
- Run the container with the three folders above mounted. Replace `D:/Drive/Data/Hematology` with the path to your raw data, and drop `--gpus all` on a host without GPU support:
  `docker run -it --gpus all --rm -v "$(pwd)/src:/src" -v "$(pwd)/derived:/derived" -v "D:/Drive/Data/Hematology:/data" hematology:v1`
- After the container is running, execute the following inside the container.
  - Training:
    `python -m model.train --cfg config.json`
  - Validation:
    `python -m model.val --run 20231208192208`
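Because the run command differs between the CPU and GPU images and depends on local paths, it can help to assemble it programmatically. The sketch below is a hypothetical helper, not part of the repository: `build_docker_run` and the example host paths are assumptions, while the flags and container mount points are taken from the command above.

```python
import shlex
from pathlib import Path


def build_docker_run(repo_root, data_dir, image="hematology:v1", use_gpu=False):
    """Build the `docker run` argument list with the three required mounts."""
    repo_root = Path(repo_root)
    cmd = ["docker", "run", "-it", "--rm"]
    if use_gpu:
        # Only meaningful on a host with GPU support.
        cmd += ["--gpus", "all"]
    mounts = {
        repo_root / "src": "/src",
        repo_root / "derived": "/derived",
        Path(data_dir): "/data",
    }
    for host, container in mounts.items():
        cmd += ["-v", f"{host.as_posix()}:{container}"]
    cmd.append(image)
    return cmd


# Example paths are placeholders; substitute your own checkout and data folder.
cmd = build_docker_run("/home/user/HemoSight", "/mnt/hematology", use_gpu=True)
command_line = shlex.join(cmd)  # shell-quoted string, ready to print or paste
```

The list form can be passed directly to `subprocess.run`, which avoids shell-quoting problems with paths that contain spaces.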
You may need to install Node.js 20.11 and MongoDB 7.0.
- Build:
  `docker-compose -f docker-compose.dev.yaml up --build`
- Access `http://localhost:4002` for the index page.
If you find this repository helpful in your research or work, please consider citing our paper:
@inproceedings{liu2024adaptive,
title={Adaptive self-supervised learning of morphological landscape for leukocytes classification in peripheral blood smears},
author={Liu, Z. and Castillo, S. P. and Han, X. and Sun, X. and Hu, Z. and Yuan, Y.},
booktitle={Proceedings of the IEEE-EMBS International Conference on Biomedical and Health Informatics},
year={2024},
url={https://openreview.net/forum?id=xkgKn92AGp}
}
Feel free to contact us for further information or questions related to the paper and this repository.
Yuan Lab @ MD Anderson Cancer Center