Single-cell Hi-C (scHi-C) technologies have significantly advanced our understanding of three-dimensional genome organization. However, scHi-C data are often sparse and noisy, which creates substantial computational challenges in downstream analyses.
In this study, we introduce SHICEDO, a novel deep-learning model specifically designed to enhance scHi-C contact matrices by imputing missing or sparsely captured chromatin contacts through a generative adversarial framework. SHICEDO leverages the unique structural characteristics of scHi-C matrices to derive customized features that enable effective data enhancement. Additionally, the model incorporates a channel-wise attention mechanism to mitigate the over-smoothing issue commonly associated with scHi-C enhancement methods.
Through simulations and real-data applications, we demonstrate that SHICEDO outperforms the state-of-the-art methods, achieving superior quantitative and qualitative results.
Moreover, SHICEDO enhances key structural features in scHi-C data, thus enabling more precise delineation of chromatin structures such as A/B compartments, TAD-like domains, and chromatin loops.
Environment: install PyTorch according to the CUDA version of your machine.
See the PyTorch installation instructions for details.
In this demo, the machine has CUDA version 11.6.
To create the SCHICEDO environment, use:
conda env create -f SCHICEDO_environment.yml
To activate this environment, use:
conda activate SCHICEDO
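After activating the environment, it can help to confirm that the installed PyTorch build matches the CUDA setup of the machine. The following is a minimal sketch, not part of the SHICEDO code base; the helper name `describe_torch_env` is hypothetical, and it only relies on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
import importlib.util


def describe_torch_env():
    """Return a small summary of the local PyTorch/CUDA setup (hypothetical helper)."""
    # Avoid a hard failure when PyTorch has not been installed yet.
    if importlib.util.find_spec("torch") is None:
        return {
            "torch_installed": False,
            "torch_version": None,
            "cuda_build": None,
            "cuda_available": False,
        }
    import torch

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version PyTorch was built against; None for CPU-only builds.
        "cuda_build": torch.version.cuda,
        # True only if a usable GPU and driver are visible at runtime.
        "cuda_available": bool(torch.cuda.is_available()),
    }


if __name__ == "__main__":
    for key, value in describe_torch_env().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as False on a GPU machine, the installed PyTorch build most likely does not match the local CUDA driver.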
The processed data is available at the following link:
Download processed data.
Two processed datasets are available. In the following example, we demo with the processed Lee et al. dataset in the folder Lee.
The downloaded data may be split across several compressed files; after extracting them, please move the files into one folder.
mkdir data
- Please download the processed data to the data folder and use the correct path in the script for data loading.
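Before pointing the scripts at the downloaded data, a quick check that the folder exists and is not empty can catch path mistakes early. This is only a sketch: `list_data_files` is a hypothetical helper, and the path `data/Lee` in the usage note is an assumed layout, not one confirmed by the repository.

```python
from pathlib import Path


def list_data_files(data_dir):
    """Return the sorted files under data_dir, or raise if it is missing or empty."""
    root = Path(data_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"data folder not found: {root}")
    # Walk sub-folders as well, since extracted archives may be nested.
    files = sorted(p for p in root.rglob("*") if p.is_file())
    if not files:
        raise FileNotFoundError(f"no files found under: {root}")
    return files
```

For example, `list_data_files("data/Lee")` would list the extracted files if the Lee et al. data was placed in that (assumed) location.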
If you wish to preprocess other datasets, please check the data preprocessing section.
In this example, we show how to process the Nagano et al. raw data, which is available at Download raw data.
To process the raw data, please run the following commands:
cd data_preprocessing
mkdir process_data/Nagano
./data_preprocessing.sh
data_preprocessing.sh will run 6 scripts to save processed data:
- Filter the cells based on contact number
python data_filter.py
- Filter out the inter-chromosomal interactions
python filter_true_data.py
- Downsample the matrix to generate low-resolution input
python down_sampling_sciHiC.py
- Run Rscript to do Bandnorm (please follow the instructions to install Bandnorm)
Rscript bandnorm.R
- Organize normalized results
python run_bandnorm.py
- Divide large matrices into submatrices and save as torch tensors
python generate_input.py
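The ideas behind several of these steps can be illustrated on a toy contact list. The sketch below is not the code in the scripts above: all function names, the dictionary-based contact format, and the random-thinning scheme are assumptions chosen for illustration, and the Bandnorm normalization and torch tensor export are left out.

```python
import random
from collections import Counter


def filter_cells(contacts, min_contacts):
    """Keep contacts from cells that have at least min_contacts contacts."""
    counts = Counter(c["cell"] for c in contacts)
    return [c for c in contacts if counts[c["cell"]] >= min_contacts]


def drop_inter_chromosomal(contacts):
    """Keep only contacts whose two ends lie on the same chromosome."""
    return [c for c in contacts if c["chrom1"] == c["chrom2"]]


def downsample(contacts, keep_fraction, seed=0):
    """Keep each contact independently with probability keep_fraction."""
    rng = random.Random(seed)
    return [c for c in contacts if rng.random() < keep_fraction]


def to_matrix(contacts, n_bins, bin_size):
    """Bin contacts from one chromosome into a symmetric n_bins x n_bins count matrix."""
    matrix = [[0] * n_bins for _ in range(n_bins)]
    for c in contacts:
        i, j = c["pos1"] // bin_size, c["pos2"] // bin_size
        if i < n_bins and j < n_bins:
            matrix[i][j] += 1
            if i != j:
                matrix[j][i] += 1  # mirror off-diagonal contacts
    return matrix


def split_into_submatrices(matrix, size):
    """Tile a square matrix into non-overlapping size x size blocks (edge remainders dropped)."""
    n = len(matrix)
    blocks = []
    for r in range(0, n - size + 1, size):
        for c in range(0, n - size + 1, size):
            blocks.append([row[c:c + size] for row in matrix[r:r + size]])
    return blocks
```

A typical call order in this toy setting would be `filter_cells`, then `drop_inter_chromosomal`, then `to_matrix` for the high-resolution target, with `downsample` applied before `to_matrix` to obtain the low-resolution input.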
For optimal performance when training on new data, hyper-parameter tuning is essential.
The model and data settings are the same as described in the paper.
After choosing suitable hyper-parameters, the model can be trained with the following command:
python test_train.py
After training, enhanced scHi-C matrices can be predicted with the following command:
python test_prediction.py
Users can also use the provided pre-trained model to make predictions.
Please change the corresponding model loading path in the test_prediction.py file.
To make predictions with the provided pre-trained model:
mkdir pretrained_model
- Please download the pre-trained model (Download pre-trained model) to the pretrained_model folder and use the correct path in the script.
python test_pretrained_prediction.py
After prediction, users can compute the MSE and macro F1 of the low-resolution input and of the prediction by running the following command:
python test_evaluation.py
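For readers unfamiliar with the two metrics, the sketch below shows how MSE and macro F1 are defined in general. It is not the implementation in test_evaluation.py: the function names are hypothetical, and treating discrete contact values directly as class labels for macro F1 is an assumption, since the discretization used by the script is not described here.

```python
def mse(pred, true):
    """Mean squared error over all entries of two equally shaped 2-D matrices."""
    diffs = [
        (p - t) ** 2
        for pred_row, true_row in zip(pred, true)
        for p, t in zip(pred_row, true_row)
    ]
    return sum(diffs) / len(diffs)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes seen in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for cls in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 if the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because macro F1 averages the per-class scores without weighting, rare contact values count as much as the abundant zero entries, which makes it a useful complement to MSE on sparse matrices.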
If you wish to check the heatmaps of the low-resolution input, the prediction, and the true scHi-C matrices, please run the following command:
tensorboard --logdir=runs/heatmap
Here we use the processed Lee et al. dataset (available from Download processed data) to demo the training, prediction, and evaluation process:
> conda activate SCHICEDO
> python test_train.py
> python test_prediction.py
> python test_evaluation.py
For heatmap and loss visualization:
tensorboard --logdir=runs/heatmap
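The three demo scripts can also be chained from a small driver so that a failure in one step stops the rest. This is a sketch under the assumption that it is run from the repository root with the conda environment already activated; `build_commands` and `run_demo` are hypothetical helpers, not part of the repository.

```python
import subprocess
import sys

# Script names taken from the demo commands; the order matters.
DEMO_SCRIPTS = ("test_train.py", "test_prediction.py", "test_evaluation.py")


def build_commands(scripts=DEMO_SCRIPTS, python=sys.executable):
    """Return one [interpreter, script] command per demo step."""
    return [[python, script] for script in scripts]


def run_demo(commands, runner=subprocess.run):
    """Run each command in order; check=True raises CalledProcessError on the first failure."""
    for cmd in commands:
        print("running:", " ".join(cmd))
        runner(cmd, check=True)
```

Calling `run_demo(build_commands())` then trains, predicts, and evaluates in sequence, using the interpreter of the currently active environment.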
Paper:
Huang, J., Ma, R., Strobel, M., Hu, Y., Ye, T., Jiang, T., & Ma, W. (2025). SHICEDO: Single-cell Hi-C Data Enhancement with Reduced Over-Smoothing. Bioinformatics, btaf575. DOI: https://doi.org/10.1093/bioinformatics/btaf575
Code: 10.5281/zenodo.17069264
This project is licensed under the terms of the MIT License.