This is a PyTorch implementation of the [CVPR 2024] paper: Zero-Shot Structure-Preserving Diffusion Model for High Dynamic Range Tone Mapping.
A journal version of this work, entitled A Flexible Zero-Shot Approach to Tone Mapping via Structure-Preserving Diffusion Models, has been accepted by IEEE TCSVT.
Please follow the instructions in the ControlNet repository to set up the environment.
For training:
In our paper, we use the high-resolution part of the Flickr2K dataset as the training set. To prepare the training data, first generate the MSCN maps and the blurred luma maps following the instructions in our paper, and put them in training/Flickr2K/source/ and training/Flickr2K/source1/ respectively. Put the original images in training/Flickr2K/target/.
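As a rough illustration of the two conditioning maps, the sketch below computes an MSCN map using the standard BRISQUE-style formulation and a Gaussian-blurred luma map. The Gaussian sigmas, the normalization constant, and the BT.709 luma weights are assumptions for illustration, not values taken from the paper; follow the paper's instructions for the actual preparation.

```python
# Minimal sketch of preparing the two conditioning maps.
# All parameter values here are assumptions, not the paper's settings.
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn_map(gray, sigma=7 / 6, c=1e-3):
    """Mean-subtracted contrast-normalized (MSCN) coefficients."""
    mu = gaussian_filter(gray, sigma)                      # local mean
    var = gaussian_filter(gray * gray, sigma) - mu * mu    # local variance
    std = np.sqrt(np.maximum(var, 0.0))                    # clamp numeric noise
    return (gray - mu) / (std + c)

def blurred_luma_map(rgb, sigma=5.0):
    """Gaussian-blurred luma (BT.709 weights assumed)."""
    luma = rgb @ np.array([0.2126, 0.7152, 0.0722])
    return gaussian_filter(luma, sigma)

if __name__ == "__main__":
    hdr = np.random.rand(64, 64, 3)  # stand-in for a linear HDR image
    gray = hdr @ np.array([0.2126, 0.7152, 0.0722])
    print(mscn_map(gray).shape, blurred_luma_map(hdr).shape)
```

Both maps have the same spatial size as the input, so they can be saved alongside the originals in the source/, source1/, and target/ directories described above.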
Then download a pre-trained model (Stable Diffusion or ControlNet), put it in my_model_weights/, and change resume_path in train.py accordingly.
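For example, resume_path would be edited to point at the downloaded checkpoint (the filename below is hypothetical; substitute the name of the model you actually downloaded):

```python
# Inside train.py — point resume_path at the downloaded checkpoint.
# 'control_sd15_ini.ckpt' is a hypothetical filename, shown for illustration only.
resume_path = './my_model_weights/control_sd15_ini.ckpt'
```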
python train.py
To train the basic model, use models/cldm_v15.yaml as the config file.
To fine-tune the model with the decoding-encoding process, use models/cldm_v15_ftenc.yaml as the config file.
For simple testing:
You can also download our pretrained model here.
Put the MSCN maps, the blurred luma maps, and the original images in test_imgs/, as the example test image does. Use models/cldm_v15_inference.yaml as the config file.
python test.py
If necessary, you can change the parameters and arguments (directories, pre-processing and post-processing methods, etc.); see the comments in the code.
To test the FTA strategy introduced in our journal version, run the script test_fta.py:
python test_fta.py
If necessary, you can change the parameters and arguments; see the comments in the code. If you use the style matching loss, you also need to download the pre-trained DA-CLIP-like encoder here.
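The style matching loss compares features from the downloaded encoder. As a rough illustration only, a cosine-similarity style loss between pooled encoder features can be sketched as below; the feature shapes and the exact loss form are assumptions, not the paper's definition:

```python
# Hypothetical style-matching loss: 1 - cosine similarity between the
# encoder features of the tone-mapped output and a style reference.
# The real loss uses the pre-trained DA-CLIP-like encoder from the README.
import numpy as np

def style_matching_loss(feat_out, feat_ref, eps=1e-8):
    a = feat_out.reshape(-1)
    b = feat_ref.reshape(-1)
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps)
    return 1.0 - cos  # 0 when the two feature vectors are perfectly aligned

if __name__ == "__main__":
    f_out = np.random.rand(512)  # stand-in for encoder features
    f_ref = np.random.rand(512)
    print(style_matching_loss(f_out, f_ref))
```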
Our implementation is based on the ControlNet repository.