
ZeroComp: Zero-shot Object Compositing from Image Intrinsics via Diffusion

Zitian Zhang, Frédéric Fortier-Chouinard, Mathieu Garon, Anand Bhattad, Jean-François Lalonde

WACV 2025 (Oral)

[Website] [Paper] [Supplementary]

Environments

  1. Clone the repo and its submodules:

git clone --recursive https://github.com/zzt76/zerocomp.git

Make sure the predictors submodule is also cloned.

  2. Install the required packages; note that different intrinsic predictors may require different libraries:

pip install -r requirements.txt

  3. Download Stable Diffusion 2.1 from Hugging Face, and set the pretrained_model_name_or_path variable in configs/sg_labo.yaml to point to it (one way to download it is sketched below).
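
For example, the weights can be fetched with huggingface_hub and the config pointed at the resulting folder (or directly at the Hugging Face model id). This is only an illustrative sketch; the model id and local directory below are assumptions, and any other download method works just as well.

from huggingface_hub import snapshot_download

# Illustrative only: download the diffusers-format Stable Diffusion 2.1 weights
# to a local folder of your choice, then set pretrained_model_name_or_path in
# configs/sg_labo.yaml to this path (or simply to the model id itself).
local_dir = snapshot_download(
    repo_id="stabilityai/stable-diffusion-2-1",
    local_dir="checkpoints/stable-diffusion-2-1",
)
print(local_dir)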

Pretrained weights

Pretrained weights are only available for non-commercial use under CC BY-NC-SA 4.0 license.

Depth, normals, albedo (OpenRooms, 7 days): link

Normals, albedo (OpenRooms, 2 days): link

Depth, normals, albedo, roughness, metallic (InteriorVerse, 2 days): link, (InteriorVerse, 7 days): link

Tips for cases where the footprint depth of the object is not available

In our paper, the footprint depth of the object is needed to align the object depth with the background depth (or the other way around if the depth is relative disparity). However, we notice that the footprint depth is not always available, so here we provide two different solutions:

  1. In the newest version, when the footprint depth is not available, we use the smallest background depth value inside the object mask as the minimum object depth. Please refer to Lines 274-292; a sketch of this alignment is also given after this list.
  2. Alternatively, you can use the pretrained model trained without the depth channel. You can download this model with the link provided above, and change the following arguments in the config file (as in configs/sg_labo_wo_depth.yaml):
conditioning_maps: [normal, diffuse, shading, mask]

eval:
    controlnet_model_name_or_path: checkpoints/openrooms_2days_wo_depth
    shading_maskout_mode: BBox
    shading_maskout_bbox_dilation: 50 # Hyperparameter controlling how large a region around the object is masked out
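
For reference, solution 1 boils down to shifting the composited object's depth so that its closest point matches the smallest background depth found inside the object mask. The snippet below is a minimal sketch of that idea with illustrative names; it is not the exact code from the repository.

import numpy as np

def align_object_depth(obj_depth, bg_depth, obj_mask):
    # Use the smallest background depth inside the object mask as the minimum
    # object depth, then paste the shifted object depth over the background.
    min_bg = bg_depth[obj_mask > 0].min()
    min_obj = obj_depth[obj_mask > 0].min()
    aligned = obj_depth - min_obj + min_bg
    return np.where(obj_mask > 0, aligned, bg_depth)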

Predictors

You can get the intrinsic predictor weights from their original repos; the links are provided below. After downloading, move them to the .cache/checkpoints folder.

Depth

ZoeDepth: ZoeD_M12_NK.pt

DepthAnything: depth_anything_metric_depth_indoor.pt

DepthAnythingV2: depth_anything_v2_vitl.pth

Normals

OmniDataV2: omnidata_dpt_normal_v2.ckpt

StableNormal: stable-normal-v0-1

Materials

For diffuse only, you can use our own dfnet model (which is admittedly not that good). For diffuse, roughness, and metallic, you can precompute these maps with IntrinsicImageDiffusion, RGB<->X, or other predictors. Name them as in the provided test dataset and load them by setting predictor_names to precompute.

Custom predictors

You can use any other predictors you prefer by modifying controlnet_input_handle.py: implement the corresponding handle_*** functions and update the ToPredictors class. A rough sketch is given below.
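
As a rough illustration only (the actual signatures in controlnet_input_handle.py may differ), a handler takes the input image and returns the corresponding intrinsic map. The stand-in below returns a constant normal map instead of running a real network, and every name in it is hypothetical.

import torch

def handle_my_normals(image: torch.Tensor) -> torch.Tensor:
    # Hypothetical custom normal predictor: given an image of shape (B, 3, H, W),
    # return a map of the same shape. A real handler would call your network here;
    # this stand-in just points every normal "up".
    b, _, h, w = image.shape
    normals = torch.zeros(b, 3, h, w, device=image.device)
    normals[:, 1] = 1.0
    return normals

# The new handler then needs to be wired into the ToPredictors class so that it
# is selected by its predictor name.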

All predictors are subject to their own licenses. Please check the relevant conditions carefully.

ZeroComp Test dataset

You can download the ZeroComp test dataset here.

Inference

To run the evaluations as in the paper:

python eval_controlnet_composite.py --config-name sg_labo

Live demo

ZeroComp trained on OpenRooms:

python gradio_composite.py

ZeroComp trained on InteriorVerse, with roughness and metallic:

python gradio_composite_w_rm.py

Training

OpenRooms dataset

The OpenRooms dataset should be structured as follows:

openrooms_mainxml1/
├── Geometry/
│   └── main_xml1/
│       └── scene0001_00/
│           ├── imdepth_1.dat
│           ├── imnormal_1.png
├── Image/
│   └── main_xml1/
│       └── scene0001_00/
│           ├── im_1.hdr
│           ├── im_1.png
├── Mask/
│   └── main_xml1/
│       └── scene0001_00/
│           ├── immask_1.png
├── Material/
│   └── main_xml1/
│       └── scene0001_00/
│           ├── imbaseColor_1.png
│           ├── imroughness_1.png
│           ├── immetallic_1.png
└── Shading/
    └── main_xml1/
        └── scene0001_00/
            ├── imshading_1.png
            ├── imshadow_1.png
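
The depth maps (imdepth_*.dat) are stored in a small binary format. Assuming the standard OpenRooms layout (an int32 height, an int32 width, then height*width float32 values; this is an assumption, so verify against your copy of the data), they can be read roughly as follows:

import struct
import numpy as np

def load_openrooms_depth(path):
    # Assumed layout: int32 height, int32 width, then height*width float32 depth values.
    with open(path, "rb") as f:
        height = struct.unpack("i", f.read(4))[0]
        width = struct.unpack("i", f.read(4))[0]
        depth = np.frombuffer(f.read(4 * height * width), dtype=np.float32)
    return depth.reshape(height, width)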

Training command

On a single GPU, you can run:

python train_controlnet.py --config-name train_openrooms

For multi-GPU training, you can use:

accelerate launch train_controlnet.py --config-name train_openrooms

Acknowledgements

This research was supported by NSERC grants RGPIN 2020-04799 and ALLRP 586543-23, Mitacs and Depix. Computing resources were provided by the Digital Research Alliance of Canada. We also thank Louis-Étienne Messier and Justine Giroux for their help as well as all members of the lab for discussions and proofreading help.

This implementation builds upon Hugging Face’s Diffusers library. We also acknowledge Gradio for providing a developer-friendly tool to create the interactive demos for our models.

BibTeX

If you find this work useful, please consider citing ZeroComp:

@InProceedings{zhang2025zerocomp,
    author    = {Zhang, Zitian and Fortier-Chouinard, Fr\'ed\'eric and Garon, Mathieu and Bhattad, Anand and Lalonde, Jean-Fran\c{c}ois},
    title     = {ZeroComp: Zero-Shot Object Compositing from Image Intrinsics via Diffusion},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {483-494}
}

Our follow-up work SpotLight focuses on training-free local relighting; please feel free to check it out if you're interested :)

@misc{fortierchouinard2025spotlightshadowguidedobjectrelighting,
      title={SpotLight: Shadow-Guided Object Relighting via Diffusion}, 
      author={Frédéric Fortier-Chouinard and Zitian Zhang and Louis-Etienne Messier and Mathieu Garon and Anand Bhattad and Jean-François Lalonde},
      year={2025},
      eprint={2411.18665},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.18665}, 
}

License

The code, pretrained weights, and test dataset are all for non-commercial use only.

ZeroComp: Zero-shot Object Compositing from Image Intrinsics via Diffusion by Zitian Zhang, Frédéric Fortier-Chouinard, Mathieu Garon, Anand Bhattad, and Jean-François Lalonde is licensed under CC BY-NC-SA 4.0.
