We introduce the Positional Encoding Field (PE-Field), which extends positional encodings from the 2D plane to a structured 3D field. PE-Field incorporates depth-aware encodings for volumetric reasoning and hierarchical encodings for fine-grained sub-patch control, enabling DiTs to model geometry directly in 3D space. Our PE-Field–augmented DiT achieves state-of-the-art performance on single-image novel view synthesis and generalizes to controllable spatial image editing.
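The exact PE-Field formulation is defined in the paper, not here; as a rough, illustrative sketch of the general idea of extending per-axis positional encodings with a depth axis (function names and dimensions below are hypothetical, and standard sinusoidal encodings stand in for whatever encoding the model actually uses):

```python
import math

def sincos_1d(pos, dim):
    # Standard 1D sinusoidal encoding: interleaved sine/cosine pairs
    # over a geometric series of frequencies.
    enc = []
    for i in range(dim // 2):
        freq = 1.0 / (10000 ** (2 * i / dim))
        enc.append(math.sin(pos * freq))
        enc.append(math.cos(pos * freq))
    return enc

def pe_field_3d(x, y, z, dim):
    # Illustrative 3D positional encoding: concatenate per-axis
    # encodings for the two image axes plus a depth axis, so each
    # token carries a full 3D coordinate. The actual PE-Field
    # encoding (and its hierarchical sub-patch component) may differ.
    d = dim // 3
    return sincos_1d(x, d) + sincos_1d(y, d) + sincos_1d(z, d)

vec = pe_field_3d(3, 5, 0.7, 48)
print(len(vec))  # 48
```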
To create a Python environment named pe_field under the ./envs directory and install all dependencies from requirements.txt, run:
# Create virtual environment
python3 -m venv ./envs/pe_field
# Activate environment
source ./envs/pe_field/bin/activate
# Install dependencies
pip install -r requirements.txt
Download the FLUX.1-Kontext components (everything except the transformer) from black-forest-labs/FLUX.1-Kontext-dev and place them under:
./FLUX.1-Kontext-dev
Download MoGe weights from Ruicheng/moge-2-vitl-normal and place it under:
./moge-2-vitl-normal/model.pt
Download our Transformer weights from PE-Field/FLUX.1-Kontext-dev and place them under:
./checkpoints/transformer
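The three repository IDs above look like Hugging Face Hub repos, so the downloads can be scripted with `huggingface_hub.snapshot_download` (this assumes the repos are hosted on the Hub, that `huggingface_hub` is installed, and that you have accepted any gated-access terms, e.g. for FLUX.1-Kontext-dev):

```python
# Repo IDs taken from the instructions above; the local_dir layout
# matches the paths the inference script expects.
REPOS = {
    "black-forest-labs/FLUX.1-Kontext-dev": "./FLUX.1-Kontext-dev",
    "Ruicheng/moge-2-vitl-normal": "./moge-2-vitl-normal",
    "PE-Field/FLUX.1-Kontext-dev": "./checkpoints/transformer",
}

def download_all():
    # Lazy import so the mapping above can be inspected without
    # huggingface_hub installed.
    from huggingface_hub import snapshot_download
    for repo_id, local_dir in REPOS.items():
        snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Call `download_all()` to fetch everything; a gated repo may additionally require `huggingface-cli login` first.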
You can run inference on either a single image path or a directory containing multiple images.
--phi and --theta control the azimuth angle and the elevation angle of the target view, respectively.
python ./infer_viewchanger_single_v2.py \
--moge_checkpoint_path "./moge-2-vitl-normal/model.pt" \
--transformer_checkpoint_path "./checkpoints" \
--flux_kontext_path "./FLUX.1-Kontext-dev" \
--input_image "image_path_or_dir" \
--output_dir "outputs" \
--phi -5 --theta 5
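The script's exact camera convention is not documented here; purely as an illustration of how an azimuth/elevation pair defines a view rotation (assuming a y-up convention and degrees, both of which are assumptions):

```python
import math

def view_rotation(phi_deg, theta_deg):
    # Compose azimuth (rotation about the vertical y-axis) with
    # elevation (rotation about the x-axis). Illustrative only: the
    # inference script's actual convention may differ in axis order
    # or sign.
    p, t = math.radians(phi_deg), math.radians(theta_deg)
    ry = [[math.cos(p), 0.0, math.sin(p)],
          [0.0, 1.0, 0.0],
          [-math.sin(p), 0.0, math.cos(p)]]
    rx = [[1.0, 0.0, 0.0],
          [0.0, math.cos(t), -math.sin(t)],
          [0.0, math.sin(t), math.cos(t)]]
    # Matrix product rx @ ry: elevate after rotating in azimuth.
    return [[sum(rx[i][k] * ry[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

# Same angles as the example command above.
R = view_rotation(-5, 5)
```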