Create an interactive VR scene from a single text prompt.
Text2VR orchestrates multiple AI components with Docker Compose so you can reproduce builds and keep dependencies isolated and stable.
- Stage 1 → Panorama generation (DreamScene360): text → 360° equirect panorama
- Stage 2 → Asset segmentation (GroundingDINO + SAM + optional GPT-4o): masks for interactive objects
- Stage 3 → Background inpainting (SDXL inpaint, wrap-aware): remove masked assets cleanly
- Stage 4 → 3D training (DreamScene360): train a Gaussian scene using the inpainted panorama
- (Roadmap) Asset mesh extraction & alignment → Unity integration for VR HMD
Services run in separate containers to avoid dependency conflicts and to allow independent upgrades.
- NVIDIA GPU + recent driver
- Docker and NVIDIA Container Toolkit
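As a sketch of that per-service layout, a docker-compose.yml along these lines wires each stage into its own GPU-enabled container. The service names (`dreamscene360`, `segmentation`) match the commands used later in this README, but the mount paths and GPU reservation details are assumptions; check them against the repository's actual docker-compose.yml:

```yaml
# Sketch only — verify against the repository's docker-compose.yml.
services:
  dreamscene360:
    build: ./DREAMSCENE360
    volumes:
      - ./output:/workspace/output
      - ./pre_checkpoints:/workspace/pre_checkpoints
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
  segmentation:
    build: ./ASSET_SEG
    environment:
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    volumes:
      - ./output:/app/output
      - ./pre_checkpoints:/app/pre_checkpoints
```

Keeping each service's `build` context in its own directory is what lets the containers be rebuilt and upgraded independently.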
```
Text2VR/
├── ASSET_SEG/            # Asset segmentation service (GroundingDINO + SAM + GPT)
│   ├── Dockerfile
│   ├── requirements.txt
│   ├── segment_panorama.py
│   └── README.md
├── BG_INPAINT/           # Background inpainting service (SDXL inpaint)
│   ├── Dockerfile
│   ├── requirements.txt
│   ├── inpaint_panorama.py
│   └── README.md
├── DREAMSCENE360/        # Panorama & training (legacy-stable env)
│   ├── ...
│   ├── Dockerfile
│   ├── requirements.txt
│   ├── pano_generator.py
│   ├── train.py
│   └── README.md
├── docker-compose.yml
├── run_pipeline.sh       # One-click pipeline
├── output/
└── pre_checkpoints/      # Shared pretrained weights (created by you)
```

```bash
# Create .env (Compose auto-loads it)
cat > .env << 'EOF'
OPENAI_API_KEY=...
# Optional: set HF cache inside the repo for persistence (matched to docker-compose.yml)
HF_HOME=/workspace/cache/hf
EOF

# From repo root
mkdir -p output pre_checkpoints
```
```bash
# Run these commands from the Text2VR/pre_checkpoints/ directory

# SAM checkpoint (for ASSET_SEG)
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

# DreamScene360 DPT-Depth checkpoint (for DREAMSCENE360)
wget "https://www.dropbox.com/scl/fi/y11c69dd9fjf05s640qj9/omnidata_dpt_depth_v2.ckpt?rlkey=vj7a8n1s2q4q5q5j3q2q2q2q2&dl=1" -O omnidata_dpt_depth_v2.ckpt
```

If the DPT-Depth download fails, download omnidata_dpt_depth_v2.ckpt manually from the official Dropbox folder and place it in the Text2VR/pre_checkpoints/ directory.
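Before launching the pipeline, it can help to confirm the weights landed in the right place. The helper below is hypothetical (not part of the repo); the filenames are the ones this README lists under pre_checkpoints/:

```shell
# Hypothetical helper: report any missing checkpoint files in a directory.
check_ckpts() {
  dir="$1"; shift
  missing=0
  for f in "$@"; do
    if [ ! -f "$dir/$f" ]; then
      echo "MISSING: $dir/$f"
      missing=1
    fi
  done
  return "$missing"
}

# Run from the repo root before ./run_pipeline.sh
check_ckpts pre_checkpoints \
  sam_vit_h_4b8939.pth \
  omnidata_dpt_depth_v2.ckpt \
  omnidata_dpt_normal_v2.ckpt \
  big-lama.ckpt || echo "Download the files above into pre_checkpoints/ first."
```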
```
Text2VR/
├── ASSET_SEG/
│   └── ...
├── BG_INPAINT/
│   └── ...
├── DREAMSCENE360/        # Panorama & training (legacy-stable env)
│   └── ...
├── docker-compose.yml    # The orchestrator for all services
├── run_pipeline.sh       # The one-click script to run the full pipeline
├── output/
└── pre_checkpoints/      # Shared directory for pretrained models (created by you)
    ├── big-lama.ckpt               # <-- Pretrained models are placed here
    ├── omnidata_dpt_depth_v2.ckpt
    ├── omnidata_dpt_normal_v2.ckpt
    ├── sam_vit_h_4b8939.pth
    └── ...
```

Build all service images:

```bash
docker-compose build
```

This is the simplest way to run the entire pipeline, from a text prompt to a fully segmented panorama, with a single command.
Before running, you must configure the main pipeline script.
- Open run_pipeline.sh in your editor.
- Set your OpenAI API key. This is required both for self-refinement in the panorama generation stage and for asset identification in the segmentation stage.
- Customize the scene and prompt by changing the `SCENE_NAME` and `PANO_PROMPT` variables. These determine the output folder name and the content of the generated scene.
```bash
# In run_pipeline.sh
export OPENAI_API_KEY="your openai api key"  # IMPORTANT: set your key here, or use .env
SCENE_NAME="simple_indoor"
PANO_PROMPT="A 360 equirectangular photo of a minimalist and spacious living room. In the center, there is a single modern leather sofa. The room has plain white walls, a smooth light gray concrete floor, and no other furniture or decorations. The scene is brightly lit by soft, natural light from a large window, with no harsh shadows. photorealistic, 8k, sharp focus."
```

From the root of the Text2VR repository, execute the following commands:
```bash
# 1. Make the script executable (only needed once)
chmod +x run_pipeline.sh

# 2. Run the entire pipeline
bash ./run_pipeline.sh
```

What it does:
- Stage 1 → Panorama (DreamScene360), optional GPT self-refinement
- Stage 2 → Segmentation (GroundingDINO + SAM + GPT) → /output/_masks/
- Stage 3 → Inpainting (SDXL) → /DREAMSCENE360/data//inpainted_panorama.png
- Stage 4 → Training (DreamScene360)

Note: run_pipeline.sh reads `OPENAI_API_KEY` from `.env`. Edit the script to set your `SCENE_NAME` and prompt.
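The four stages above can be sketched as a dry-run script. The service and script names below follow this README's layout, but the `inpainting` service name and the exact invocations run_pipeline.sh uses are assumptions; treat this as an outline, not the real script:

```shell
# Dry run: print the per-stage commands instead of executing them.
# Replace the 'echo' in run() with direct invocation to actually execute.
run() { echo "+ $*"; }

run docker-compose run --rm dreamscene360 python pano_generator.py   # Stage 1: panorama
run docker-compose run --rm segmentation  python segment_panorama.py # Stage 2: masks
run docker-compose run --rm inpainting    python inpaint_panorama.py # Stage 3: inpaint
run docker-compose run --rm dreamscene360 python train.py            # Stage 4: training
```

Because each stage is just a `docker-compose run` against one service, any single stage can be re-run in isolation after a failure.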
If you need to work on a single service without running the entire pipeline, you can use docker-compose to enter its specific container.
This is useful for debugging the original train.py or other core functionalities.
```bash
# Build and run the container, then drop into a bash shell
docker-compose run --rm dreamscene360 /bin/bash
```

You will now be inside the container at /workspace/dreamscene360_code. See dreamscene360_service/README.md for detailed instructions on manual execution.
This is useful for testing or modifying the panorama segmentation logic.
```bash
# Build and run the container, then drop into a bash shell
docker-compose run --rm segmentation /bin/bash
```

You will now be inside the container at /app. See ASSET_SEG/README.md for detailed instructions on manual execution.