Advanced 3D instance segmentation specifically designed for architectural scene understanding
Baselines • Summary • Visualization • Report
Architect3D adapts the state-of-the-art Mask3D model to the ScanNet++ dataset, enabling fine-grained 3D instance segmentation of architectural scenes with 2,753 classes, roughly 13.8x the 200 classes of ScanNet200.
- Architectural Focus: Specialized for building and indoor architectural scenes
- Massive Scale: Handles 2,753 fine-grained architectural classes
- High Resolution: Optimized for 0.02m voxel precision
- OpenMask3D Ready: Prepared for open-vocabulary integration
- Comprehensive Evaluation: Detailed architectural scene analysis
Note: Due to computational constraints (GPU limitations, 200GB storage limit), full evaluation is pending. The model has been successfully adapted and the framework is complete.
| Model | Dataset | Classes | AP | AP50 | AP25 | Status |
|---|---|---|---|---|---|---|
| Mask3D | ScanNet200 | 200 | 26.9 | 36.2 | 41.4 | Baseline |
| OpenMask3D | ScanNet200 | 200 | 15.4 | 19.9 | 23.1 | Baseline |
| Architect3D | ScanNet++ | 2,753 | Pending | Pending | Pending | Ready |
See baseline.md for detailed comparisons
# System requirements
CUDA >= 11.3
Python >= 3.8
GPU Memory >= 8GB

# Clone repository
git clone [your-repo-url]
cd Architect3D
# Install dependencies
pip install -r requirements.txt
# For detailed MinkowskiEngine setup, see below

# 1. Preprocess ScanNet++ data
cd Architect3D/Mask3D/
sbatch preprocessing.sh
# 2. Run evaluation
sbatch scannetpp_eval.sh
# 3. Generate visualizations (vis.py is in the repository root)
cd ../..
python vis.py

Detailed project structure:
Architect3D/
├── README.md                              # This file
├── PROJECT_SUMMARY.md                     # Executive summary
├── baseline.md                            # Performance baselines
├── vis.py                                 # t-SNE visualization generator
├── interactive_tsne_visualization.html    # Interactive class embeddings
├── Architect3D.pdf                        # Comprehensive project report
│
├── Architect3D/                           # Core implementation
│   ├── Mask3D/                            # Adapted Mask3D for ScanNet++
│   │   ├── main_instance_segmentation.py  # Main training/evaluation script
│   │   ├── conf/                          # Hydra configurations
│   │   ├── benchmark/                     # Evaluation framework
│   │   ├── datasets/                      # Data loaders & preprocessing
│   │   ├── models/                        # Neural network architectures
│   │   ├── trainer/                       # Training pipeline
│   │   ├── saved/final/                   # Model checkpoints
│   │   └── jobs/                          # Training logs
│   └── requirements.txt
│
├── openmask3d/                            # OpenMask3D integration
│   ├── openmask3d/                        # Core modules
│   ├── class_agnostic_mask_computation/
│   ├── mask_features_computation/
│   ├── evaluation/
│   └── visualization/
│
├── scannetpp/                             # ScanNet++ dataset
│   ├── metadata/                          # Class definitions
│   ├── scannetpp_ply/                     # 3D scenes
│   └── splits/                            # Train/val/test splits
│
└── eval_results_architectural_classes/    # Evaluation results
graph TB
A[ScanNet++ Dataset<br/>2,753 classes] --> B[Sparse 3D CNN<br/>MinkowskiEngine]
B --> C[Multi-scale Features]
C --> D[Transformer Decoder]
D --> E[Enhanced Head<br/>2,753 outputs]
E --> F[Instance Masks]
G[RGB Images] --> H[CLIP Features]
H --> I[Multi-view Fusion]
I --> J[OpenMask3D Pipeline]
F --> K[Architectural<br/>Predictions]
J --> K
style A fill:#e1f5fe
style E fill:#f3e5f5
style K fill:#e8f5e8
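The lower branch of the diagram (RGB images, CLIP features, multi-view fusion) can be illustrated with a small sketch. It assumes that fusion means averaging unit-normalized per-view features for one mask and that a label is then chosen by cosine similarity to text embeddings; the function names and the 3-D toy vectors are hypothetical stand-ins, not code or values from this repository or from OpenMask3D.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_views(view_features):
    """Average unit-normalized per-view features into one per-mask feature."""
    normalized = [l2_normalize(f) for f in view_features]
    dim = len(normalized[0])
    mean = [sum(f[i] for f in normalized) / len(normalized) for i in range(dim)]
    return l2_normalize(mean)


def classify(mask_feature, text_embeddings):
    """Return the label with the highest cosine similarity, plus all scores."""
    scores = {
        label: sum(a * b for a, b in zip(mask_feature, l2_normalize(emb)))
        for label, emb in text_embeddings.items()
    }
    return max(scores, key=scores.get), scores


# Toy 3-D stand-ins for real CLIP embeddings (hypothetical values).
views = [[0.9, 0.1, 0.0], [0.8, 0.3, 0.1], [1.0, 0.0, 0.2]]
text = {"door": [1.0, 0.1, 0.1], "window": [0.0, 1.0, 0.0]}

label, scores = classify(fuse_views(views), text)
print(label)  # door
```

Normalizing before averaging keeps any single view with a large feature magnitude from dominating the fused vector.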
| Component | Original | Architect3D | Improvement |
|---|---|---|---|
| Classes | 200 | 2,753 | 13.8x scaling |
| Voxel Size | 0.05m | 0.02m | 2.5x precision |
| Domain | General | Architectural | Specialized |
| Head Architecture | Standard | Scaled | Optimized |
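To make the voxel-size row concrete, the sketch below quantizes a toy point cloud at both resolutions from the table. It is a simplified pure-Python stand-in for sparse quantization, not the MinkowskiEngine call used in the pipeline; the `voxelize` helper and the sample cloud are hypothetical. A 2.5x finer voxel edge can yield up to 2.5³ ≈ 15.6x more occupied voxels in a densely sampled volume, which is where the extra memory cost comes from.

```python
import math


def voxelize(points, voxel_size):
    """Map (x, y, z) points in meters to integer voxel indices, dropping duplicates."""
    return {tuple(math.floor(c / voxel_size) for c in p) for p in points}


# Hypothetical toy cloud: a 10 cm cube sampled at 1 cm spacing (1,000 points).
# Samples sit at cell centers (0.5 cm, 1.5 cm, ...) so none lies on a voxel boundary.
coords = [(i + 0.5) / 100 for i in range(10)]
points = [(x, y, z) for x in coords for y in coords for z in coords]

coarse = voxelize(points, 0.05)  # "Original" voxel size from the table
fine = voxelize(points, 0.02)    # Architect3D voxel size from the table

print(len(coarse), len(fine), len(fine) / len(coarse))  # 8 125 15.625
```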
Complete MinkowskiEngine Setup (ETH Cluster)
# STEP 1: Load modules
module load gcc/8.2.0 python_gpu/3.8.5 cuda/11.3.1 cudnn/8.2.1.32
# STEP 2: Create environment
python -m venv architect3d_env
source architect3d_env/bin/activate
# STEP 3: Install PyTorch
pip install torch==1.12.1 torchvision==0.13.1 -f https://download.pytorch.org/whl/cu113/torch_stable.html
# STEP 4: Install dependencies
pip install ninja pytorch-lightning==1.7.2 hydra-core==1.0.5
# STEP 5: Setup MinkowskiEngine
git clone https://github.com/NVIDIA/MinkowskiEngine.git
cd MinkowskiEngine
# Edit setup.py (uncomment CUDA_HOME configuration)
python setup.py install
# STEP 6: Install additional packages (from the project directory, not MinkowskiEngine/)
cd ..
pip install -r requirements.txt
# STEP 7: Install CLIP & SAM
pip install git+https://github.com/openai/CLIP.git --no-deps
pip install git+https://github.com/facebookresearch/segment-anything.git --no-deps

Interactive t-SNE: Explore 2,753 architectural class embeddings
- Open interactive_tsne_visualization.html in a browser
- Visualize class relationships and clusters
- Understand architectural taxonomy

Comprehensive Evaluation:
- AP Metrics: Standard instance segmentation evaluation
- Class Analysis: Head/Common/Tail performance breakdown
- Architectural Focus: Building-specific evaluation protocols
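The AP metrics above rest on mask IoU. In the ScanNet benchmark convention, AP50 and AP25 count a prediction as a match at IoU thresholds of 0.50 and 0.25, while AP averages over thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below shows only the matching step for a single prediction, with masks represented as sets of point indices; the helper names and toy masks are hypothetical and are not taken from the `benchmark/` code.

```python
def mask_iou(pred, gt):
    """Intersection over union of two instance masks given as sets of point indices."""
    union = len(pred | gt)
    return len(pred & gt) / union if union else 0.0


def is_match(pred, gt, threshold):
    """A prediction counts as a true positive when its IoU reaches the threshold."""
    return mask_iou(pred, gt) >= threshold


# Hypothetical masks: ground truth covers points 0..99, prediction covers 60..119.
gt_mask = set(range(0, 100))
pred_mask = set(range(60, 120))

iou = mask_iou(pred_mask, gt_mask)          # 40 shared points / 120 total
print(round(iou, 3))                        # 0.333
print(is_match(pred_mask, gt_mask, 0.25))   # True  (counts toward AP25)
print(is_match(pred_mask, gt_mask, 0.50))   # False (misses AP50)
```

A full AP computation would additionally sort predictions by confidence and integrate the precision-recall curve per class, which this sketch omits.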
- Mask3D: Foundation model
- ScanNet++: Enhanced dataset
- OpenMask3D: Open-vocabulary framework
- CLIP: Vision-language features
This project was developed for the 3D Vision course at ETH Zurich. Special thanks to our supervisors for their guidance and to the authors of the unofficial OpenMask3D codebase.
| Resource | Description | Link |
|---|---|---|
| Full Report | Comprehensive documentation | Architect3D.pdf |
| Baselines | Performance comparisons | baseline.md |
| Summary | Executive overview | PROJECT_SUMMARY.md |
| Visualization | Interactive t-SNE | interactive_tsne_visualization.html |
| Configs | Hydra configuration | Architect3D/Mask3D/conf/ |