A state-of-the-art AI model that converts natural language descriptions into parametric CAD files using a multi-modal transformer architecture with geometric validation and visual feedback.
This project implements a multi-modal transformer architecture that:
- Processes text descriptions using BERT-based encoders
- Generates parametric CAD sequences with geometric validation
- Incorporates visual feedback through CLIP-based scoring
- Validates geometric constraints and manufacturing requirements
- DeepCAD dataset processing and cleaning
- Multi-modal annotation generation
- Geometric hash-based deduplication
- 8-bit quantized parameter representation
- Adaptive text encoder with BERT backbone
- 24-layer transformer decoder for CAD sequences
- Visual feedback module with CLIP integration
- Geometric validation system
- Sequential pre-training on CAD sequences
- Visual feedback fine-tuning with PPO
- Multi-stage training orchestration
- Gradient clipping and learning rate scheduling
- KCL code generation and execution
- Feature-by-feature geometric validation
- Manufacturing constraint checking
- Chamfer distance computation
- FastAPI server with async processing
- Multiple export formats (STEP, GLTF, KCL)
- Batch processing capabilities
- Quantization for production deployment
# Clone the repository
git clone https://github.com/omira/text-to-cad.git
cd text-to-cad
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Install additional CAD tools
# macOS:
brew install opencascade
brew install freecad
# Ubuntu:
sudo apt-get install libopencascade-dev
sudo apt-get install freecadfrom src.models.text_to_cad import TextToCADModel
from src.inference.pipeline import load_model_from_checkpoint, InferencePipeline
# Load pre-trained model or create a new one
try:
model = load_model_from_checkpoint("checkpoints/final_model.pt")
except FileNotFoundError:
model = TextToCADModel(vocab_size=10000)
pipeline = InferencePipeline(model)
# Generate CAD from text
cad_sequence = pipeline.generate("Create a rectangular bracket with mounting holes")
kcl_code = pipeline.export_kcl(cad_sequence)
print(kcl_code)
# Export to STEP format
step_file = pipeline.export_step(cad_sequence, "output.step")
print(f"STEP file saved to: {step_file}")# Create dataset directories
mkdir -p data/{raw,processed,annotations}
# Download and process DeepCAD dataset
python scripts/prepare_dataset.py --download --process --annotate
# Process dataset only
python scripts/prepare_dataset.py --process --input-dir data/raw --output-dir data/processed
# Generate annotations
python scripts/prepare_dataset.py --annotate --output-dir data/processed --annotation-dir data/annotations# Train small model for prototyping
python scripts/train.py --config configs/small_config.yaml --output-dir checkpoints/small
# Train base model
python scripts/train.py --config configs/base_config.yaml --output-dir checkpoints/base
# Train large model
python scripts/train.py --config configs/large_config.yaml --output-dir checkpoints/large
# Fine-tune with visual feedback
python scripts/train.py --config configs/base_config.yaml --checkpoint checkpoints/base/final_model.pt --finetune# Run evaluation suite
python scripts/evaluate.py --model checkpoints/final_model.pt --output-dir outputs/evaluation --visualize
# Generate validation report
python scripts/evaluate.py --model checkpoints/final_model.pt --test-set data/processed/test.json --output-dir outputs/validation# Start the API server
python src/deployment/server.py --model checkpoints/final_model.pt --config configs/base_config.yaml --port 8000
# Generate example CAD from text with curl
curl -X POST "http://localhost:8000/generate" \
-H "Content-Type: application/json" \
-d '{"text": "Create a rectangular bracket with two mounting holes", "format": "step"}'This project includes pre-configured VS Code tasks for common operations:
Install Dependencies: Install required Python packagesTrain Model (Small): Train a small model for quick iterationTrain Model (Base): Train the standard modelFine-tune with Visual Feedback: Fine-tune model with CLIP-based visual feedbackRun Evaluation: Evaluate model performanceRun API Server: Start the model serving APIProcess DeepCAD Dataset: Process the raw datasetGenerate Example CAD: Run a quick CAD generation exampleRun Tests: Run test suite
- Model Architecture: Multi-modal transformer (BERT encoder + 24-layer decoder)
- Model Size: 350M parameters
- Sequence Length: Up to 512 CAD operations (768 for large model)
- Precision: 8-bit quantization with 12-bit for critical dimensions
- Training Data: DeepCAD dataset with geometric hash deduplication
- Training Stages: Pre-training + PPO fine-tuning with visual feedback
- Performance Metrics:
- Chamfer Distance: <0.87mm
- CLIP Score: >0.82
- Inference Speed: <2s for 50-operation sequences on NVIDIA A100
- Manufacturing Validity Rate: >95%
This project is licensed under the MIT License - see the LICENSE file for details.
- Accuracy: 0.87mm Chamfer Distance, 0.82 CLIP Score
βββ src/
β βββ data/ # Data processing and loading
β βββ models/ # Model architectures
β βββ training/ # Training loops and optimization
β βββ validation/ # Geometric validation
β βββ inference/ # Inference pipeline
β βββ deployment/ # Deployment utilities
βββ configs/ # Training and model configurations
βββ scripts/ # Training and evaluation scripts
βββ data/ # Dataset storage
βββ checkpoints/ # Model checkpoints
βββ outputs/ # Generated outputs and logs
βββ tests/ # Unit and integration tests
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests:
pytest tests/ - Submit a pull request
MIT License - see LICENSE file for details.
@article{omira2025textcad,
title={Text-to-CAD: A State-of-the-Art Model for Converting Natural Language to Parametric CAD},
author={OMIRA Technologies},
journal={arXiv preprint arXiv:2025.xxxxx},
year={2025}
}