3D Asset Retrieval Demo

A simple multi-modal 3D asset retrieval system. This repo uses objaverse as a simple demonstration, while it can generalize and scale to any 3D dataset.

1. Features

Demo built upon objaverse, using Cap3D 650k+ english captions, 650k+ translated captions ,and 260k+ gobjaverse asset renderings

Text Search: Retrieve 3D assets in English or Chinese
Image Search: Retrieve 3D assets using a single RGB image
Cross-Modal Retrieval: Retrieve 3D assets using text2image or image2text similarities
Dual Algorithms:
- SigLip: Fast, English-only, per-image embeddings. Medium Retrieval Quality(WIP).
- Qwen3-VL-Embedding: Bilingual, multi-image embeddings. High Retrieval Quality(Recommended).
Vector Database: PostgreSQL with pgvector for efficient similarity search
Web Interface: Beautiful Gradio UI with 3D model viewer
REST API: FastAPI backend for programmatic access

2. Architecture

┌─────────────────┐
│ English Captions│
└────────┬────────┘
         │
         ├──► Translation ──► Chinese Captions
         │
         ├──► SigLip Embeddings ──┐
         │    - Text (EN)          │
         │    - Images (per-view)  │
         │                         │
         └──► Qwen Embeddings ─────┤
              - Text (EN + CN)     │
              - Images (multi)     │
                                   │
                                   ▼
                         ┌──────────────────┐
                         │ PostgreSQL       │
                         │ + pgvector       │
                         └────────┬─────────┘
                                  │
                         ┌────────▼─────────┐
                         │  FastAPI Backend │
                         └────────┬─────────┘
                                  │
                         ┌────────▼─────────┐
                         │  Gradio Frontend │
                         └──────────────────┘

3. Get Started

Prerequisites

Python 3.8+
PostgreSQL 12+ with pgvector extension
NVIDIA GPU (recommended for SigLip)
DashScope API key (for Qwen)

Installation

Clone the repository

git clone https://github.com/3DSceneAgent/AssetRetrieval3D

Install dependencies:

# optionally create new env 
# conda create -n asset_retrieval python=3.10
# conda activate asset_retrieval
pip install -r requirements.txt

Set up environment variables:

cp .env.example .env 
# setup variables
export DASHSCOPE_API_KEY="your-api-key"
export DB_HOST="localhost"
export DB_PORT="5432"
export DB_USER="postgres"
export DB_PASSWORD="your-password"

Configure PostgreSQL:
- Install PostgreSQL
- Install pgvector extension:
```
CREATE EXTENSION vector;
```

4. Usage

Step 0: Configure Settings

Edit config.py to adjust settings:

MAX_ASSETS: Set to a small number (e.g., 100) for testing
Database credentials
API keys
Paths

Step 1: Translate Captions (Optional but Recommended)

Translate English captions to Chinese:

python scripts/01_translate_captions.py

This creates data/text_captions_cap3d_cn.json.

Note: This uses the Qwen batch API and may take hours for the full dataset.

Step 2: Generate Embeddings

(Optional) SigLip Embeddings

python scripts/02_embed_siglip.py

This generates:

Text embeddings (English only)
Image embeddings (one per viewpoint)

(Optional) Qwen Embeddings

python scripts/03_embed_qwen.py

This generates:

Text embeddings (English and Chinese)
Multi-image embeddings (8 images per asset)

Note: Can run both in parallel if you have resources.

Step 3: Populate Database

python scripts/04_populate_database.py

This:

Creates two databases: siglip_embeddings and qwen_embeddings
Creates tables with pgvector columns
Inserts all embeddings
Creates vector indexes

Step 4: Start Backend API

python backend/app.py

Or with uvicorn:

uvicorn backend.app:app --host 0.0.0.0 --port 8000

The API will be available at http://localhost:8000

Step 5: Start Frontend

In a new terminal:

python frontend/gradio_app.py

The UI will be available at http://localhost:7860

Step 6: Test Backend with Client Script

A comprehensive test client is provided to verify the backend service:

# Run all tests with default settings
python test_client.py

# Test only SigLip algorithm
python test_client.py --algorithm siglip

# Test only image search
python test_client.py --test-type image

# Run with verbose output (shows all results)
python test_client.py --verbose --num-tests 1

# Test against remote backend
python test_client.py --backend-url http://remote-server:8000

The test client will:

Check backend health
Test text search (English and Chinese for Qwen)
Test image search using real assets from the database
Test both uni-modal and cross-modal search
Display results with similarity scores and captions /search/text

5. Configuration Options

`config.py` Key Settings

MAX_ASSETS: Limit number of assets to process (for debugging)
TRANSLATION_BATCH_SIZE: Captions per translation batch (default: 1000)
EMBEDDING_BATCH_SIZE: Batch size for embedding generation (default: 100)
QWEN_NUM_IMAGES: Number of images per asset for Qwen (default: 8)
DEFAULT_TOP_K: Default number of search results (default: 10)

6. Algorithms

Embedding Algoirthms

Feature	SigLip	Qwen
Text Languages	English only	English + Chinese
Image Embeddings	One per viewpoint	Multi-image (8 views)
Speed	Fast	Slower (API calls)
Requires GPU	Yes (local)	No (API)
Cross-modal	Yes	Yes

Search Modes

Inner-Modal Search

Text → Text: Find assets with similar descriptions
Image → Image: Find visually similar assets

Cross-Modal Search

Text → Image: Find images matching text description
Image → Text: Find text descriptions matching image

7. File Structure

objaverse_retrieval/
├── config.py                 # Configuration
├── requirements.txt          # Dependencies
├── README.md                # This file
├── test_client.py           # Backend test client
│
├── data/                    # Data files
│   ├── text_captions_cap3d.json
│   ├── text_captions_cap3d_cn.json  (generated)
│   ├── gobjaverse/          # 3D asset images
│   └── gobjaverse_280k_index_to_objaverse.json
│
├── outputs/                 # Generated outputs
│   ├── embeddings/          # Saved embeddings
│   ├── translations/        # Translation results
│   └── batch_jsonl/         # API batch files
│
├── utils/                   # Utility modules
│   ├── data_loader.py
│   ├── image_utils.py
│   └── db_utils.py
│
├── scripts/                 # Processing scripts
│   ├── 01_translate_captions.py
│   ├── 02_embed_siglip.py
│   ├── 03_embed_qwen.py
│   └── 04_populate_database.py
│
├── backend/                 # FastAPI backend
│   ├── app.py
│   ├── embedding_service.py
│   └── vector_search.py
│
└── frontend/               # Gradio frontend
    └── gradio_app.py

License

The code and application is licensed under Apache2.0 License.

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.cursor/plans		.cursor/plans
assets		assets
backend		backend
data		data
docs		docs
frontend		frontend
scripts		scripts
utils		utils
.cursorignore		.cursorignore
.env.example		.env.example
.gitignore		.gitignore
.ruff.toml		.ruff.toml
LICENSE		LICENSE
README.md		README.md
README_zh.md		README_zh.md
config.py		config.py
requirements.txt		requirements.txt
run_backend.sh		run_backend.sh
run_frontend.sh		run_frontend.sh
setup_database.sh		setup_database.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

3D Asset Retrieval Demo

1. Features

2. Architecture

3. Get Started

Prerequisites

Installation

4. Usage

Step 0: Configure Settings

Step 1: Translate Captions (Optional but Recommended)

Step 2: Generate Embeddings

(Optional) SigLip Embeddings

(Optional) Qwen Embeddings

Step 3: Populate Database

Step 4: Start Backend API

Step 5: Start Frontend

Step 6: Test Backend with Client Script

5. Configuration Options

`config.py` Key Settings

6. Algorithms

Embedding Algoirthms

Search Modes

Inner-Modal Search

Cross-Modal Search

7. File Structure

License

Acknowledgments

About

Uh oh!

Releases

Packages

Languages

License

3DSceneAgent/AssetRetrieval3D

Folders and files

Latest commit

History

Repository files navigation

3D Asset Retrieval Demo

1. Features

2. Architecture

3. Get Started

Prerequisites

Installation

4. Usage

Step 0: Configure Settings

Step 1: Translate Captions (Optional but Recommended)

Step 2: Generate Embeddings

(Optional) SigLip Embeddings

(Optional) Qwen Embeddings

Step 3: Populate Database

Step 4: Start Backend API

Step 5: Start Frontend

Step 6: Test Backend with Client Script

5. Configuration Options

config.py Key Settings

6. Algorithms

Embedding Algoirthms

Search Modes

Inner-Modal Search

Cross-Modal Search

7. File Structure

License

Acknowledgments

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

`config.py` Key Settings

Packages