Benchmark framework for comparing Milvus and Weaviate vector databases on Kubernetes.
```bash
git clone https://github.com/hungngodev/VectorDBBench.git
cd VectorDBBench
pip install poetry && poetry install
```

Build and push the benchmark image:

```bash
docker build -t hungngodev/vectordbbench:latest .
docker push hungngodev/vectordbbench:latest
```

Download datasets to shared NFS storage:

```bash
./prepare_datasets.sh /mnt/nfs/shared/datasets
```

Run the benchmark matrix:

```bash
export NS=marco
export HOST_DATA_DIR=/mnt/nfs/shared/datasets
export HOST_RESULTS_DIR=/mnt/nfs/shared/results
export CASE_TYPE=Performance768D1M  # or Performance768D100K
./scripts/run_config_matrix.sh
```

Aggregate results and generate figures:

```bash
python scripts/aggregate_results.py --dir /mnt/nfs/shared/results --output analysis/all_results.csv
cd analysis && python generate_figures.py
```

To work with the cluster directly, point kubectl at it:

```bash
export KUBECONFIG=/path/to/kubeconfig
kubectl config use-context swarm
kubectl get pods -n marco
```

Databases are deployed in the marco namespace:
| Database | Service URL | Port | Configuration |
|---|---|---|---|
| Milvus | milvus.marco.svc.cluster.local | 19530 | Distributed architecture, 1 querynode |
| Weaviate | weaviate.marco.svc.cluster.local | 8080 | Single monolithic instance |
Note: Although Milvus uses a distributed architecture (separate coordinator, data node, index node, query node), we run with 1 querynode for fair comparison with Weaviate's single instance.
In the Raft scaling experiment, Weaviate was deployed as a 3-node Raft cluster while Milvus remained at 1 querynode.
Key findings:
- Weaviate's Raft consensus provides fault tolerance only, not search parallelism
- Each search query is still processed by a single node
- Load balancing must be implemented separately (e.g., via Kubernetes Ingress or a custom load balancer)
- Milvus's querynode can be independently scaled for search parallelism
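Because each query is answered by a single Weaviate node, spreading queries across replicas has to happen in the client or at an ingress. A minimal client-side round-robin sketch — the per-pod hostnames below are illustrative assumptions, not actual service names from this deployment:

```python
import itertools

# Hypothetical per-pod endpoints; in practice these would be resolved from a
# headless Service or listed explicitly (names here are assumed).
endpoints = [
    "weaviate-0.weaviate.marco.svc.cluster.local:8080",
    "weaviate-1.weaviate.marco.svc.cluster.local:8080",
    "weaviate-2.weaviate.marco.svc.cluster.local:8080",
]

# Round-robin iterator: each search request takes the next endpoint in turn.
rr = itertools.cycle(endpoints)

def next_endpoint():
    """Pick the endpoint for the next search request."""
    return next(rr)

if __name__ == "__main__":
    # Six requests cycle through the three replicas twice.
    picks = [next_endpoint() for _ in range(6)]
    print(picks)
```

A Kubernetes ClusterIP Service gives a similar effect at the connection level, but long-lived client connections can still pin all traffic to one pod, which is why per-request balancing like this can matter.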
Milvus (Helm values):
```bash
# View current config
helm get values milvus -n marco

# Update querynode replicas (NOTE: kept at 1 for fair comparison)
helm upgrade milvus milvus/milvus -n marco --set queryNode.replicas=1
```

Weaviate (Helm values):

```bash
# View current config
helm get values weaviate -n marco

# Update replica count (Raft consensus for fault tolerance)
helm upgrade weaviate semitechnologies/weaviate -n marco --set replicas=3
```

To modify HNSW parameters (M, efConstruction, efSearch), edit the benchmark scripts:
```bash
# In scripts/run_config_matrix.sh
M_VALUES="4 8 16 32 64 128"
EF_VALUES="128 192 256 384 512 768"
```

To monitor the cluster:

```bash
# Check pod status
kubectl get pods -n marco -w

# View logs
kubectl logs -f deployment/milvus-querynode -n marco

# Resource usage
kubectl top pods -n marco
```

Benchmark results and analysis are in the analysis/ directory:
- `RESEARCH_REPORT_v2.md` - Performance comparison report
- `all_results_*.csv` - Raw benchmark data
- `*.png` - Visualization figures
`scripts/run_all_nohup.sh` - Run the full benchmark detached (recommended):

```bash
HOST_DATA_DIR=/mnt/nfs/shared/datasets \
HOST_RESULTS_DIR=/mnt/nfs/shared/results \
CPU=16 MEM=64Gi \
bash scripts/run_all_nohup.sh
```

Logs are written to `run_all.log`. Monitor with `tail -f run_all.log`.
`scripts/run_config_matrix.sh` - Core benchmark runner with configurable parameters:
| Environment Variable | Default | Description |
|---|---|---|
| `NS` | marco | Kubernetes namespace |
| `HOST_DATA_DIR` | (empty) | Path to cached datasets on NFS |
| `HOST_RESULTS_DIR` | (empty) | Path to save results on NFS |
| `CASE_TYPE` | Performance768D1M | Benchmark case (Performance768D100K, Performance768D1M) |
| `K` | 100 | Number of nearest neighbors to retrieve |
| `EF_CONSTRUCTION` | 360 | HNSW efConstruction (fixed for index quality) |
| `NUM_CONCURRENCY` | 1,2,4,8,16,32 | Client concurrency levels |
| `CONCURRENCY_DURATION` | 60 | Seconds per concurrency level |
| `CPU` / `MEM` | 16 / 64Gi | Pod resource limits |
HNSW Parameter Matrices (edit in script):
```bash
# Milvus/Weaviate: M and efSearch values
milvus_m=(4 8 16 32 64 128 256)
milvus_ef=(128 192 256 384 512 640 768 1024)
weav_m=(4 8 16 32 64 128 256)
weav_ef=(128 192 256 384 512 640 768 1024)
```

`scripts/run_all_and_cleanup.sh`
Orchestrates the entire benchmark pipeline:
- Runs `run_config_matrix.sh` to execute all benchmark jobs
- Calls `aggregate_results.py` to combine JSON results into CSV
- Cleans up individual JSON files after aggregation

```bash
NS=marco RESULT_ROOT=/mnt/nfs/shared/results OUTPUT=all_results.csv \
bash scripts/run_all_and_cleanup.sh
```

`scripts/aggregate_results.py`
Combines individual JSON result files into a single CSV for analysis.
```bash
python scripts/aggregate_results.py --root /mnt/nfs/shared/results --output all_results.csv
```

Output columns: `db`, `task_label`, `concurrency`, `qps`, `latency_p99`, `recall`, `load_duration`, etc.
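Once aggregated, the CSV can be sliced however you need — for example, picking the highest-QPS configuration per database subject to a recall floor. A sketch using the column names listed above (the rows here are made up for illustration):

```python
import csv
import io

# A tiny stand-in for all_results.csv (column names from the aggregator's
# output; the data rows are invented for illustration only).
sample = io.StringIO(
    "db,task_label,concurrency,qps,latency_p99,recall,load_duration\n"
    "milvus,m16-ef128,8,1200.5,0.012,0.97,310\n"
    "milvus,m32-ef256,8,900.0,0.015,0.99,450\n"
    "weaviate,m16-ef128,8,1100.0,0.013,0.96,290\n"
    "weaviate,m32-ef256,8,850.0,0.016,0.98,400\n"
)

def best_qps(rows, min_recall=0.95):
    """Highest-QPS row per database among rows meeting a recall floor."""
    best = {}
    for row in rows:
        if float(row["recall"]) < min_recall:
            continue
        db = row["db"]
        if db not in best or float(row["qps"]) > float(best[db]["qps"]):
            best[db] = row
    return best

winners = best_qps(csv.DictReader(sample))
for db, row in sorted(winners.items()):
    print(db, row["task_label"], row["qps"])
```

In practice you would open the real CSV path instead of the inline sample; the recall floor keeps low-quality (low-ef) configurations from winning on raw throughput alone.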
`scripts/cleanup_bench.sh`
Deletes all benchmark jobs and pods (prefixed with `vdb-` or `vectordb-bench`) from the cluster.

```bash
NS=marco bash scripts/cleanup_bench.sh
```

`scripts/stop_and_clean.sh`
Emergency stop: kills local benchmark scripts AND deletes all Kubernetes jobs in the marco namespace.

```bash
bash scripts/stop_and_clean.sh
```