dsearch is a horizontally scalable, Lucene‑based distributed search engine written in Java 21.
It targets small to medium‑sized applications that need:
- Lexical search (BM25)
- Semantic search (vector kNN)
- Hybrid ranking (BM25 + embeddings)
…without the operational and conceptual overhead of a full Elasticsearch / OpenSearch cluster.
The system is composed of three primary components, plus an optional coordinator:

- **Gateway Node** – Spring Boot HTTP API entrypoint, load balancing and routing, and system health.
- **Query Nodes** – gRPC services that fan out queries to all index nodes for a given partition, merge partial results, and apply hybrid fusion strategies.
- **Index Nodes** – gRPC services hosting Lucene shards. Each index node can host multiple partitions (e.g. `shard-movies`, `shard-shows`, …), and a partition can be spread across multiple nodes. Each partition is a Lucene index responsible for a categorical or domain‑specific slice of your data.
- **Coordinator Node** (optional) – Service discovery and health aggregation across the cluster. It allows dynamic addition/removal of index/query nodes without restarting the cluster.
- Shards represent logical partitions of your data (e.g., by domain or category).
- Each shard exists once per cluster (no replicas yet).
- Shards are distributed across index nodes.
- The Gateway maintains per‑shard, per‑node document counts and uses a least‑loaded routing strategy for indexing operations:
  - For a given `(partitionId, document)` write/delete, the Gateway picks the index node with the smallest document count for that partition.
  - Over time, this keeps each shard evenly balanced across index nodes without a coordinator.
  - Counts are periodically snapshotted to disk so new Gateway instances can restore their view.
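The least‑loaded selection can be sketched in a few lines of Java. This is a minimal illustration, not the actual Gateway code: the class and field names here are hypothetical, and the real implementation also handles snapshotting and concurrent updates.

```java
import java.util.Map;

// Sketch of least-loaded routing: given per-partition, per-node document
// counts, pick the index node with the fewest documents for the target
// partition. Class and method names are illustrative only.
public class LeastLoadedRouter {
    // partitionId -> (nodeId -> document count), as tracked by the Gateway.
    private final Map<String, Map<String, Long>> counts;

    public LeastLoadedRouter(Map<String, Map<String, Long>> counts) {
        this.counts = counts;
    }

    /** Picks the index node with the smallest document count for the partition. */
    public String pickNode(String partitionId) {
        Map<String, Long> perNode = counts.get(partitionId);
        if (perNode == null || perNode.isEmpty()) {
            throw new IllegalArgumentException("unknown partition: " + partitionId);
        }
        return perNode.entrySet().stream()
                .min(Map.Entry.comparingByValue())
                .orElseThrow()
                .getKey();
    }
}
```

Because each write goes to the currently emptiest node, counts converge toward balance without any cross-node coordination.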
There is no replication layer yet. If an index node goes down, documents stored on that node’s shards are temporarily unavailable until the node comes back up and reloads its Lucene indices.
This design intentionally keeps the system:
- Simple to operate (few moving parts)
- Easy to reason about
- Horizontally scalable by just adding more index/query nodes and updating config.
+--------------+
| Client |
+------+-------+
|
HTTP /search
|
+--------v---------+
| Gateway Node |
| (Spring Boot API)|
+--------+---------+
|
| gRPC QueryService
|
+--------v---------+
| Query Nodes |
| - Fan-out RPCs |
| - Merge Results |
+--------+---------+
|
| gRPC IndexService
|
+-----------------------------+
| Index Nodes |
|-----------------------------|
| shard-0 | shard-1 | shard-2 |
| Lucene | Lucene | Lucene |
+-----------------------------+
The engine supports two complementary retrieval modes that can be used independently or combined:
- Lexical Search (BM25) – classic keyword relevance
- Semantic Search (Embeddings + Lucene kNN) – retrieves results by meaning, not by exact words
- Hybrid Search – fuses BM25 and semantic scores into a single ranked list
For each document:
- All textual fields are concatenated into a single representation.
- A dense embedding is generated using a default transformer model: `all-MiniLM-L6-v2` via DJL (the `textEmbedding` model in `app-config.yaml`), or any other compatible model you configure.
- The document is stored in Lucene as:
  - A BM25 text field for lexical search.
  - A vector field (`KnnVectorField`) for semantic similarity.
This enables BM25, semantic, and hybrid retrieval over the same underlying data.
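The first step of this pipeline — collapsing a document's textual fields into one string for the embedding model — can be sketched as follows. The separator and field ordering are assumptions; the actual implementation may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of building the embedding input: all textual field values joined
// into a single string. An ordered map keeps the concatenation deterministic.
public class EmbeddingInput {
    public static String concatFields(Map<String, String> fields) {
        return fields.values().stream().collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", "Interstellar");
        fields.put("content", "A team of explorers travel through a wormhole...");
        System.out.println(concatFields(fields));
    }
}
```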
- Lucene processes the query using BM25.
- Best for exact keywords, short queries, and when you care about precise term matches.
- Typically low‑latency.
- Query text is embedded using the same transformer model as indexing.
- Query Node fans out to all Index Nodes that host the requested shard/category.
- Each Index Node runs Lucene HNSW‑based kNN over its vector field.
- Results are merged and sorted by semantic similarity.
- Great for natural‑language queries and conceptual similarity (e.g., “space opera about time dilation”).
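The merge step above can be sketched as follows — flatten each Index Node's partial top hits and keep the global top K by similarity score. The `Hit` record is a stand-in for the real gRPC response messages.

```java
import java.util.Comparator;
import java.util.List;

// Sketch of a Query Node merging partial kNN results from several Index
// Nodes. Types are illustrative; the real messages are gRPC-generated.
public class KnnMerger {
    /** One partial hit returned by an Index Node. */
    public record Hit(String docId, float score) {}

    /** Flattens per-node hit lists and keeps the global top K by score. */
    public static List<Hit> mergeTopK(List<List<Hit>> perNodeHits, int k) {
        return perNodeHits.stream()
                .flatMap(List::stream)
                .sorted(Comparator.comparingDouble((Hit h) -> h.score()).reversed())
                .limit(k)
                .toList();
    }
}
```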
To combine lexical and semantic signals:
- Run BM25 search → top K
- Run semantic kNN search → top K
- Merge hits by document ID
- Apply a fusion strategy:
  - `RRF` – Reciprocal Rank Fusion, rank‑based blending (default)
  - `score_sum` – bm25Score + semanticScore
  - `weighted` – α·bm25 + β·semantic
- Paginate the fused list and return to the client.
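The default RRF strategy can be sketched as below: each document's fused score is the sum of `1 / (k + rank)` over every ranked list it appears in. The constant `k = 60` is the common default from the RRF literature; the exact constant dsearch uses is an assumption here.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of Reciprocal Rank Fusion over ranked lists of doc ids.
public class Rrf {
    /** Fuses ranked doc-id lists; higher fused score ranks first. */
    public static LinkedHashMap<String, Double> fuse(List<List<String>> rankings, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int rank = 0; rank < ranking.size(); rank++) {
                // rank is 0-based here, so the contribution is 1 / (k + rank + 1).
                scores.merge(ranking.get(rank), 1.0 / (k + rank + 1), Double::sum);
            }
        }
        // Sort descending by fused score into an insertion-ordered map.
        LinkedHashMap<String, Double> fused = new LinkedHashMap<>();
        scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .forEach(e -> fused.put(e.getKey(), e.getValue()));
        return fused;
    }
}
```

Because RRF only looks at ranks, it needs no score normalization — a useful property when BM25 and cosine-similarity scores live on very different scales.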
For full details, see Quick Start Guide.
- Java 21
- Maven 3.9+
- (Optional) `k6` / `ghz` for load/latency testing
# from repo root
make build
# Start a local cluster with 2 index nodes, 2 query nodes, 1 coordinator, and 1 gateway
make run-multi
# Cluster layout (by default):
# - Index Nodes : 5000, 5001
# - Query Nodes : 6000, 6001
# - Coordinator : 7000
# - Gateway     : http://localhost:8080

Example search request (`POST /api/v1/search`):

{
  "query": "time travel romance",
  "page": 0,
  "pageSize": 10,
  "partitionId": "movies",
  "searchType": "HYBRID",     // BM25 | SEMANTIC | HYBRID
  "fusionStrategy": "RRF"     // SCORE_SUM | WEIGHTED | RRF
}

Example indexing request:

{
"partitionId": "movies",
"id": "movie_001",
"fields": {
"title": "Interstellar",
"content": "A team of explorers travel through a wormhole..."
}
}

dsearch is instrumented end‑to‑end so you can see what your cluster is doing under load.
The Gateway uses Spring Boot Actuator + Micrometer + Prometheus registry:
- Metrics endpoints:
  - `GET http://localhost:8080/actuator/metrics` – metric catalog
  - `GET http://localhost:8080/actuator/metrics/dsearch.search.http` – HTTP search handler metric
  - `GET http://localhost:8080/actuator/metrics/dsearch.index.http` – HTTP index handler metric
  - `GET http://localhost:8080/actuator/prometheus` – Prometheus scrape endpoint
- Key metrics (examples):
  - `dsearch.search.http` – high‑level timing for the `/api/v1/search` handler.
  - `dsearch.gateway.search.latency{searchType,shardId}` – fine‑grained latency per search type and shard.
Both Query Nodes and Index Nodes expose Prometheus‑compatible /metrics endpoints (via Prometheus Java client):
- JVM metrics (GC, memory, threads) via `DefaultExports.initialize()`
- gRPC server metrics via `PrometheusGrpcServerInterceptor`:
  - `dsearch_grpc_server_latency_seconds{service,method,status}`
  - `dsearch_grpc_server_requests_total{service,method,status}`
On the Gateway side, gRPC clients are instrumented with a Prometheus gRPC client interceptor:
- `dsearch_grpc_client_latency_seconds{component,service,method,status}`
- `dsearch_grpc_client_requests_total{component,service,method,status}`
This lets you compare client‑side vs server‑side latency per RPC method and component (e.g. gateway->query-node).
- Gateway – aggregated health across itself and downstream nodes:
  - `GET /health` – Gateway health
  - `GET /cluster/health` – overall cluster health
- Query Nodes / Index Nodes – each exposes a lightweight HTTP health check endpoint:
  - `GET /health` – node health
Detailed methodology and raw results are documented in Benchmarks.
See the benchmarks document for:
- `k6` HTTP load‑test scripts for the Gateway
- `ghz` gRPC benchmarks for the Query Node / Index Node
- How to reproduce and extend these benchmarks on your own hardware
Cluster configuration is defined in app-config.yaml and loaded into the Gateway and nodes at startup:
serviceDiscovery:
  enabled: true
  refreshIntervalSeconds: 30
indexNodes:
  routingStrategy: "LEAST_LOADED"
  componentLabel: "dsearch-index-node"
  nodes:
    - id: "0"
      host: "localhost"
      port: 5000
      healthPort: 5100
queryNodes:
  routingStrategy: "ROUND_ROBIN"
  componentLabel: "dsearch-query-node"
  nodes:
    - id: "0"
      host: "localhost"
      port: 6000
      healthPort: 6100
coordinatorNodes:
  routingStrategy: "ROUND_ROBIN"
  componentLabel: "dsearch-coordinator-node"
  nodes:
    - id: "0"
      host: "localhost"
      port: 7000
      healthPort: 7100
ml:
  models:
    textEmbedding:
      url: "djl://ai.djl.huggingface.pytorch/sentence-transformers/all-MiniLM-L6-v2"
      engine: "PyTorch"

`indexNodes.routingStrategy` currently supports `LEAST_LOADED`, using per‑shard, per‑node doc counts. `queryNodes.routingStrategy` currently supports `ROUND_ROBIN` for fan‑out queries across multiple query node instances.
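The `ROUND_ROBIN` strategy for query nodes can be sketched as follows — a simple atomic counter cycling through the configured node list. The class name is illustrative; the real router lives inside the Gateway.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of round-robin routing across query node instances: successive
// calls cycle through the configured nodes, and the atomic counter keeps
// the rotation safe under concurrent requests.
public class RoundRobinRouter {
    private final List<String> nodes;
    private final AtomicInteger next = new AtomicInteger();

    public RoundRobinRouter(List<String> nodes) {
        this.nodes = nodes;
    }

    /** Returns the next node id in rotation. */
    public String pickNode() {
        return nodes.get(Math.floorMod(next.getAndIncrement(), nodes.size()));
    }
}
```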
This project is intentionally minimal and educational. Some trade‑offs and potential future work:
-
No replication layer (yet)
- A shard lives on exactly one node; if that node goes down, its data is unavailable until restart.
- Future direction: coordinator‑driven replication / Raft‑based shard groups.
-
Basic scoring and fusion
- BM25 + Lucene kNN + RRF fusion are implemented.
- Future direction: learned ranking, per‑field boosts, filters, aggregations.
Despite these, dsearch is already a usable, horizontally scalable, Lucene‑backed search engine suitable for side projects, prototypes, and as a learning platform for distributed search architecture.
This repository is intended as an educational and portfolio project. This project is licensed under the MIT License. See the LICENSE file for details.