A production-grade, fault-tolerant distributed key-value store built in Go. This project provides a horizontally scalable storage solution that ensures data consistency and high availability using the Raft consensus algorithm. It features a modern Svelte UI for real-time cluster management and data exploration.
- Distributed CRUD Operations: Simple
PUT,GET, andDELETEoperations distributed across a multi-node cluster. - Strong Consistency: Guarantees data consistency across all nodes using the Raft consensus protocol. All writes are committed by a leader and replicated to a majority of nodes.
- High Availability & Fault Tolerance: The system can tolerate node failures. If a leader node fails, the cluster automatically elects a new leader with no data loss.
- Horizontal Scalability: Easily scale the cluster by adding new nodes. The system is designed to handle new nodes joining a live cluster.
- Persistent Storage: Utilizes BoltDB for durable, on-disk storage with ACID guarantees, ensuring data survives node restarts.
- Live Management Dashboard: A modern, real-time web UI built with Svelte allows you to:
- View the status of all nodes (leader, follower, online/offline).
- Add new nodes to the cluster dynamically.
- Stop, restart, and decommission nodes.
- Explore and manage key-value data directly.
- Go: The core language for the backend application.
- HashiCorp Raft: The underlying library for implementing the Raft consensus algorithm, providing leader election and replicated log management.
- Gin: A high-performance HTTP web framework for the API.
- BoltDB: An embedded, persistent key/value database for Go.
- HashiCorp Memberlist: A library for cluster membership and node discovery using a gossip-based protocol.
- SvelteKit: A powerful UI framework for building fast, modern web applications.
- Tailwind CSS: A utility-first CSS framework for rapid UI development.
- Vite: The next-generation frontend tooling for a blazing-fast development experience.
This project was built to be exceptionally resilient. Significant effort has gone into identifying and fixing complex distributed systems bugs to make the cluster as foolproof as possible.
A critical failure mode in distributed systems is "split-brain," where a network partition or faulty recovery logic causes two separate leaders to emerge, leading to data inconsistency.
- Solved: The cluster launcher is now intelligent. It will never bootstrap a new node as a leader if a cluster configuration already exists (even if all nodes are offline). This ensures that a new, independent cluster can't be accidentally created, forcing all nodes to reconcile with the original cluster's state.
The system correctly handles losing a majority of its nodes (quorum loss), which is the point where a leader can no longer be elected.
- Solved: When quorum is lost, the remaining nodes safely transition to a follower state and reject writes, preventing data divergence. The system provides a clear recovery path: simply restart one of the original offline nodes to re-establish a majority and automatically elect a new leader.
In a dynamic cluster, a new node might try to join while the leader is temporarily unavailable or in the middle of an election.
- Solved: A new node now persistently retries its join request in the background. It will continue attempting to connect every few seconds until it can successfully reach the cluster's leader. It's also smart enough to handle redirects if it initially contacts a follower node, ensuring it always finds its way to the true leader.
- Go (version 1.21 or newer)
- Node.js and npm (for building the UI)
-
Clone the Repository
git clone <your-repo-url> cd distributed-key-value-store
-
Build the Frontend UI This step compiles the Svelte application into static assets that the Go backend will serve.
cd ui-svelte npm install npm run build cd ..
-
Tidy and Run the Cluster Launcher From the project's root directory, tidy the Go modules and then run the launcher. It will automatically build the server binary, start a 3-node cluster, and host the UI.
go mod tidy go run ./cmd/kv-launcher/main.go
-
Open the Dashboard Once the launcher is running, open your browser and navigate to: http://localhost:8080
While the UI is the primary way to interact with the cluster, you can also use the REST API directly with tools like curl.
These endpoints can be called on any active node in the cluster. If you call a follower node, your request will be automatically forwarded to the leader. Replace 9000 with the API port of any online node.
Stores or updates a value for a given key.
- Method:
PUT - Path:
/kv/:key - Body: JSON object with a
valuefield.
curl -X PUT -H "Content-Type: application/json" \
-d '{"value":"is a distributed systems algorithm"}' \
http://localhost:9000/kv/raftRetrieves the value for a given key.
- Method:
GET - Path:
/kv/:key
curl http://localhost:9000/kv/raftRemoves a key and its value from the store.
- Method:
DELETE - Path:
/kv/:key
curl -X DELETE http://localhost:9000/kv/raftRetrieves all key-value pairs from the store. Note: This reads from the local state of the queried node.
- Method:
GET - Path:
/kv
curl http://localhost:9000/kvThese endpoints are called on the launcher application (running on port 8080) to manage the node processes and the cluster's configuration.
Returns a list of all configured nodes and their current status (online/offline, leader/follower).
- Method:
GET - Path:
/api/control/nodes
curl http://localhost:8080/api/control/nodesDynamically adds a new node to the cluster. The launcher assigns it the next available ports, starts the process, and instructs it to join the cluster leader.
- Method:
POST-- Path:/api/control/add - Body: JSON object with a
node_idfield.
curl -X POST -H "Content-Type: application/json" \
-d '{"node_id":"node5"}' \
http://localhost:8080/api/control/addStops a running node process gracefully.
- Method:
POST - Path:
/api/control/stop/:nodeID
curl -X POST http://localhost:8080/api/control/stop/node5Restarts a stopped node. The node will attempt to rejoin the cluster.
- Method:
POST - Path:
/api/control/start/:nodeID
curl -X POST http://localhost:8080/api/control/start/node5Permanently removes a node from the cluster configuration. The launcher will tell the leader to remove the node from the consensus group, stop the process, and delete its data directory.
- Method:
POST - Path:
/api/control/delete/:nodeID
curl -X POST http://localhost:8080/api/control/delete/node5