I am not a Rust developer nor a database/KV store systems level engineer. This repo is simply a personal project for the purposes of learning both topics. Or at least, to learn them at a surface level.
Note that I am also implementing this with only a surface level of research into existing KV stores because I explicitly want to explore the limits of my own problem solving as opposed to just googling "what does memcached/redis do for keyspace sharding". Of course I am interested in learning about those kinds of things, and I've got a stash of papers and blog posts that I am slowly working through, but the goal here is to see how far I can get with my existing skillset and what comes out of that process.
- usage of Tokio as the async runtime since it's so widely used in the Rust ecosystem; tokio tasks are lightweight (compared to Go's goroutines, which is where I am coming from) and make it easy to process requests concurrently as the KV store receives them
- tokio's runtime has a work-stealing scheduler and allows configuring its threadpool size if desired (a minimal runtime-setup sketch follows this list)
- no horizontal sharding plans; as a personal project, properly testing and benchmarking a single process is complicated enough, and I am explicitly interested in exploring systems-level programming techniques since my professional background thus far has been in horizontally distributed computing
- the KV store takes a sharding configuration option that splits the keyspace across shards, which should improve read and write throughput (a key-to-shard sketch also follows this list)
- key TTL cleanup is done via a secondary tokio task per shard; the expiration check interval is also configurable
- the goal here was to provide a cleanup mechanism that does not use a "stop the world" GC-sweep-style pass; instead, the routine is notified of a key and its TTL via the same path that results in the key being stored in the shard's actual hash map
- with this, the cleanup routine itself knows which keys it should send delete signals for, and the delete is handled the same way as an external delete request to maintain separation of concerns (a rough sketch of this wiring follows the list as well)
- in the future we may want to add some kind of secondary cleanup that (much less frequently than the current routine) periodically checks for expired keys that were somehow missed by their cleanup tasks (currently there's no mechanism for detecting that a cleanup task has died)
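As a rough sketch of the runtime setup from the first two points (the worker thread count, bind address, and the shape of the accept loop are illustrative assumptions, not the project's actual code):

```rust
use tokio::net::TcpListener;

fn main() -> std::io::Result<()> {
    // Build the multi-threaded runtime explicitly so the worker pool size is
    // configurable; tokio defaults to one worker per CPU core if omitted.
    let runtime = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(4) // hypothetical config value
        .enable_all()
        .build()?;

    runtime.block_on(async {
        let listener = TcpListener::bind("127.0.0.1:6400").await?;
        loop {
            let (socket, _addr) = listener.accept().await?;
            // Each connection gets its own lightweight task; tokio's
            // work-stealing scheduler spreads them across the worker threads.
            tokio::spawn(async move {
                // ... read requests from `socket` and dispatch to the store ...
                let _ = socket;
            });
        }
    })
}
```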
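And a minimal sketch of the key-to-shard mapping behind the sharding option (the function name and hasher choice are mine for illustration; the point is just that every code path derives the shard deterministically from the key):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Map a key onto one of `shard_count` shards by hashing it.
/// GET, SET, DELETE, and TTL notifications must all use the same function
/// so a given key always lands on the same shard.
fn shard_index(key: &str, shard_count: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    (hasher.finish() as usize) % shard_count
}
```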
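Finally, a sketch of how the per-shard TTL cleanup could be wired up. `ShardCommand`, `ttl_cleanup_routine`, and the channel layout are hypothetical names and shapes, not the project's actual types; the idea is just that the cleanup task learns about keys and their TTLs over the same path that stores them, and deletions flow back through the normal delete handling.

```rust
use std::collections::BTreeMap;
use std::time::Duration;
use tokio::sync::mpsc;
use tokio::time::Instant;

/// Commands handled by a shard's main routine; Delete here is the same
/// variant an external DELETE request would produce.
enum ShardCommand {
    Delete { key: String },
    // ... Get / Set variants elided ...
}

/// Per-shard cleanup task: it learns about (key, expiry) pairs over `ttl_rx`
/// (fed from the same code path that inserts the key into the shard's map)
/// and periodically sends Delete commands back to the shard routine.
async fn ttl_cleanup_routine(
    mut ttl_rx: mpsc::Receiver<(String, Instant)>,
    shard_tx: mpsc::Sender<ShardCommand>,
    check_interval: Duration,
) {
    // Keys ordered by expiry time so expired entries sit at the front.
    let mut expirations: BTreeMap<Instant, Vec<String>> = BTreeMap::new();
    let mut ticker = tokio::time::interval(check_interval);

    loop {
        tokio::select! {
            // A new key was registered with a TTL.
            Some((key, expires_at)) = ttl_rx.recv() => {
                expirations.entry(expires_at).or_default().push(key);
            }
            // Periodic sweep of anything whose deadline has passed.
            _ = ticker.tick() => {
                let now = Instant::now();
                let still_live = expirations.split_off(&now);
                let expired = std::mem::replace(&mut expirations, still_live);
                for key in expired.into_values().flatten() {
                    // Deletion goes through the shard's normal command path,
                    // exactly like an externally issued DELETE.
                    let _ = shard_tx.send(ShardCommand::Delete { key }).await;
                }
            }
        }
    }
}
```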
LRU cache eviction: I have a naive implementation sketched out in my notebook which I want to try implementing, where the current `shard_routine` mechanism would be extended into a struct that contains both the shard's hash map and a queue-like representation of LRU access for the shard's keys. The goal here is to avoid introducing a single coordinator-like process for managing a global LRU list for the KV store. The plan: each time a key is accessed, it is removed from its current position in the LRU representation and prepended to the front. Then, during a SET operation, if we detect that we need to evict a key, we simply do an O(1) lookup of the last element in each shard's LRU representation rather than a sequential scan. A minimal sketch follows below.
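A minimal sketch of that per-shard struct, assuming a plain `VecDeque` as the queue-like representation (the names and the `capacity` handling are mine; note that the naive move-to-front is O(n) here, only the eviction peek at the back is O(1)):

```rust
use std::collections::{HashMap, VecDeque};

/// The shard's map plus a queue-like record of access recency:
/// front = most recently used, back = least recently used.
struct Shard {
    map: HashMap<String, Vec<u8>>,
    lru: VecDeque<String>,
}

impl Shard {
    /// On every access, move the key to the front of the LRU queue.
    /// (Naive: finding the key's current position is O(n); a linked-list or
    /// index-based structure would make this O(1).)
    fn touch(&mut self, key: &str) {
        if let Some(pos) = self.lru.iter().position(|k| k == key) {
            self.lru.remove(pos);
        }
        self.lru.push_front(key.to_string());
    }

    fn get(&mut self, key: &str) -> Option<Vec<u8>> {
        let value = self.map.get(key).cloned();
        if value.is_some() {
            self.touch(key);
        }
        value
    }

    fn set(&mut self, key: String, value: Vec<u8>, capacity: usize) {
        // If we're at capacity and this is a new key, evict the least
        // recently used entry: an O(1) peek at the back of the queue.
        if self.map.len() >= capacity && !self.map.contains_key(&key) {
            if let Some(victim) = self.lru.pop_back() {
                self.map.remove(&victim);
            }
        }
        self.map.insert(key.clone(), value);
        self.touch(&key);
    }
}
```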
Slab memory allocation of value storage space: the one optimization from memcached that I know I want to explore implementing is its slab allocation. Here, we continue to have a hash map for key lookups, but rather than the map giving us the value directly, it gives us a struct with some metadata and a pointer into a memory slab. Each slab is made up of chunks of a predefined size, and each slab manages chunks of a single size: we may have one slab that stores items of at most 256B, another that does 1KB or less, and so on. With this we make more efficient use of memory by a) not having to alloc/free each time we SET/DELETE something, and b) reducing external fragmentation (empty space between allocated blocks, leaving small sections unusable), in exchange for bounded memory usage and the inefficiency of internal fragmentation ("wasted" space within an allocated chunk, as when we store a 100B item in a chunk that's allocated 128B). At the moment I have not decided how this will play into my existing sharding of the keyspace. A rough sketch of the idea is below.
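A rough safe-Rust sketch of the slab idea, using chunk indices rather than raw pointers; the type names, size classes, and fixed chunks-per-class count are assumptions for illustration, and none of this is wired into the sharded store yet:

```rust
/// A handle stored in the key-lookup map instead of the value itself:
/// it records which slab class the value lives in, which chunk, and how
/// many bytes of that chunk are actually used.
#[derive(Clone, Copy)]
struct ValueRef {
    class: usize, // index into SlabAllocator::classes
    chunk: usize, // chunk index within that class's slab
    len: usize,   // actual value length (<= chunk size)
}

/// One slab class: a single pre-allocated buffer split into equal-size chunks,
/// plus a free list of chunk indices so SET/DELETE never hit the allocator.
struct SlabClass {
    chunk_size: usize,
    memory: Vec<u8>,
    free: Vec<usize>,
}

struct SlabAllocator {
    classes: Vec<SlabClass>, // e.g. 256B, 1KB, 4KB, ...
}

impl SlabAllocator {
    fn new(chunk_sizes: &[usize], chunks_per_class: usize) -> Self {
        let classes = chunk_sizes
            .iter()
            .map(|&chunk_size| SlabClass {
                chunk_size,
                memory: vec![0u8; chunk_size * chunks_per_class],
                free: (0..chunks_per_class).rev().collect(),
            })
            .collect();
        Self { classes }
    }

    /// Store a value in the smallest class whose chunks can hold it.
    /// Internal fragmentation: a 100B value in a 128B chunk wastes 28B.
    fn store(&mut self, value: &[u8]) -> Option<ValueRef> {
        let class = self
            .classes
            .iter()
            .position(|c| c.chunk_size >= value.len())?;
        let slab = &mut self.classes[class];
        let chunk = slab.free.pop()?; // None = class is full (bounded memory)
        let start = chunk * slab.chunk_size;
        slab.memory[start..start + value.len()].copy_from_slice(value);
        Some(ValueRef { class, chunk, len: value.len() })
    }

    fn read(&self, r: ValueRef) -> &[u8] {
        let slab = &self.classes[r.class];
        let start = r.chunk * slab.chunk_size;
        &slab.memory[start..start + r.len]
    }

    /// DELETE just returns the chunk to the free list; nothing is deallocated.
    fn release(&mut self, r: ValueRef) {
        self.classes[r.class].free.push(r.chunk);
    }
}
```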