Description
Hi @lucaong, let me start by saying that this is not a bug, just a question which could turn into a feature request.
As mentioned in issue #67, I'm working with large amounts of data in CubDB. Sometimes the value for a single key can be 10 megabytes or more. (Whether or not this is a good idea in itself is a fair question, but out of scope for this issue ;) )
In order to periodically clean up stale data, I want to list all keys currently present in my CubDB database. I noticed that this takes quite long (a couple of seconds at least), even for a database with just 30 records (each 10MB+ in size).
My conclusion is that the reason for this is that CubDB has no way of listing only the key part of a record – the value is always loaded, too.
I've been hacking around a bit and noticed that the %CubDB.Btree{} struct does seem to have the keys present. Example with some random UUID keys:
%CubDB.Btree{
  root: {:l,
   [
     {"0687853d-0b06-4651-a5cc-855f0e8966b4", 328157201},
     {"0e1f82b8-b727-4300-86de-c37368e72e4b", 315502609},
     {"0e96af62-620b-4313-be63-b377499c7545", 256754705},
     {"27bcd34c-3523-44b0-bb82-8294e7493ae4", 436354065},
     {"430b3c18-9cd0-4682-8a23-8df8ed9252c5", 72614929},
     {"459b937a-3308-40a9-8c8a-7639214e97d3", 380009489},
     {"48a72808-52a3-41f7-ad8c-afa5799631db", 122859537},
     {"4a8af0be-9344-4cf9-9154-28c3f2e630fd", 97831953},
     {"5a87d5ba-9d4d-407d-a7e6-107a977fe5b1", 275397649},
     {"6ee65e3c-50b8-46d3-b0f6-2b7380ebdb30", 296584209},
     {"766025b6-4df0-4b60-b02a-ce52e2cc5823", 339930129},
     {"7f459820-0eaf-421c-80b1-d84d6ccc743c", 398071825},
     {"80daa26a-b5b5-402b-af82-f3eaf714045a", 170184721},
     {"85087a92-f7a2-4198-b98e-b361ded90e70", 240407569},
     {"8a7ce722-c5d4-4303-93f7-0776533b4765", 210489361},
     {"a405a605-0f68-45d0-a3c1-5e580162b346", 123390993},
     {"ae21499d-28b8-4433-8d61-005eff594908", 148443153},
     {"af2dd94f-4090-4dbf-bd87-208e7a959d99", 455530513},
     {"b034d60d-8a66-4eb2-9015-964a7413b063", 122861585},
     {"b7874dd2-057e-4aba-aecc-1cf53b85b624", 360783889},
     {"ccd09d7a-94fd-4f0f-87bd-a0e1925b2ffa", 229527569},
     {"d4559626-1bda-4ad5-9735-1ac07dd97e51", 417723409},
     {"d6569e73-9c3e-490b-872f-e6b6475e1ffc", 474360849},
     {"ea125de7-e837-4ab6-8db7-6d14dbaeecb5", 47621137},
     {"ee62ed22-8c22-4cb6-b377-68827cbcfbed", 192839697},
     {"f35a6177-2e55-460f-bdef-76707e0e8b67", 1038},
     {"f7d3c685-b8a2-45c8-9757-09e2b5bec8a1", 24699921},
     {"f9493336-0761-4d43-acc6-fe3e9d43946d", 123109393}
   ]},
  root_loc: 492884358,
  size: 28,
  dirt: 42,
  store: %CubDB.Store.File{
    # ... omitted
  },
  capacity: 32
}
While I can take this data out of a Btree struct myself, it feels a bit hackish to rely on this internal data structure just to get a list of keys without having to load hundreds of megabytes of data into memory.
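To make the hack concrete, here is a minimal sketch of what I mean. It assumes the root is a single leaf node shaped like `{:l, [{key, location}, ...]}`, as in the struct printed above. The module name `LeafKeys` is made up for illustration and is not part of CubDB. It does not handle larger trees, where I assume the root would point at child nodes that have to be read from the store. I also suspect the leaf entries could include keys that were deleted but not yet compacted away, so this is not a reliable key listing.

```elixir
defmodule LeafKeys do
  # Hypothetical helper, NOT part of CubDB.
  # Assumption: a leaf node is {:l, [{key, location}, ...]}, as observed
  # in the inspected %CubDB.Btree{} struct. Anything else is rejected.
  def from_root({:l, entries}) when is_list(entries) do
    # Keep only the key of each {key, location} pair; no value is read.
    keys = for {key, _location} <- entries, do: key
    {:ok, keys}
  end

  def from_root(_other), do: {:error, :not_a_leaf}
end

# Two entries copied from the struct above.
root =
  {:l,
   [
     {"0687853d-0b06-4651-a5cc-855f0e8966b4", 328157201},
     {"0e1f82b8-b727-4300-86de-c37368e72e4b", 315502609}
   ]}

{:ok, keys} = LeafKeys.from_root(root)
IO.inspect(keys)
```

Since this depends on an undocumented internal layout, it could break with any CubDB release, which is why I'd prefer a public API.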
Is it correct that there is currently no public API (e.g. CubDB.keys/2, or CubDB.keys/3 for :min_key and :max_key support) available to list only the record keys without loading all values?
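To illustrate the semantics I have in mind for such a function, here is a rough sketch that works on a plain list of `{key, location}` pairs rather than on a real database. Everything in it is hypothetical: the `KeysSketch` module, the `keys/2` function, and the assumption that `:min_key` and `:max_key` are inclusive bounds, with `nil` meaning "no bound".

```elixir
defmodule KeysSketch do
  # Hypothetical illustration only; CubDB does not provide this module.
  # `entries` is a list of {key, location} pairs, sorted by key.
  def keys(entries, opts \\ []) do
    min_key = Keyword.get(opts, :min_key)
    max_key = Keyword.get(opts, :max_key)

    # Return only the keys within the (assumed inclusive) range.
    for {key, _location} <- entries, in_range?(key, min_key, max_key), do: key
  end

  # A nil bound means the range is open on that side.
  defp in_range?(key, min_key, max_key) do
    (is_nil(min_key) or key >= min_key) and (is_nil(max_key) or key <= max_key)
  end
end

entries = [{"a", 1}, {"b", 2}, {"c", 3}, {"d", 4}]
IO.inspect(KeysSketch.keys(entries, min_key: "b", max_key: "c"))
```

A real implementation would presumably walk the tree and stop at the leaf level instead of filtering an in-memory list, but the return shape would be the same: keys only, no values.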