How to select only keys without values? #68

@florish

Hi @lucaong, let me start by saying that this is not a bug, just a question which could turn into a feature request.

As mentioned in issue #67, I'm working with large amounts of data in CubDB. Sometimes the value for a single key can be 10 megabytes or more. (Whether or not this is a good idea in itself is a good question, but out of scope for this issue ;) )

In order to do a periodic cleanup of stale data, I want to list all keys currently present in my CubDB database. I noticed that this takes quite a long time (a couple of seconds at least), even for a database with just 30 records (each 10MB+ in size).

My conclusion is that CubDB has no way of listing only the key part of a record: the value is always loaded, too.

I've been hacking around a bit and noticed that the %CubDB.Btree{} struct does seem to have the keys present. Example with some random UUID keys:

%CubDB.Btree{
  root: {:l,
   [
     {"0687853d-0b06-4651-a5cc-855f0e8966b4", 328157201},
     {"0e1f82b8-b727-4300-86de-c37368e72e4b", 315502609},
     {"0e96af62-620b-4313-be63-b377499c7545", 256754705},
     {"27bcd34c-3523-44b0-bb82-8294e7493ae4", 436354065},
     {"430b3c18-9cd0-4682-8a23-8df8ed9252c5", 72614929},
     {"459b937a-3308-40a9-8c8a-7639214e97d3", 380009489},
     {"48a72808-52a3-41f7-ad8c-afa5799631db", 122859537},
     {"4a8af0be-9344-4cf9-9154-28c3f2e630fd", 97831953},
     {"5a87d5ba-9d4d-407d-a7e6-107a977fe5b1", 275397649},
     {"6ee65e3c-50b8-46d3-b0f6-2b7380ebdb30", 296584209},
     {"766025b6-4df0-4b60-b02a-ce52e2cc5823", 339930129},
     {"7f459820-0eaf-421c-80b1-d84d6ccc743c", 398071825},
     {"80daa26a-b5b5-402b-af82-f3eaf714045a", 170184721},
     {"85087a92-f7a2-4198-b98e-b361ded90e70", 240407569},
     {"8a7ce722-c5d4-4303-93f7-0776533b4765", 210489361},
     {"a405a605-0f68-45d0-a3c1-5e580162b346", 123390993},
     {"ae21499d-28b8-4433-8d61-005eff594908", 148443153},
     {"af2dd94f-4090-4dbf-bd87-208e7a959d99", 455530513},
     {"b034d60d-8a66-4eb2-9015-964a7413b063", 122861585},
     {"b7874dd2-057e-4aba-aecc-1cf53b85b624", 360783889},
     {"ccd09d7a-94fd-4f0f-87bd-a0e1925b2ffa", 229527569},
     {"d4559626-1bda-4ad5-9735-1ac07dd97e51", 417723409},
     {"d6569e73-9c3e-490b-872f-e6b6475e1ffc", 474360849},
     {"ea125de7-e837-4ab6-8db7-6d14dbaeecb5", 47621137},
     {"ee62ed22-8c22-4cb6-b377-68827cbcfbed", 192839697},
     {"f35a6177-2e55-460f-bdef-76707e0e8b67", 1038},
     {"f7d3c685-b8a2-45c8-9757-09e2b5bec8a1", 24699921},
     {"f9493336-0761-4d43-acc6-fe3e9d43946d", 123109393}
   ]},
  root_loc: 492884358,
  size: 28,
  dirt: 42,
  store: %CubDB.Store.File{
    # ... omitted
  },
  capacity: 32
}

While I can take this data out of a Btree struct myself, it feels a bit hackish to rely on this internal data structure just to get a list of keys without having to load hundreds of megabytes of data into memory.
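
To make the workaround concrete, here is a rough sketch of pulling the keys out of a root node shaped like the one in the dump. It is only an illustration: `KeysSketch` and `leaf_keys/1` are hypothetical names and not part of CubDB, and the code assumes the root is a leaf of the form `{:l, [{key, location}, ...]}`, where the integer is presumably a location in the data file. A larger database would likely have a non-leaf root, which this sketch deliberately does not handle.

```elixir
defmodule KeysSketch do
  # Hypothetical helper, NOT part of CubDB's public API.
  # Assumes a leaf node shaped like the dump: {:l, [{key, location}, ...]}.
  def leaf_keys({:l, entries}) when is_list(entries) do
    # Keep only the key of each {key, location} pair; no value is read.
    {:ok, Enum.map(entries, fn {key, _location} -> key end)}
  end

  # Any other node shape (e.g. a non-leaf root) is out of scope here.
  def leaf_keys(_other), do: :error
end

# The first two entries of the root shown in the dump.
root =
  {:l,
   [
     {"0687853d-0b06-4651-a5cc-855f0e8966b4", 328157201},
     {"0e1f82b8-b727-4300-86de-c37368e72e4b", 315502609}
   ]}

IO.inspect(KeysSketch.leaf_keys(root))
```

Because this depends on an internal representation, it could break with any CubDB release, which is exactly why a public function would be preferable.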

Is it correct that there is currently no public API (e.g. CubDB.keys/2, or CubDB.keys/3 for :min_key and :max_key support) available to list only the record keys without loading all values?
