Storage Efficiency

Introduction

We often receive questions about how LMDB stores data, especially when disk space consumption is being reported at higher levels than would be expected. This page tries to explain a little more about what is going on and how to optimise the storage efficiency.

Default Environments

To help make this wiki page more informative, let’s use the Verifier that ships with LmdbJava so we have a database to look at. The Verifier is useful for proving that LMDB is operating properly on a given system, and for our purposes provides a convenient way to create a database containing several Dbis with lots of Txns and a variety of insert, delete and read use cases.

To run the Verifier we’ll use the following code:

    // Imports assumed for this snippet. MEBIBYTES is assumed to come from
    // com.jakewharton.byteunits.BinaryByteUnit; any MiB-to-bytes helper works.
    import static com.jakewharton.byteunits.BinaryByteUnit.MEBIBYTES;
    import static java.util.concurrent.TimeUnit.SECONDS;
    import static org.lmdbjava.CopyFlags.MDB_CP_COMPACT;
    import static org.lmdbjava.Env.create;
    import static org.lmdbjava.EnvFlags.MDB_NOSUBDIR;

    import java.io.File;
    import java.nio.ByteBuffer;
    import org.lmdbjava.Env;
    import org.lmdbjava.Verifier;

    final File path = new File("lmdb");
    final File copy = new File("lmdb-copy");
    try (Env<ByteBuffer> env = create()
        .setMaxReaders(1)
        .setMaxDbs(Verifier.DBI_COUNT)
        .setMapSize(MEBIBYTES.toBytes(50))
        .open(path, MDB_NOSUBDIR)) {
      final Verifier v = new Verifier(env);
      final long r = v.runFor(30, SECONDS);
      System.out.println("Records verified: " + r);
      System.out.println(env.info());
      System.out.println(env.stat());
      env.copy(copy, MDB_CP_COMPACT); // write a compacted copy for later comparison
    }

When run we receive the following output:

Records verified: 3521
EnvInfo{lastPageNumber=276, lastTransactionId=60, mapAddress=0, mapSize=52428800, maxReaders=1, numReaders=1}
Stat{branchPages=0, depth=1, entries=5, leafPages=1, overflowPages=0, pageSize=4096}

Verifier currently works by writing out a record consisting of an incrementing long key and a 64 KiB random value, then deleting the prior record. In this run we created 3521 records, of which 3520 would have been deleted. Note that entries=5 indicates there are 5 Dbis in the environment (not 5 records). Because the Verifier cycles through a new Txn every 64 records, we should see around 55 transactions (3521 / 64). Indeed we see a slightly higher value of lastTransactionId=60 because there were also 5 transactions to create the Dbis. When our database is closed, it should contain only one record, and that one record should be around 64 KiB.
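
As a rough sanity check on the transaction count, the arithmetic can be sketched as follows (the figures are those reported by this particular run):

    final long records = 3521;     // records written by the Verifier in this run
    final long perTxn = 64;        // the Verifier commits a new Txn every 64 records
    final long dbiSetup = 5;       // one Txn per Dbi created up front
    final long expectedTxns = (records / perTxn) + dbiSetup; // 55 + 5 = 60
    System.out.println(expectedTxns); // ~ lastTransactionId=60 reported above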

Let’s have a look at the size of the resulting database files, starting with ls:

$ ls -lh lmdb lmdb-copy
-rw-r----- 1 bpa bpa 1.1M May 22 12:47 lmdb
-rw-r----- 1 bpa bpa  80K May 22 12:47 lmdb-copy

You will note our first surprise: the lmdb-copy is only 80 KiB, but the main database we worked with, lmdb, is roughly 14 times larger. That’s because, while most of the data was deleted as the Verifier went along, LMDB still needed enough space to service each write Txn, and the data file does not shrink when records are deleted.

The du output tells us the same as ls in this case:

$ du -h lmdb lmdb-copy
1.1M	lmdb
80K	lmdb-copy

Now we’ll look at stat, which lets us see the blocks used. We include it now because it will be useful for comparison with a memory-mapped file shortly:

$ stat lmdb | head -n 2
  File: lmdb
  Size: 1105920   	Blocks: 2168       IO Block: 4096   regular file

Memory-Mapped Environments

Let’s change the way we create the database to use a memory-mapped file. This will result in LMDB allocating a sparse file to hold the data. This usually provides performance gains by reducing allocations (at the expense of some safety). To use a memory-mapped file we open the Env with an additional flag: EnvFlags.MDB_WRITEMAP.
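
As a minimal sketch (reusing the path and setup from the earlier snippet), only the open call changes:

    // Same Env setup as before, with MDB_WRITEMAP added to memory-map the data file.
    try (Env<ByteBuffer> env = create()
        .setMaxReaders(1)
        .setMaxDbs(Verifier.DBI_COUNT)
        .setMapSize(MEBIBYTES.toBytes(50))
        .open(path, MDB_NOSUBDIR, MDB_WRITEMAP)) {
      // ... run the Verifier exactly as before ...
    }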

After deleting the old databases and running again with MDB_WRITEMAP we receive:

Records verified: 2049
EnvInfo{lastPageNumber=276, lastTransactionId=37, mapAddress=0, mapSize=52428800, maxReaders=1, numReaders=1}
Stat{branchPages=0, depth=1, entries=5, leafPages=1, overflowPages=0, pageSize=4096}

Here we see we only processed 2049 records, materially fewer than with the earlier (non-memory-mapped) approach. But our goal here is to look at disk space rather than benchmark the various Env flags, so we’ll move on to the storage reports.

It’s interesting to note that despite far fewer records being processed and correspondingly fewer transactions (37), the last page number is once again 276. Let’s take a look at ls again:

$ ls -lh lmdb lmdb-copy
-rw-r----- 1 bpa bpa 50M May 22 13:14 lmdb
-rw-r----- 1 bpa bpa 80K May 22 13:14 lmdb-copy

The lmdb-copy is the same size as our earlier (non-memory-mapped) copy. This is exactly what you’d expect given the database should end up with only a single 64 KiB record at the end. But what is going on with the lmdb database? It’s gone from 1.1 MiB to 50 MiB. This is because it’s a sparse file and ls reports the apparent file size (not the space actually allocated on disk).

If we use du, we discover the actual space consumed by the sparse file is 1.1 MiB. This is the same size as our earlier run without a sparse file:

$ du -h lmdb
1.1M	lmdb

We can verify this further using stat. While it reports the 50 MiB file size, the file only uses 2216 blocks (very close to the 2168 blocks reported on our earlier run):

$ stat lmdb | head -n 2
  File: lmdb
  Size: 52428800  	Blocks: 2216       IO Block: 4096   regular file

Minimising Space Consumption

There are some techniques available to reduce the size of LMDB databases.

The starting point is to minimise the amount of data you are asking LMDB to store. This is discussed in detail on the Value Serialisation and Compression page.

As discussed on the Concurrency page, due to MVCC it is important to minimise the duration of any write Txn while there are read Txns in flight.
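
For example, read Txns released promptly via try-with-resources stay short-lived (a sketch only; env, dbi and key are assumed to already exist):

    // Keep read Txns short so a concurrent writer need not retain old page versions.
    try (Txn<ByteBuffer> rtx = env.txnRead()) {
      final ByteBuffer found = dbi.get(rtx, key);
      // copy out anything needed from 'found' before the Txn closes
    }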

If your workload allows you to insert data in key order, you should always do this. LMDB has internal optimisations that detect sequential inserts and assume these will continue, meaning it biases page splits in a manner that is more space efficient.

If you are inserting data in sequential order and will not be adding random data later on, use PutFlags.MDB_APPEND. This prevents page splits, leading to greater space efficiency. If you add random data later on it will be slower due to the required page split.
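
A minimal sketch of appending keys in ascending order (env and dbi are assumed to already exist; LmdbJava requires direct buffers):

    // Keys arrive in sorted order, so MDB_APPEND lets LMDB pack pages fully.
    try (Txn<ByteBuffer> txn = env.txnWrite()) {
      final ByteBuffer key = ByteBuffer.allocateDirect(8);
      final ByteBuffer val = ByteBuffer.allocateDirect(8);
      for (long i = 0; i < 1_000; i++) {
        key.clear();
        key.putLong(i).flip();
        val.clear();
        val.putLong(i).flip();
        dbi.put(txn, key, val, MDB_APPEND);
      }
      txn.commit();
    }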

To recover space you can perform a copy with MDB_CP_COMPACT like we saw earlier. If a database requires no further modifications (as is often the case with time series or archival data) you can save space by generating a new database in key order.

If you need to store larger values, it’s worth considering LMDB’s page allocation approach. A page is usually 4096 bytes and requires a 16 byte header, leaving 4080 bytes for records. Each record consists of an 8 byte overhead plus the user data (ie the combined length of the user-provided key and value). Because a page must hold a minimum of 2 records, the largest key-plus-value size that avoids overflow pages is 2032 bytes (ie (4080 / 2) - 8). Records larger than this spill into overflow pages. For such larger records, the most space-efficient user data sizes are 4096 byte increments, less 16 bytes for the initial header.
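
The arithmetic above can be expressed as a small (hypothetical) helper:

    // Largest key+value size that still fits two records in one page.
    static long maxInPageRecordSize(final long pageSize) {
      final long pageHeader = 16;   // bytes reserved for the page header
      final long nodeOverhead = 8;  // per-record overhead
      return ((pageSize - pageHeader) / 2) - nodeOverhead; // (4080 / 2) - 8 = 2032 for 4096-byte pages
    }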

Conclusion

We have seen that LMDB storage consumption can be examined using several approaches, including Env.info(), Env.stat(), ls, du and stat. Each tool offers a different insight into the space being consumed and why. If asking for consumption-related advice on the issue tracker, please include the output of these tools.

Storage consumption always benefits from reducing concurrent Txns, reducing the size of the records you’re asking LMDB to store, and inserting data in key order. We’ve also touched on some of the page sizing internals that matter if you are storing larger values.
