Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Improve Java API get() performance by reducing copies#10970

Closed
alanpaxton wants to merge 14 commits into
facebook:mainfrom
alanpaxton:eb-1631-get-performance
Closed

Improve Java API get() performance by reducing copies#10970
alanpaxton wants to merge 14 commits into
facebook:mainfrom
alanpaxton:eb-1631-get-performance

Conversation

@alanpaxton
Copy link
Copy Markdown
Contributor

@alanpaxton alanpaxton commented Nov 21, 2022

Performance improvements for get() paths in the RocksJava API (JNI).
Document describing the performance results.

Replace uses of the legacy DB::Get() method wrapper returning data in a std::string with direct calls to DB::Get() passing a pinnable slice to receive this data. Copying from a pinned slice direct to the destination java byte array, without going via an intervening std::string, is a major performance gain for this code path.

Note that this gain only comes where DB::Get() is able to return a pinned buffer; where it has to copy into the buffer owned by the slice, there is still the intervening copy and no performance gain. It may be possible to address this case too, but it is not trivial.

@alanpaxton alanpaxton marked this pull request as draft November 21, 2022 10:40
@alanpaxton alanpaxton force-pushed the eb-1631-get-performance branch 3 times, most recently from e20f0b0 to 9b758fa Compare November 24, 2022 09:12
@alanpaxton alanpaxton marked this pull request as ready for review November 24, 2022 09:27
@alanpaxton alanpaxton force-pushed the eb-1631-get-performance branch from 5763550 to 8896a47 Compare December 5, 2022 08:33
@adamretter adamretter self-requested a review December 5, 2022 16:18
Copy link
Copy Markdown
Collaborator

@adamretter adamretter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@adamretter
Copy link
Copy Markdown
Collaborator

Hi @ajkr @siying, could you take a look at importing/merging this one please?

Benchmarks are a subset of facebook#7597

We just want to benchmark existing JNI methods with some potential performance improvements.
This should reduce by the 1 the number of value copies when the result is in the buffer cache, and can be returned by a PinnableSlice - in which case we can directly copy from the slice into the result Java byte[]
Add baseline benchmark results
Fix instructions. Gather next set of results.
Discussion
@alanpaxton alanpaxton force-pushed the eb-1631-get-performance branch from 8896a47 to edf18f5 Compare December 16, 2022 10:06
@facebook-github-bot
Copy link
Copy Markdown
Contributor

@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Comment thread java/GetBenchmarks.md
GetBenchmarks.preallocatedByteBufferGet no_column_family 1000 128 N/A 32768 thrpt 25 70765.578 ± 697.548 ops/s
GetBenchmarks.preallocatedGet no_column_family 1000 128 N/A 32768 thrpt 25 69883.554 ± 944.184 ops/s

### After fixing byte[] (.get and .preallocatedGet)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The .get and .preallocatedGet are benchmark functions. I'd find it easier to conclude there's a user-facing improvement if the same (optimized) version of java/jmh were used for both sides of the comparison.

In any case, thanks for the detailed instructions - I was able to try that and indeed found an improvement, especially with large / numerous values. Unlike the results here, mine benefited more from numerous values because PinnableSlice helps with reads from SST files more than it helps with reads from memtable. And the improvements were similar across the different benchmarks.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record I saw ~15% faster point lookup with 100K keys, 64KB values, 12-byte keys.

Command (not trying to use much long running features for stability): java -jar java/jmh/target/rocksdbjni-jmh-1.0-SNAPSHOT-benchmarks.jar -wi 1 -f 1 -p keyCount=100000 -p keySize=12 -p valueSize=65536 -p columnFamilyTestType="no_column_family" org.rocksdb.jmh.GetBenchmarks

Control (java/jmh built from PR10970; rocksdbjni.jar built from main) result:

Benchmark                                (columnFamilyTestType)  (keyCount)  (keySize)  (valueSize)   Mode  Cnt      Score      Error  Units
GetBenchmarks.get                              no_column_family      100000         12        65536  thrpt    5  29633.553 ±  438.256  ops/s
GetBenchmarks.preallocatedByteBufferGet        no_column_family      100000         12        65536  thrpt    5  50115.108 ± 2197.008  ops/s
GetBenchmarks.preallocatedGet                  no_column_family      100000         12        65536  thrpt    5  44642.995 ± 2717.312  ops/s

Experimental (java/jmh and rocksdbjni.jar built from PR10970) result:

Benchmark                                (columnFamilyTestType)  (keyCount)  (keySize)  (valueSize)   Mode  Cnt      Score      Error  Units
GetBenchmarks.get                              no_column_family      100000         12        65536  thrpt    5  34735.884 ± 1569.739  ops/s
GetBenchmarks.preallocatedByteBufferGet        no_column_family      100000         12        65536  thrpt    5  58898.514 ± 5975.174  ops/s
GetBenchmarks.preallocatedGet                  no_column_family      100000         12        65536  thrpt    5  53611.361 ± 1280.867  ops/s

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Andrew. I retried and failed to reproduce those spectacular Mac results, and I'm not sure exactly what happened as I was building both this branch and the "baseline" branch I used in tests in consistent fashion.
Anyway, as your work and the Ubuntu tests confirm we have a small but useful improvement.

And thanks for the explanation re pinnable-slices and memtable vs SSTs. That's helpful for my understanding.

@facebook-github-bot
Copy link
Copy Markdown
Contributor

@ajkr merged this pull request in f8969ad.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants