
Conversation

@adamgfraser
Contributor

Following up on the discussion on Discord, this PR improves the efficiency of operations on concatenated chunks. The basic problem is that right now almost all chunk operations are implemented in terms of accessing each index of the chunk. Indexed access is relatively efficient because of balanced concatenation and the use of underlying arrays, but because the data structure is not actually a flat array, indexed access is not nearly as efficient as direct iteration, resulting in terrible performance when iterating over concatenated chunks.
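The cost difference can be sketched with a toy concatenation tree. This is an illustration of the problem only, not zio.Chunk's actual internals; the names `MiniChunk`, `Arr`, and `Concat` are hypothetical.

```scala
// Toy model of a chunk: either a backing array or a concatenation node.
// Illustrative only -- not zio.Chunk's real representation.
sealed trait MiniChunk[A] {
  def length: Int

  // Indexed access: every apply(i) walks the tree from the root,
  // so iterating all n elements by index costs O(n * depth).
  def apply(i: Int): A = this match {
    case Arr(array)   => array(i)
    case Concat(l, r) => if (i < l.length) l(i) else r(i - l.length)
  }

  // Direct iteration: each underlying array is visited exactly once,
  // so iterating all n elements costs O(n).
  def foreach(f: A => Unit): Unit = this match {
    case Arr(array)   => array.foreach(f)
    case Concat(l, r) => l.foreach(f); r.foreach(f)
  }
}
final case class Arr[A](array: Array[A]) extends MiniChunk[A] {
  def length: Int = array.length
}
final case class Concat[A](l: MiniChunk[A], r: MiniChunk[A]) extends MiniChunk[A] {
  def length: Int = l.length + r.length
}

object MiniChunkDemo extends App {
  // A chunk built from ten concatenated arrays of 100 elements each.
  val chunk: MiniChunk[Int] =
    (0 until 10)
      .map(i => Arr(Array.tabulate(100)(j => i * 100 + j)): MiniChunk[Int])
      .reduceLeft(Concat(_, _))

  // Index-by-index traversal re-walks the tree on every access.
  var sumIndexed = 0L
  var i          = 0
  while (i < chunk.length) { sumIndexed += chunk(i); i += 1 }

  // foreach traverses each underlying array directly, once.
  var sumForeach = 0L
  chunk.foreach(sumForeach += _)

  assert(sumIndexed == sumForeach)
  println(sumForeach)
}
```

Both traversals compute the same result; the difference is that the indexed loop pays the tree-walk cost on every element, which is what the benchmarks below measure.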

We can address this in many cases by just using foreach instead of accessing each index. For example, here is a comparison of Chunk with other data types for map and foldLeft on concatenated chunks (created by repeatedly concatenating 1,000 chunks of 1,000 elements each). The first set of results is for the current implementation, the second is for a fully materialized chunk, and the third uses foreach instead of indexed access.

The current implementation is orders of magnitude slower than other collections, whereas just using foreach is as fast as or faster than other collections. Right now I have only done this for map and foldLeft, but I think we need to go through and make sure that no operations on Chunk (as opposed to the Arr subtype) are doing indexed access, and that they all iterate using foreach or iterator instead.

Current Chunk

[info] Benchmark                            Mode  Cnt          Score         Error  Units
[info] ChainBenchmarks.foldLeftLargeChain   avgt    5    7394179.877 ± 2400522.803  ns/op
[info] ChainBenchmarks.foldLeftLargeChunk   avgt    5  152564062.429 ± 3453030.056  ns/op
[info] ChainBenchmarks.foldLeftLargeList    avgt    5    4015942.091 ±  297741.459  ns/op
[info] ChainBenchmarks.foldLeftLargeVector  avgt    5    8167982.583 ±  968383.806  ns/op
[info] ChainBenchmarks.mapLargeChain        avgt    5   12212584.229 ± 1688755.027  ns/op
[info] ChainBenchmarks.mapLargeChunk        avgt    5  154922038.625 ± 3936423.265  ns/op
[info] ChainBenchmarks.mapLargeList         avgt    5   10110147.776 ± 2062606.153  ns/op
[info] ChainBenchmarks.mapLargeVector       avgt    5    6808899.183 ±  861057.070  ns/op

Materialized Chunk

[info] Benchmark                            Mode  Cnt         Score         Error  Units
[info] ChainBenchmarks.foldLeftLargeChain   avgt    5   6954098.879 ±  110454.185  ns/op
[info] ChainBenchmarks.foldLeftLargeChunk   avgt    5   4920782.436 ±  597549.698  ns/op
[info] ChainBenchmarks.foldLeftLargeList    avgt    5   4542490.441 ± 1471651.374  ns/op
[info] ChainBenchmarks.foldLeftLargeVector  avgt    5   8054089.991 ±  244988.236  ns/op
[info] ChainBenchmarks.mapLargeChain        avgt    5  10515048.826 ±  701132.081  ns/op
[info] ChainBenchmarks.mapLargeChunk        avgt    5   4884269.707 ±   54435.301  ns/op
[info] ChainBenchmarks.mapLargeList         avgt    5  11663339.345 ± 2032071.560  ns/op
[info] ChainBenchmarks.mapLargeVector       avgt    5   6777173.856 ±  515598.245  ns/op

Iteration Instead of Indexed Access

[info] Benchmark                            Mode  Cnt         Score         Error  Units
[info] ChainBenchmarks.foldLeftLargeChain   avgt    5   7669070.957 ± 2548209.118  ns/op
[info] ChainBenchmarks.foldLeftLargeChunk   avgt    5   7239030.095 ±  675284.379  ns/op
[info] ChainBenchmarks.foldLeftLargeList    avgt    5   4289477.864 ±  185167.759  ns/op
[info] ChainBenchmarks.foldLeftLargeVector  avgt    5   8532388.157 ±  223626.941  ns/op
[info] ChainBenchmarks.mapLargeChain        avgt    5  12319347.622 ± 3136106.030  ns/op
[info] ChainBenchmarks.mapLargeChunk        avgt    5   5711082.159 ±  565630.752  ns/op
[info] ChainBenchmarks.mapLargeList         avgt    5  11124514.568 ± 1431372.665  ns/op
[info] ChainBenchmarks.mapLargeVector       avgt    5   7067995.506 ±  376664.129  ns/op
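The kind of rewrite being proposed can be sketched in a standalone way (this is a minimal sketch, not the PR's actual code): a foldLeft that threads its accumulator through a single foreach pass never touches the structure by index.

```scala
// Sketch: foldLeft expressed in terms of foreach rather than indexed access.
// The accumulator is threaded through a single pass via a local var, so each
// element is visited exactly once regardless of the shape of the structure.
object FoldViaForeach extends App {
  def foldLeftViaForeach[A, S](foreach: (A => Unit) => Unit)(s: S)(f: (S, A) => S): S = {
    var acc = s
    foreach(a => acc = f(acc, a))
    acc
  }

  // Any collection exposing foreach works; a List stands in for a Chunk here.
  val data = List(1, 2, 3, 4, 5)
  val sum  = foldLeftViaForeach[Int, Int](f => data.foreach(f))(0)(_ + _)
  assert(sum == 15)
  println(sum)
}
```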


@adamgfraser
Contributor Author

Here are the results with the arrayIterator. It is slightly slower on the map benchmark but faster on the foldLeft benchmark, and also more flexible in terms of supporting early termination.

[info] Benchmark                            Mode  Cnt         Score         Error  Units
[info] ChainBenchmarks.foldLeftLargeChain   avgt    5   7255480.568 ± 2393479.461  ns/op
[info] ChainBenchmarks.foldLeftLargeChunk   avgt    5   5214968.659 ±  481081.956  ns/op
[info] ChainBenchmarks.foldLeftLargeList    avgt    5   3989461.345 ±  299550.184  ns/op
[info] ChainBenchmarks.foldLeftLargeVector  avgt    5   8339002.565 ±  185454.856  ns/op
[info] ChainBenchmarks.mapLargeChain        avgt    5  11596649.953 ± 4502893.811  ns/op
[info] ChainBenchmarks.mapLargeChunk        avgt    5   5967337.337 ±  381246.181  ns/op
[info] ChainBenchmarks.mapLargeList         avgt    5  10109670.576 ± 2640786.665  ns/op
[info] ChainBenchmarks.mapLargeVector       avgt    5   6662389.019 ±  623701.259  ns/op
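The early-termination benefit mentioned above can be sketched against an iterator of underlying arrays. This is modeled loosely on the arrayIterator idea; the standalone `exists` here is hypothetical, not the PR's code.

```scala
// Sketch: an exists that walks underlying arrays one at a time and returns
// as soon as the predicate matches, never touching the remaining arrays.
object EarlyTermination extends App {
  def exists[A](arrays: Iterator[Array[A]])(p: A => Boolean): Boolean = {
    while (arrays.hasNext) {
      val array = arrays.next()
      var i     = 0
      while (i < array.length) {
        if (p(array(i))) return true // early termination
        i += 1
      }
    }
    false
  }

  // Count how many arrays are actually pulled; Iterator.map is lazy, so the
  // counter only advances when exists requests the next array.
  var visited = 0
  val arrays = Iterator(Array(1, 2, 3), Array(4, 5, 6), Array(7, 8, 9))
    .map { a => visited += 1; a }

  assert(exists(arrays)(_ == 5)) // match found in the second array
  assert(visited == 2)           // third array was never visited
  println(visited)
}
```

An index-only fold has no natural way to stop partway through without exceptions or sentinel values, which is why an iterator-based traversal is the more flexible primitive.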

@iravid
Member

iravid commented Jun 19, 2020

Really nice. The arrayIterator approach will also require fewer code changes :-)

@adamgfraser adamgfraser requested a review from iravid June 19, 2020 19:39

@iravid iravid left a comment


🔥🔥

@adamgfraser adamgfraser merged commit b5f1830 into zio:master Jun 19, 2020
@adamgfraser adamgfraser deleted the chunk branch July 27, 2020 18:48
