switch to parallel_bulk() #3426
Merged
Part of #3417
(I'm not going to say this solves the performance problems till I've confirmed it in production).
I tried everything! I wrote a whole framework for testing different settings and my poor MacBook's been burning a hole in the table from all this CPU work. I first experimented with `chunk_size`. Then `max_chunk_bytes`. Then those in combination. Then I messed around with `refresh_interval`, which requires a bit more attention and care because you have to refresh after you're done indexing. Then I experimented with `number_of_replicas`. NOTHING WORKED! Only marginal improvements.

One thing I did learn was that when you set `chunk_size` above 500 you can get ReadTimeout errors.
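(For context, here's a minimal sketch of the `refresh_interval` dance mentioned above, using the elasticsearch-py client; the connection URL and index name are placeholders, not this project's actual indexing code:)

```python
from elasticsearch import Elasticsearch

# Placeholder client and index name, purely for illustration.
es = Elasticsearch("http://localhost:9200")
index_name = "my-index"

# Turn off refreshes while bulk indexing...
es.indices.put_settings(index=index_name, body={"index": {"refresh_interval": "-1"}})
try:
    ...  # bulk indexing happens here
finally:
    # ...then restore the interval and force a refresh so the new documents
    # actually become searchable, even if the indexing step raised.
    es.indices.put_settings(index=index_name, body={"index": {"refresh_interval": "1s"}})
    es.indices.refresh(index=index_name)
```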
But then, what REALLY made a huge difference was the simple discovery of using `parallel_bulk` instead of `streaming_bulk`. What a difference! You can see it here:

In this testing I didn't use the entire build, just the `en-us` build. And I used the Elastic Dev server (from my laptop), which is, after all, on the other side of America. Basically, the results (median rate on Dev) are that it's 2.6 times faster to use `parallel_bulk()`.
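Here's a minimal sketch of the kind of switch this is, using the elasticsearch-py helpers; the client, index name, and document generator below are made-up placeholders rather than the project's real indexing code:

```python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import parallel_bulk, streaming_bulk

es = Elasticsearch("http://localhost:9200")  # placeholder connection

def generate_actions():
    """Yield bulk actions; stands in for however the documents are produced."""
    for i in range(1000):
        yield {"_index": "my-index", "_id": i, "title": f"Doc {i}"}

# Before: streaming_bulk sends one chunk at a time from a single thread.
for ok, info in streaming_bulk(es, generate_actions(), chunk_size=500):
    if not ok:
        print("failed:", info)

# After: parallel_bulk farms chunks out to a thread pool. Note that it
# returns a lazy generator, so you have to iterate it for anything to happen.
for ok, info in parallel_bulk(es, generate_actions(), thread_count=4, chunk_size=500):
    if not ok:
        print("failed:", info)
```

Nice thing is that `parallel_bulk` keeps the same `(ok, info)` iteration interface as `streaming_bulk`, so the surrounding error handling barely changes; the speedup comes from keeping multiple bulk requests in flight at once.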