
@peterbe peterbe commented Apr 2, 2021

Part of #3417

(I'm not going to say this solves the performance problems till I've confirmed it in production).

I tried everything! I wrote a whole framework for testing different settings and my poor MacBook's been burning a hole in the table from all this CPU work. I first experimented with chunk_size. Then max_chunk_bytes. Then those in combination. Then I messed around with refresh_interval, which requires a bit more attention and care because you have to refresh after you're done indexing. Then I experimented with number_of_replicas. NOTHING WORKED! Only marginal improvements.
One thing I did learn was that when you set chunk_size above 500 you can get ReadTimeout errors.
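
For anyone who wants to repeat the refresh_interval / number_of_replicas experiment, here is a minimal sketch of the general pattern. It is not the code from this PR. It assumes an elasticsearch-py style client whose `indices.put_settings` and `indices.refresh` accept `index` and `body` arguments (as in the 7.x client); the helper name `bulk_friendly_settings` and the restore values are made up for illustration.

```python
from contextlib import contextmanager


@contextmanager
def bulk_friendly_settings(es, index, refresh_interval="1s", replicas=1):
    """Temporarily disable refresh and replicas while bulk indexing.

    `es` is assumed to be an elasticsearch-py style client. The restore
    values are hypothetical; use whatever the index normally has.
    """
    # Turn off periodic refresh and replication for the duration of the load.
    es.indices.put_settings(
        index=index,
        body={"index": {"refresh_interval": "-1", "number_of_replicas": 0}},
    )
    try:
        yield
    finally:
        # Restore the settings, then refresh explicitly so the newly
        # indexed documents become searchable.
        es.indices.put_settings(
            index=index,
            body={
                "index": {
                    "refresh_interval": refresh_interval,
                    "number_of_replicas": replicas,
                }
            },
        )
        es.indices.refresh(index=index)
```

The `finally` block is the "attention and care" part: if indexing fails halfway, the index still gets its normal settings back and a final refresh.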

But then, what REALLY made a huge difference was the simple discovery of using parallel_bulk instead of streaming_bulk. What a difference! You can see it here:

Dev.us-west-2.aws.found.io:9243 BEST MEDIAN RATE...
1   parallel 663.20 docs/sec (5 samples)
2   streaming 250.45 docs/sec (5 samples)
Dev.us-west-2.aws.found.io:9243 BEST MEAN RATE...
1   parallel 669.83 docs/sec (5 samples)
2   streaming 253.04 docs/sec (5 samples)
localhost:9200 BEST MEDIAN RATE...
1   parallel 781.66 docs/sec (6 samples)
2   streaming 464.34 docs/sec (6 samples)
localhost:9200 BEST MEAN RATE...
1   parallel 780.27 docs/sec (6 samples)
2   streaming 460.20 docs/sec (6 samples)

In this testing I didn't use the entire build. Just the en-us build. And I used the Elastic Dev server (from my laptop) which is, after all, on the other side of America.

Basically, the results are (median rate on Dev) that it's about 2.6 times faster to use parallel_bulk().
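
For reference, a minimal sketch of what indexing with parallel_bulk can look like. This is not the diff in this PR; it assumes the `elasticsearch` Python package is installed, and the function names, the `es` client argument, and the document shape (a dict with an `id` key) are hypothetical.

```python
def generate_actions(docs, index_name):
    """Yield one bulk action per document (hypothetical document shape)."""
    for doc in docs:
        yield {"_index": index_name, "_id": doc["id"], "_source": doc}


def index_in_parallel(es, docs, index_name, thread_count=4, chunk_size=500):
    """Index `docs` with parallel_bulk and return (successes, failures)."""
    # Imported here so generate_actions can be used without the client installed.
    from elasticsearch.helpers import parallel_bulk

    successes = failures = 0
    # parallel_bulk is a generator: nothing is sent until it is iterated.
    for ok, _info in parallel_bulk(
        es,
        generate_actions(docs, index_name),
        thread_count=thread_count,
        chunk_size=chunk_size,
        raise_on_error=False,  # report per-document failures instead of raising
    ):
        if ok:
            successes += 1
        else:
            failures += 1
    return successes, failures
```

Unlike streaming_bulk, which sends one chunk at a time, parallel_bulk hands chunks to a thread pool, so several bulk requests are in flight at once. chunk_size is left at 500 here given the ReadTimeout errors seen above that value.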


peterbe commented Apr 2, 2021

Let's not review or merge this until #3409 (comment) is taken care of.

@peterbe peterbe marked this pull request as ready for review April 15, 2021 11:15
@peterbe peterbe requested a review from escattone April 15, 2021 11:24
@escattone escattone merged commit b5d12d9 into mdn:main Apr 15, 2021
@peterbe peterbe deleted the 3417-switch-to-parallel_bulk branch April 16, 2021 13:21
peterbe added a commit to peterbe/yari that referenced this pull request Jun 1, 2021