
Conversation

@tilacog (Contributor) commented Apr 11, 2021

This PR chunks the application of entity modifications in an attempt to fix #2330.

Given that the maximum number of PostgreSQL bind parameters per query is 65535, it makes sense to use N chunks where:

chunk size = 65535 / number of fields per entity

Nonetheless, we must remain wary that parameters other than entity fields are also bound in each query, such as block numbers and auxiliary data.

Therefore, it would be reasonable for us to discuss ways to accommodate those extra bindings, probably by reducing chunk size by an arbitrary amount.
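
For illustration only, here is a minimal sketch of that arithmetic (not the PR's actual code); the constant names and the headroom value are invented for this example:

```rust
// Sketch only: the chunk-size arithmetic from the description above,
// with an arbitrary headroom subtracted for the extra bindings
// (block numbers, auxiliary data, etc.).
const POSTGRES_MAX_BIND_PARAMS: usize = 65_535;
const HEADROOM: usize = 35; // arbitrary reserve, not a measured value

fn chunk_size(fields_per_entity: usize) -> usize {
    assert!(fields_per_entity > 0);
    ((POSTGRES_MAX_BIND_PARAMS - HEADROOM) / fields_per_entity).max(1)
}

fn main() {
    // e.g. an entity type with 12 fields
    println!("up to {} entities per query", chunk_size(12));
}
```
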

@tilacog tilacog requested a review from lutter April 11, 2021 16:23
@tilacog tilacog marked this pull request as draft April 11, 2021 16:30
@tilacog tilacog marked this pull request as ready for review April 11, 2021 17:50
@tilacog tilacog force-pushed the tiago/chunk-apply-entity-modifications branch from 7335e23 to fe15a91 Compare April 13, 2021 21:34
@tilacog tilacog marked this pull request as draft April 13, 2021 21:37
@tilacog tilacog force-pushed the tiago/chunk-apply-entity-modifications branch from fe15a91 to 5957d89 Compare April 13, 2021 21:52
@tilacog (Contributor, Author) commented Apr 13, 2021

I've moved the batch logic to the relational.rs module, where we can use table information.

I hope I got the math right (a rough sketch follows the list):

  • InsertQuery uses one bind for block_range and one bind for each column in the given table.
  • ClampRangeQuery always uses 2 binds: one for block_range and another for the entity ids array. Assuming the whole array counts as a single bind, I understand we shouldn't need to batch it.
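
To make that math concrete, a rough, self-contained sketch of how the batching could look; `Row` and `insert_chunk` are stand-ins for the real Diesel-based InsertQuery machinery in relational.rs, not the actual implementation:

```rust
// Illustrative sketch only, under the per-row bind count described above.
struct Row {
    values: Vec<String>, // one value per column of the table
}

fn insert_chunk(chunk: &[Row]) -> Result<(), String> {
    // In the real code this would build one INSERT ... VALUES statement,
    // binding every column value plus one block_range per row.
    println!("inserting {} rows in one query", chunk.len());
    Ok(())
}

fn insert_in_chunks(rows: &[Row], column_count: usize) -> Result<(), String> {
    let binds_per_row = column_count + 1; // columns + block_range
    let chunk_size = (65_535 / binds_per_row).max(1);
    for chunk in rows.chunks(chunk_size) {
        insert_chunk(chunk)?;
    }
    Ok(())
}

fn main() -> Result<(), String> {
    // e.g. 100,000 rows of a 12-column table
    let rows: Vec<Row> = (0..100_000)
        .map(|i| Row { values: vec![i.to_string(); 12] })
        .collect();
    insert_in_chunks(&rows, 12)
}
```
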

@tilacog tilacog marked this pull request as ready for review April 13, 2021 22:39
@tilacog tilacog requested a review from lutter April 13, 2021 22:39
@lutter (Collaborator) left a comment

Yes, I agree with what you say about ClampRangeQuery - the one thing to check is whether there is some other limit on array size. IIRC, in other contexts it was actually advantageous to break queries with large arrays into smaller ones, because you get O(n^2) behavior from scanning these large arrays, so that, for example, 10 queries with an array of length 1,000 were faster than 1 query with an array of length 10,000.
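
Purely for illustration, that idea could look something like the sketch below; the names are invented and the chunk size of 1,000 is just the example figure from the comment, not a measured optimum:

```rust
// Sketch of capping the id-array size per ClampRangeQuery-style UPDATE,
// rather than sending one huge `id = ANY($1)` array. Not the actual code.
const MAX_IDS_PER_QUERY: usize = 1_000;

fn clamp_in_chunks(ids: &[String]) {
    for chunk in ids.chunks(MAX_IDS_PER_QUERY) {
        // one UPDATE ... WHERE id = ANY($1) per chunk, binding the whole
        // chunk as a single array parameter plus the block number
        println!("clamping block_range for {} ids", chunk.len());
    }
}

fn main() {
    let ids: Vec<String> = (0..10_000).map(|i| i.to_string()).collect();
    clamp_in_chunks(&ids);
}
```
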

@tilacog tilacog closed this Apr 16, 2021
@tilacog tilacog reopened this Apr 16, 2021
@tilacog (Contributor, Author) commented Apr 16, 2021

> Yes, I agree with what you say about ClampRangeQuery - the one thing to check is whether there is some other limit on array size. IIRC, in other contexts it was actually advantageous to break queries with large arrays into smaller ones, because you get O(n^2) behavior from scanning these large arrays, so that, for example, 10 queries with an array of length 1,000 were faster than 1 query with an array of length 10,000.

The docs state that array size limits are ignored, but Postgres will complain if the field size exceeds 1GB.
From what I have (quickly) researched, I couldn't find any info on what the optimal array size for inserting values would be.

Do you believe this could be better addressed and reviewed in a new issue/PR?
If so, I can create one and assign myself to it.

@tilacog tilacog merged commit bdb1a7a into master Apr 20, 2021

Development

Successfully merging this pull request may close these issues.

Large inserts fail
