Async insert parallel parsing #79509
base: master
Addresses #74162
Dear @ilejn, this PR hasn't been updated for a while. Will you continue working on it? If not, please close it. Otherwise, ignore this message.
Performance comparison
b26d3ef to f677f27
Workflow [PR], commit [0cf8248] Summary: ❌
Hello, could someone please have a look?
Looks like test failures are caused by infrastructure issues.
Hello @nikitamikhaylov, sorry to bother you, it seems I am out of options to attract someone's attention to review this PR.
Pull request overview
This PR introduces parallel parsing for async inserts, controlled by the new async_insert_parse_threads server setting.
Reviewed changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| src/Core/ServerSettings.cpp | Adds the async_insert_parse_threads setting definition |
| programs/server/Server.cpp | Passes the new parse thread parameter to the AsynchronousInsertQueue constructor |
| src/Interpreters/AsynchronousInsertQueue.h | Updates signatures to support parallel parsing and adds new private fields |
| src/Interpreters/AsynchronousInsertQueue.cpp | Implements parallel parsing logic with thread pool management |
| tests/integration/test_async_insert_parallel_parsing/test.py | Adds integration test for parallel async insert functionality |
| tests/integration/test_async_insert_parallel_parsing/configs/* | Configuration files for the integration test environment |
Comments suppressed due to low confidence (1)
src/Interpreters/AsynchronousInsertQueue.cpp:1
- The opening brace with a comment on the same line is inconsistent with the codebase style. The comment should be placed before the opening brace on a separate line.
#include <vector>
const auto & columns = metadata_snapshot->getColumns();
if (columns.hasDefaults())
    adding_defaults_transform = std::make_shared<AddingDefaultsTransform>(std::make_shared<const Block>(header), columns, *format, insert_context);
num_threads = parse_pool_ptr->getMaxThreads();
Copilot
AI
Nov 28, 2025
The number of threads is retrieved from the thread pool after it's already been determined to be non-null. However, there's no check that getMaxThreads() returns a non-zero value before the check on line 1238-1240. If the pool was initialized with zero threads, this check would catch it, but the initialization logic should prevent this. Consider adding an assertion in the constructor where parse_pool_ptr is created to ensure parse_pool_size_ is non-zero when creating the pool.
num_threads = parse_pool_ptr->getMaxThreads();
if (!num_threads)
    throw Exception(
        ErrorCodes::LOGICAL_ERROR, "zero number of threads");
Copilot
AI
Nov 28, 2025
The error message "zero number of threads" is unclear. It should specify which threads and where the configuration should be fixed. Consider changing to "Invalid configuration: async_insert_parse_threads must be greater than zero when parallel parsing is enabled".
- ErrorCodes::LOGICAL_ERROR, "zero number of threads");
+ ErrorCodes::LOGICAL_ERROR, "Invalid configuration: async_insert_parse_threads must be greater than zero when parallel parsing is enabled");
I actually agree with Copilot on this one. The current message can be improved.
This is an impossible situation. The pool is created only if parse_pool_size_ is non-zero:
if (parse_pool_size_)
    parse_pool_ptr = std::make_shared<ThreadPool>(
        CurrentMetrics::AsynchronousInsertThreads,
        CurrentMetrics::AsynchronousInsertThreadsActive,
        CurrentMetrics::AsynchronousInsertThreadsScheduled,
        parse_pool_size_);
so getMaxThreads cannot return zero.
Maybe an assert is better than throwing LOGICAL_ERROR, but it is not a configuration error.
I don't think a LOGICAL_ERROR is a bad choice, but the message can be polished
for (size_t executors_num = 0; executors_num < num_threads; ++executors_num)
{
    auto format = getInputFormatFromASTInsertQuery(key.query, false, header, insert_context, nullptr);
    /// it seems that we cannot share format between threads
Copilot
AI
Nov 28, 2025
The comment "it seems that we cannot share format between threads" is uncertain and doesn't explain why. Either confirm this limitation with a definitive statement or explain the underlying reason (e.g., thread-safety issues, mutable state in format objects).
- /// it seems that we cannot share format between threads
+ /// The format object returned by getInputFormatFromASTInsertQuery is not thread-safe and maintains internal mutable state.
+ /// Therefore, each thread must have its own instance and format objects must not be shared between threads.
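The "one format instance per thread" idea from this comment can be illustrated outside of ClickHouse. The following is a minimal Python sketch, not the PR's code: `StatefulParser` and `parse_in_parallel` are hypothetical names, and the parser is a stand-in for any object that mutates internal state while parsing.

```python
from concurrent.futures import ThreadPoolExecutor


class StatefulParser:
    """Hypothetical stand-in for a format object with mutable parsing state."""

    def __init__(self):
        self.rows_seen = 0  # state that would be corrupted if shared across threads

    def parse(self, lines):
        parsed = []
        for line in lines:
            self.rows_seen += 1
            parsed.append(int(line))
        return parsed


def parse_in_parallel(lines, num_threads):
    # Split the input into at most num_threads contiguous slices.
    step = max(1, (len(lines) + num_threads - 1) // num_threads)
    slices = [lines[i:i + step] for i in range(0, len(lines), step)]

    def worker(slice_of_lines):
        # Every task constructs its own parser, so nothing is shared.
        return StatefulParser().parse(slice_of_lines)

    with ThreadPoolExecutor(max_workers=num_threads) as executor:
        # map() preserves input order, so chunks come back in slice order.
        return list(executor.map(worker, slices))


chunks = parse_in_parallel([str(i) for i in range(10)], num_threads=3)
```

Each task pays the construction cost of its own parser here, which is the trade-off discussed further down in this thread.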
if constexpr (IS_PARALLEL)
{
    parse_pool_ptr->wait();
    chunks.resize(num_threads);
Copilot
AI
Nov 28, 2025
The chunks.resize(num_threads) on line 1377 is redundant because chunks is already initialized with size num_threads on line 1111. This resize could be removed to avoid confusion.
- chunks.resize(num_threads);
Agree with Copilot on this one
size_t num_bytes = chunk.bytes();
if (parse_pool_ptr)
{
    auto source = std::make_unique<SourceFromChunks>(header, std::move(chunks));
This has a big impact.
Before, there was one chunk; now there are many chunks.
Before, it was inserted as one part in the landing table;
now it could be as many parts as there are chunks here.
That's a very good concern, thanks.
I'll recheck whether it is true, and if it is, roll back to a previous version where I joined the chunks explicitly.
Well, this is true, except for SquashingTransform, which is controlled by Setting::min_insert_block_size_rows & Co. (see InterpreterInsertQuery::buildInsertPipeline).
I plan to forcibly set these parameters to maximum.
That seems slightly better than doing basically the same in AsynchronousInsertQueue::processData.
WDYT?
It is not good to pin any settings in the code.
I understand that those chunks could be squashed together later. But this is not guaranteed. This is my concern. It was only one chunk from async insert before. Now it could be several. And most likely there are no tests in CI for the cases with several chunks.
What is wrong with AsynchronousInsertQueue::processData?
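The alternative raised here, joining the per-thread chunks explicitly so that the insert pipeline still receives exactly one chunk, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not ClickHouse code: a chunk is modeled as a list of column lists, and `squash_chunks` is a hypothetical helper.

```python
def squash_chunks(chunks):
    """Concatenate column-oriented chunks into a single chunk.

    Each chunk is a list of columns and each column is a list of values.
    Chunks without rows are skipped.
    """
    non_empty = [chunk for chunk in chunks if chunk and len(chunk[0]) > 0]
    if not non_empty:
        return []

    num_columns = len(non_empty[0])
    merged = [[] for _ in range(num_columns)]
    for chunk in non_empty:
        if len(chunk) != num_columns:
            raise ValueError("all chunks must have the same number of columns")
        for index, column in enumerate(chunk):
            merged[index].extend(column)
    return merged


per_thread_chunks = [
    [[1, 2], ["a", "b"]],  # produced by thread 0
    [[3], ["c"]],          # produced by thread 1
    [[], []],              # thread 2 had nothing to parse
]
single_chunk = squash_chunks(per_thread_chunks)
```

With an explicit merge like this, the number of chunks handed downstream does not depend on squashing settings, which is the guarantee the reviewer is asking about.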
> What is wrong with AsynchronousInsertQueue::processData?

Do you suggest joining parts in AsynchronousInsertQueue::processData?
Could you provide more reasons/hints?
Honestly I don't see why it is a good idea to duplicate this code. We use the INSERT pipeline and SquashingTransform is a feature of it.

> most likely there are no tests in CI for the cases with several chunks.

I am adding validation of the number of parts to the test.
Parsing data during the async insert part is rather tricky, because we cannot afford to create a format object every time; we would have to maintain a pool.
I don't have a clear opinion towards this approach. It is natural and tempting.
Actually it is what I started from, but then I switched to what we currently have in the branch because it solves the problem (reduces latency) and just works.
> we cannot afford creating format object every time

Do not understand this.
I am used to thinking that constructors for some format objects are expensive, that's why creating format for every async insert is not an option.
Does it make sense?
I do not understand, why is it expensive?
> I do not understand, why is it expensive?
It is expensive for AvroConfluentRowInputFormat because it requires network interactions, it is expensive for protobuf because proto file must be read (I think).
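The "maintain a pool" idea mentioned earlier in this thread, reusing format objects whose construction is costly instead of building one per insert, can be sketched like this. It is a minimal Python illustration with hypothetical names (`ExpensiveFormat`, `FormatPool`); how ClickHouse format objects behave when reused is not shown in this thread and is not modeled here.

```python
import queue
from contextlib import contextmanager


class ExpensiveFormat:
    """Hypothetical format object whose constructor is costly."""

    constructed = 0  # counts constructor calls, for illustration only

    def __init__(self):
        ExpensiveFormat.constructed += 1

    def parse(self, text):
        return text.split(",")


class FormatPool:
    """Fixed-size pool that hands out pre-built format objects."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(ExpensiveFormat())

    @contextmanager
    def borrow(self):
        fmt = self._idle.get()  # blocks until an instance is free
        try:
            yield fmt
        finally:
            self._idle.put(fmt)  # always return the instance to the pool


pool = FormatPool(size=2)
results = []
for payload in ["1,2", "3,4", "5,6"]:
    with pool.borrow() as fmt:
        results.append(fmt.parse(payload))
```

Three payloads are parsed with only two constructor calls; the cost is that a borrowed object must be safe to reuse, which is an assumption of this sketch.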
Why do we delay parsing?
Honestly I feel that this is a little bit premature to enable parallel parsing by default because some clients don't care about latency here.
It'd be nice to have at least some clickhouse-benchmarks comparison with different values to see how it improves
I gave this link https://pastila.nl/?007f67ed/3ce4613dcb3ed6f2f8cfc00d6e8bc906#en58FLI4Ilg9Nk2Vp6b8XQ== Should I create a benchmark that is a part of CI based on it?
@@ -0,0 +1,95 @@
import logging
logging, timeit and floor are unused, can be removed
def test_parallel_parsing_multithread():
    thread_num = 15
thread_num is never used
cluster.shutdown()

def _generate_values(size, min_int, max_int, array_size_range):
    gen_tuple = lambda _min_int, _max_int, _array_size_range: (
Why a lambda here instead of defining a new function (even if it's within _generate_values)?
This code is borrowed from another test, specifically test_async_insert_adaptive_busy_timeout
I can improve it though if you think that it makes sense.
finally:
    cluster.shutdown()

def _generate_values(size, min_int, max_int, array_size_range):
Please add 2 blank lines before every function definition
def _insert_query(table_name, settings, *args, **kwargs):
    settings_s = ", ".join("{}={}".format(k, settings[k]) for k in settings)
    INSERT_QUERY = "INSERT INTO {} SETTINGS {} VALUES {}"
Why not using an f-string instead of format?
> Why not using an f-string instead of format?
Yes, switching to f-string.
I am still struggling with joining parts, will push all changes together.
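For reference, the f-string variant under discussion could look roughly like the sketch below. `build_insert_query` is a hypothetical, simplified stand-in for the test helper `_insert_query`; the table name and settings used in the example call are illustrative assumptions.

```python
def build_insert_query(table_name, settings, values):
    # Render settings as "key=value" pairs, in dict insertion order.
    settings_s = ", ".join(f"{key}={value}" for key, value in settings.items())
    return f"INSERT INTO {table_name} SETTINGS {settings_s} VALUES {values}"


query = build_insert_query(
    "async_insert_table",
    {"async_insert": 1, "wait_for_async_insert": 1},
    "(1, [2, 3])",
)
```

The behavior matches the str.format version; only the string construction changes.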
Chunk chunk(std::move(result_columns), total_rows);
assert(chunks.size() == 1);
auto & chunk = chunks[0];
What's the end purpose of initializing this here if it's assigned afterwards?
This is effectively the same as to assign to chunks[0].
To me explicit reference 'chunk' adds some verbosity, though I can get rid of it if you think that it is worth doing.
/// Dump the data only inside this pool.
ThreadPool pool;

std::shared_ptr<ThreadPool> parse_pool_ptr;
Why does this need to be a shared_ptr?
}
else
{
    auto source = std::make_unique<SourceFromSingleChunk>(header, std::move(chunks[0]));
Is it ok to assume here chunks[0] exists? Is it because this is past the num_rows == 0 check?
No need, but please add everything to the description of the PR so that people can have as much info related to the feature together without having to check out the comments.
CheSema
left a comment
I have a strong opinion against these changes.
Until I'm convinced there is a good profit here, I stand against this PR.
The main statement has been told here already:
> Honestly I feel that this is a little bit premature to enable parallel parsing by default because some clients don't care about latency here.
Indeed what is the point to consume more resources to save a bit of time in a process which has been asleep a moment before?
@CheSema, the point is to reduce latency. I don't believe that it is a common problem though, which is why I am not sure that it should be a default setting. Maybe I should provide additional information? BTW, does https://pastila.nl/?007f67ed/3ce4613dcb3ed6f2f8cfc00d6e8bc906#en58FLI4Ilg9Nk2Vp6b8XQ== make sense? What kind of confirmation of "a good profit" is possible?
My initial feeling was that this is an unnecessary complication.
This shows that ProfilesEvents are lost and are not collected in the thread pools task.
It is not a question which I can help to answer.
To me the column
Hm... ok, I'll do my best.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
async_insert_parse_threads setting added to specify the number of threads that parse aggregated data ingested via async inserts.
Documentation entry for user-facing changes