Conversation

@ilejn
Contributor

@ilejn ilejn commented Apr 23, 2025

Changelog category (leave one):

  • Performance Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Added the async_insert_parse_threads setting, which specifies the number of threads that parse aggregated data ingested via async inserts.

Documentation entry for user-facing changes

@ilejn
Contributor Author

ilejn commented Apr 23, 2025

Addresses #74162

@nikitamikhaylov nikitamikhaylov added the can be tested Allows running workflows for external contributors label Apr 24, 2025
@clickhouse-gh

clickhouse-gh bot commented Apr 24, 2025

Workflow [PR], commit [b26d3ef]

@clickhouse-gh clickhouse-gh bot added the pr-performance Pull request with some performance improvements label Apr 24, 2025
@clickhouse-gh

clickhouse-gh bot commented Jun 24, 2025

Dear @ilejn, this PR hasn't been updated for a while. Will you continue working on it? If not, please close it. Otherwise, ignore this message.

@clickhouse-gh

clickhouse-gh bot commented Aug 26, 2025

Dear @ilejn, this PR hasn't been updated for a while. Will you continue working on it? If not, please close it. Otherwise, ignore this message.

@ilejn
Contributor Author

ilejn commented Oct 17, 2025

@ilejn ilejn marked this pull request as ready for review October 17, 2025 22:22
@ilejn ilejn force-pushed the async_parallel_parsing branch from b26d3ef to f677f27 Compare October 19, 2025 21:44
@clickhouse-gh

clickhouse-gh bot commented Oct 19, 2025

Workflow [PR], commit [0cf8248]

Summary:

| job_name | test_name | status | info | comment |
|---|---|---|---|---|
| Integration tests (amd_asan, flaky) | | error | | |
| AST fuzzer (arm_asan) | | failure | | |
| | Logical error: '(isConst() isSparse() | | | |
| BuzzHouse (amd_debug) | | failure | | |
| | Buzzing result | failure | cidb | |
| BuzzHouse (amd_tsan) | | failure | | |
| | Buzzing result | failure | cidb | |

@ilejn
Contributor Author

ilejn commented Oct 31, 2025

Hello,
could someone please have a look at this.

@ilejn
Contributor Author

ilejn commented Nov 3, 2025

Hello,
could someone please have a look.

@ilejn
Contributor Author

ilejn commented Nov 17, 2025

Hello, could someone please have a look?

@ilejn
Contributor Author

ilejn commented Nov 19, 2025

Looks like test failures are caused by infrastructure issues.
Should I trigger CI?

@ilejn
Contributor Author

ilejn commented Nov 24, 2025

Hello @nikitamikhaylov, sorry to bother you, but it seems I am out of options to attract someone's attention to review this PR.
Maybe we can ask @pamarcos, who reviewed another small performance optimization for async inserts, #74945?

Contributor

Copilot AI left a comment

Pull request overview

This PR introduces the async_insert_parse_threads setting, which specifies the number of threads used to parse aggregated data ingested via async inserts.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 5 comments.

Show a summary per file
| File | Description |
|---|---|
| src/Core/ServerSettings.cpp | Adds the async_insert_parse_threads setting definition |
| programs/server/Server.cpp | Passes the new parse thread parameter to the AsynchronousInsertQueue constructor |
| src/Interpreters/AsynchronousInsertQueue.h | Updates signatures to support parallel parsing and adds new private fields |
| src/Interpreters/AsynchronousInsertQueue.cpp | Implements parallel parsing logic with thread pool management |
| tests/integration/test_async_insert_parallel_parsing/test.py | Adds integration test for parallel async insert functionality |
| tests/integration/test_async_insert_parallel_parsing/configs/* | Configuration files for the integration test environment |
Comments suppressed due to low confidence (1)

src/Interpreters/AsynchronousInsertQueue.cpp:1

  • The opening brace with a comment on the same line is inconsistent with the codebase style. The comment should be placed before the opening brace on a separate line.
#include <vector>

const auto & columns = metadata_snapshot->getColumns();
if (columns.hasDefaults())
    adding_defaults_transform = std::make_shared<AddingDefaultsTransform>(std::make_shared<const Block>(header), columns, *format, insert_context);
num_threads = parse_pool_ptr->getMaxThreads();

Copilot AI Nov 28, 2025

The number of threads is retrieved from the thread pool after it's already been determined to be non-null. However, there's no check that getMaxThreads() returns a non-zero value before the check on line 1238-1240. If the pool was initialized with zero threads, this check would catch it, but the initialization logic should prevent this. Consider adding an assertion in the constructor where parse_pool_ptr is created to ensure parse_pool_size_ is non-zero when creating the pool.

Copilot uses AI. Check for mistakes.
num_threads = parse_pool_ptr->getMaxThreads();
if (!num_threads)
    throw Exception(
        ErrorCodes::LOGICAL_ERROR, "zero number of threads");

Copilot AI Nov 28, 2025

The error message "zero number of threads" is unclear. It should specify which threads and where the configuration should be fixed. Consider changing to "Invalid configuration: async_insert_parse_threads must be greater than zero when parallel parsing is enabled".

Suggested change
- ErrorCodes::LOGICAL_ERROR, "zero number of threads");
+ ErrorCodes::LOGICAL_ERROR, "Invalid configuration: async_insert_parse_threads must be greater than zero when parallel parsing is enabled");

Member

I actually agree with Copilot on this one. The current message can be improved.

Contributor Author

This is an impossible situation: if the pool is created with

    if (parse_pool_size_)
        parse_pool_ptr = std::make_shared<ThreadPool>(
            CurrentMetrics::AsynchronousInsertThreads,
            CurrentMetrics::AsynchronousInsertThreadsActive,
            CurrentMetrics::AsynchronousInsertThreadsScheduled,
            parse_pool_size_);

then getMaxThreads cannot return zero.
Maybe an assert is better than throwing LOGICAL_ERROR, but this is not a configuration error.

Member

I don't think a LOGICAL_ERROR is a bad choice, but the message can be polished

for (size_t executors_num = 0; executors_num < num_threads; ++executors_num)
{
    auto format = getInputFormatFromASTInsertQuery(key.query, false, header, insert_context, nullptr);
    /// it seems that we cannot share format between threads

Copilot AI Nov 28, 2025

The comment "it seems that we cannot share format between threads" is uncertain and doesn't explain why. Either confirm this limitation with a definitive statement or explain the underlying reason (e.g., thread-safety issues, mutable state in format objects).

Suggested change
- /// it seems that we cannot share format between threads
+ /// The format object returned by getInputFormatFromASTInsertQuery is not thread-safe and maintains internal mutable state.
+ /// Therefore, each thread must have its own instance and format objects must not be shared between threads.

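
The constraint discussed in this comment (one format object per worker instead of one shared across threads) can be illustrated with a small Python sketch. It is only an analogy under stated assumptions: `LineParser` and `parse_in_parallel` are hypothetical names, and nothing here is ClickHouse code.

```python
from concurrent.futures import ThreadPoolExecutor


class LineParser:
    """Hypothetical stateful parser. It keeps a mutable row counter,
    so sharing one instance between threads would be unsafe."""

    def __init__(self):
        self.rows_parsed = 0

    def parse(self, lines):
        rows = []
        for line in lines:
            rows.append(tuple(int(x) for x in line.split(",")))
            self.rows_parsed += 1
        return rows


def parse_in_parallel(lines, num_threads):
    if num_threads <= 0:
        raise ValueError("num_threads must be greater than zero")
    # Split the input into at most num_threads contiguous slices.
    step = -(-len(lines) // num_threads) or 1  # ceiling division
    slices = [lines[i:i + step] for i in range(0, len(lines), step)]

    def worker(part):
        parser = LineParser()  # every task owns its own parser instance
        return parser.parse(part)

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() preserves input order, so the chunks come back in order.
        return list(pool.map(worker, slices))
```

Note that the result is a list of chunks, one per slice, rather than a single chunk.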
if constexpr (IS_PARALLEL)
{
    parse_pool_ptr->wait();
    chunks.resize(num_threads);

Copilot AI Nov 28, 2025

The chunks.resize(num_threads) on line 1377 is redundant because chunks is already initialized with size num_threads on line 1111. This resize could be removed to avoid confusion.

Suggested change
- chunks.resize(num_threads);

Member

Agree with Copilot on this one

size_t num_bytes = chunk.bytes();
if (parse_pool_ptr)
{
    auto source = std::make_unique<SourceFromChunks>(header, std::move(chunks));
Member

@CheSema CheSema Nov 28, 2025

This has a big impact.
Before, there was one chunk; now there are many chunks.
Before, it was inserted as one part in the landing table;
now there could be as many parts as there are chunks here.

Contributor Author

That's a very good concern, thanks.
I'll recheck whether it is true and, if it is, roll back to a previous version where I joined the chunks explicitly.

Contributor Author

Well, this is true, except for SquashingTransform, which is controlled by Setting::min_insert_block_size_rows & Co.
(see InterpreterInsertQuery::buildInsertPipeline).

I plan to forcibly set these parameters to their maximum.

This seems slightly better than doing basically the same in AsynchronousInsertQueue::processData.

WDYT?
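
To make the concern concrete, here is a deliberately simplified Python model of threshold-based squashing. It is not the actual SquashingTransform logic: the function name `squash`, the single row threshold, and the flush-at-end rule are all assumptions made for illustration. It only shows that the number of blocks leaving such a step depends on the threshold, so several input chunks do not necessarily end up as one block.

```python
def squash(chunks, min_block_size_rows):
    """Simplified model: accumulate chunks until the row threshold is
    reached, emit the accumulated block, and flush the remainder."""
    blocks, current = [], []
    for chunk in chunks:
        current.extend(chunk)
        if len(current) >= min_block_size_rows:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


# Four parsed chunks, as parallel parsing with four workers might produce.
chunks = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]

one_block = squash(chunks, 100)   # threshold above the total row count
two_blocks = squash(chunks, 5)    # threshold reached after two chunks
four_blocks = squash(chunks, 3)   # threshold reached by almost every chunk
```

In this model a single output block is produced only when the threshold exceeds the total number of rows, which is the dependency on settings that the review comments are debating.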

Member

@CheSema CheSema Dec 1, 2025

It is not good to pin any settings in the code.
I understand that those chunks could be squashed together later. But this is not guaranteed. This is my concern. It was only one chunk from async insert before. Now it could be several. And most likely there are no tests in CI for the cases with several chunks.

What is wrong with AsynchronousInsertQueue::processData?

Contributor Author

@ilejn ilejn Dec 2, 2025

What is wrong with AsynchronousInsertQueue::processData?

Do you suggest joining parts in AsynchronousInsertQueue::processData ?
Could you provide more reasons/hints?

Honestly I don't see why it is a good idea to duplicate this code. We use INSERT pipeline and SquashingTransform is a feature of it.

most likely there are no tests in CI for the cases with several chunks.

I am adding validation of number of parts to the test.
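
A part-count check of this kind could look roughly like the sketch below. The helper name `count_active_parts` is hypothetical, and `query` is assumed to be any callable that runs SQL and returns the result as text (similar in spirit to a test node's query method); the actual check added to the test is not shown in this thread.

```python
def count_active_parts(query, table_name, database="default"):
    """Return the number of active parts of a table.

    `query` is an assumed callable: it takes an SQL string and returns
    the result as text.
    """
    sql = (
        "SELECT count() FROM system.parts "
        f"WHERE database = '{database}' AND table = '{table_name}' AND active"
    )
    return int(query(sql).strip())
```

A test could then assert that a single flushed async insert leaves exactly one active part in the target table.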

Contributor Author

@ilejn ilejn Dec 3, 2025

Parsing data during async insert part is rather tricky, because we cannot afford creating format object every time, we have to maintain a pool.

I don't have a clear opinion towards this approach. It is natural and tempting.
Actually it is what I started from, but then I switched to what we currently have in the branch because it solves the problem (reduces latency) and just works.

Member

we cannot afford creating format object every time

Do not understand this.

Contributor Author

I am used to thinking that constructors for some format objects are expensive, that's why creating format for every async insert is not an option.

Does it make sense?

Member

I do not understand, why is it expensive?

Contributor Author

I do not understand, why is it expensive?

It is expensive for AvroConfluentRowInputFormat because it requires network interactions, it is expensive for protobuf because proto file must be read (I think).
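
The idea of keeping a pool of expensive-to-construct format objects, mentioned earlier in this sub-thread, can be sketched generically in Python. `FormatPool` and its `borrow` method are hypothetical names; this is a plain object pool, not a description of how ClickHouse manages input formats.

```python
import queue
from contextlib import contextmanager


class FormatPool:
    """Hypothetical pool that constructs `size` objects once and reuses them."""

    def __init__(self, factory, size):
        self._items = queue.Queue()
        for _ in range(size):
            self._items.put(factory())  # the expensive construction happens here

    @contextmanager
    def borrow(self):
        item = self._items.get()  # blocks until an object is free
        try:
            yield item
        finally:
            self._items.put(item)  # always return the object to the pool
```

With such a pool the construction cost is paid `size` times in total, regardless of how many inserts borrow an object afterwards.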

@CheSema
Member

CheSema commented Nov 28, 2025

Why do we delay parsing?
Maybe we should parse the data within the async insertion scope and store already parsed data in the queue? Each insert query would parse its data with its own resources. This would give us scalable parallelism: the more inserts, the more resources for parsing them. The code would also be a little simpler.
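
The alternative proposed in this comment can be sketched as follows. This is a toy Python model under assumptions: `parse`, `ParsedInsertQueue`, and `run_inserts` are hypothetical names, and the thread pool merely stands in for concurrent client connections.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def parse(raw):
    """Hypothetical parser: one comma-separated row of ints per line."""
    return [tuple(int(x) for x in line.split(",")) for line in raw.splitlines() if line]


class ParsedInsertQueue:
    """Stores already parsed rows, so flushing needs no parsing at all."""

    def __init__(self):
        self._entries = queue.Queue()

    def push(self, raw):
        # Parsing runs in the caller's thread, i.e. in the insert's own scope.
        self._entries.put(parse(raw))

    def flush(self):
        rows = []
        while True:
            try:
                rows.extend(self._entries.get_nowait())
            except queue.Empty:
                return rows


def run_inserts(raw_inserts, clients=4):
    # Each simulated client parses its own data; parallelism scales with inserts.
    q = ParsedInsertQueue()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        list(pool.map(q.push, raw_inserts))
    return q.flush()
```

In this model the flush step only concatenates rows, and the order of rows from different inserts depends on which client finishes parsing first.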

@CheSema CheSema self-assigned this Dec 1, 2025
@fm4v
Member

fm4v commented Dec 2, 2025

  1. Please randomize the async_insert_parse_threads setting in clickhouse-test to ensure the Stateless tests (AsyncInsert) job runs with both async inserts and the new setting enabled.
  2. If this improves performance, please set a reasonable default value and add it to SettingsChangesHistory.cpp

@ilejn
Contributor Author

ilejn commented Dec 2, 2025

2. If this improves performance, please set a reasonable default value and add it to SettingsChangesHistory.cpp

Honestly I feel that this is a little bit premature to enable parallel parsing by default because some clients don't care about latency here.

@pamarcos
Member

pamarcos commented Dec 3, 2025

Honestly I feel that this is a little bit premature to enable parallel parsing by default because some clients don't care about latency here.

It'd be nice to have at least some clickhouse-benchmarks comparison with different values to see how it improves

@ilejn
Contributor Author

ilejn commented Dec 3, 2025

It'd be nice to have at least some clickhouse-benchmarks comparison with different values to see how it improves

I gave this link https://pastila.nl/?007f67ed/3ce4613dcb3ed6f2f8cfc00d6e8bc906#en58FLI4Ilg9Nk2Vp6b8XQ==

Should I create a benchmark that is a part of CI based on it?

@@ -0,0 +1,95 @@
import logging
Member

logging, timeit and floor are unused, can be removed



def test_parallel_parsing_multithread():
    thread_num = 15
Member

thread_num is never used

cluster.shutdown()

def _generate_values(size, min_int, max_int, array_size_range):
    gen_tuple = lambda _min_int, _max_int, _array_size_range: (
Member

Why a lambda here instead of defining a new function (even if it's within _generate_values)

Contributor Author

This code is borrowed from another test, specifically test_async_insert_adaptive_busy_timeout

I can improve it though if you think that it makes sense.

finally:
    cluster.shutdown()

def _generate_values(size, min_int, max_int, array_size_range):
Member

Please add 2 blank lines before every function definition


def _insert_query(table_name, settings, *args, **kwargs):
    settings_s = ", ".join("{}={}".format(k, settings[k]) for k in settings)
    INSERT_QUERY = "INSERT INTO {} SETTINGS {} VALUES {}"
Member

Why not use an f-string instead of format?

Contributor Author

Why not use an f-string instead of format?

Yes, switching to f-string.

I am still struggling with joining parts, will push all changes together.
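
An f-string version of the query construction could look like this minimal sketch. `build_insert_query` is a hypothetical helper that only builds the SQL text; the rest of the original `_insert_query` (its `*args`, `**kwargs`, and how the query is executed) is not shown in this thread and is left out.

```python
def build_insert_query(table_name, settings, values):
    """Build the INSERT statement text with f-strings (hypothetical helper)."""
    settings_s = ", ".join(f"{key}={value}" for key, value in settings.items())
    return f"INSERT INTO {table_name} SETTINGS {settings_s} VALUES {values}"
```

For example, calling it with `{"async_insert": 1, "wait_for_async_insert": 1}` yields a statement whose SETTINGS clause lists both settings in dictionary order.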


Chunk chunk(std::move(result_columns), total_rows);
assert(chunks.size() == 1);
auto & chunk = chunks[0];
Member

What's the end purpose of initializing this here if it's assigned afterwards?

Contributor Author

This is effectively the same as assigning to chunks[0].
To me, the explicit reference 'chunk' adds some verbosity, though I can get rid of it if you think it is worth doing.

/// Dump the data only inside this pool.
ThreadPool pool;

std::shared_ptr<ThreadPool> parse_pool_ptr;
Member

Why does this need to be a shared_ptr?

}
else
{
    auto source = std::make_unique<SourceFromSingleChunk>(header, std::move(chunks[0]));
Member

Is it ok to assume here chunks[0] exists? Is it because this is past the num_rows == 0 check?

@pamarcos
Member

pamarcos commented Dec 3, 2025

I gave this link https://pastila.nl/?007f67ed/3ce4613dcb3ed6f2f8cfc00d6e8bc906#en58FLI4Ilg9Nk2Vp6b8XQ==

Should I create a benchmark that is a part of CI based on it?

No need, but please add everything to the description of the PR so that, in the future, people can find all the related info in one place without having to check the comments.

Member

@CheSema CheSema left a comment

I have a strong opinion against these changes.
Until I'm convinced in a good profit here I stand against this PR.

The main statement has been told here already:

Honestly I feel that this is a little bit premature to enable parallel parsing by default because some clients don't care about latency here.

Indeed what is the point to consume more resources to save a bit of time in a process which has been asleep a moment before?

@ilejn
Contributor Author

ilejn commented Dec 8, 2025

Indeed what is the point to consume more resources to save a bit of time in a process which has been asleep a moment before?

@CheSema , the point is to reduce latency.
It is impossible (but desired) for some customers to use wait_for_async_insert without this improvement.

I don't believe that it is a common problem though, that is why I am not sure that it should be a default setting.

Maybe I should provide additional information? BTW, does https://pastila.nl/?007f67ed/3ce4613dcb3ed6f2f8cfc00d6e8bc906#en58FLI4Ilg9Nk2Vp6b8XQ== make sense?

What kind of confirmation "in a good profit" is possible?

@ilejn
Contributor Author

ilejn commented Dec 8, 2025

I have a strong opinion against this changes.

My initial feeling was that this is an unnecessary complication.
Surprisingly it is really helpful.

@CheSema
Member

CheSema commented Dec 9, 2025

Indeed what is the point to consume more resources to save a bit of time in a process which has been asleep a moment before?

Sema Checherinda , the point is to reduce latency. It is impossible (but desired) for some customers to use wait_for_async_insert without this improvement.

I don't believe that it is a common problem though, that is why I am not sure that it should be a default setting.

Maybe I should provide additional information? BTW, does https://pastila.nl/?007f67ed/3ce4613dcb3ed6f2f8cfc00d6e8bc906#en58FLI4Ilg9Nk2Vp6b8XQ== make sense?

This shows that ProfileEvents are lost and are not collected in the thread pool tasks.

What kind of confirmation "in a good profit" is possible?

It is not a question which I help to answer.

@ilejn
Contributor Author

ilejn commented Dec 9, 2025

This shows that ProfileEvents are lost and are not collected in the thread pool tasks.

To me the column query_duration_ms clearly shows reduced time.
Does it make sense?

What kind of confirmation "in a good profit" is possible?

It is not a question which I help to answer.

Hm... OK, I'll do my best.

Labels

can be tested Allows running workflows for external contributors pr-performance Pull request with some performance improvements
