Async insert parallel parsing #79509

ilejn · 2025-04-23T21:31:36Z

Changelog category (leave one):

Performance Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Added the `async_insert_parse_threads` setting, which specifies the number of threads used to parse aggregated data ingested via async inserts.

Documentation entry for user-facing changes

ilejn · 2025-04-23T21:32:02Z

Addresses #74162

clickhouse-gh · 2025-04-24T00:02:51Z

Workflow [PR], commit [b26d3ef]

clickhouse-gh · 2025-06-24T13:18:09Z

Dear @ilejn, this PR hasn't been updated for a while. Will you continue working on it? If not, please close it. Otherwise, ignore this message.

clickhouse-gh · 2025-08-26T13:19:37Z

Dear @ilejn, this PR hasn't been updated for a while. Will you continue working on it? If not, please close it. Otherwise, ignore this message.

ilejn · 2025-10-17T22:21:52Z

Performance comparison
https://pastila.nl/?007f67ed/3ce4613dcb3ed6f2f8cfc00d6e8bc906#en58FLI4Ilg9Nk2Vp6b8XQ==

clickhouse-gh · 2025-10-19T21:44:44Z

Workflow [PR], commit [0cf8248]

Summary: ❌

job_name	test_name	status	info
Integration tests (amd_asan, flaky)		error
AST fuzzer (arm_asan)		failure
	Logical error: '(isConst()		isSparse()
BuzzHouse (amd_debug)		failure
	Buzzing result	failure	cidb
BuzzHouse (amd_tsan)		failure
	Buzzing result	failure	cidb

ilejn · 2025-10-31T12:17:43Z

Hello,
could someone please have a look at this.

ilejn · 2025-11-03T07:46:22Z

Hello,
could someone please have a look.

ilejn · 2025-11-17T08:38:16Z

Hello, could someone please have a look?

ilejn · 2025-11-19T09:09:43Z

Looks like test failures are caused by infrastructure issues.
Should I trigger CI?

ilejn · 2025-11-24T08:33:55Z

Hello @nikitamikhaylov, sorry to bother you, but it seems I am out of options for attracting someone's attention to review this PR.
Maybe we can ask @pamarcos, who reviewed another small performance optimization for async inserts, #74945?

Copilot

Pull request overview

This PR introduces parallel parsing of data ingested via async inserts, controlled by the new `async_insert_parse_threads` setting.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 5 comments.

File	Description
`src/Core/ServerSettings.cpp`	Adds the `async_insert_parse_threads` setting definition
`programs/server/Server.cpp`	Passes the new parse thread parameter to the AsynchronousInsertQueue constructor
`src/Interpreters/AsynchronousInsertQueue.h`	Updates signatures to support parallel parsing and adds new private fields
`src/Interpreters/AsynchronousInsertQueue.cpp`	Implements parallel parsing logic with thread pool management
`tests/integration/test_async_insert_parallel_parsing/test.py`	Adds integration test for parallel async insert functionality
`tests/integration/test_async_insert_parallel_parsing/configs/*`	Configuration files for the integration test environment

Comments suppressed due to low confidence (1)

src/Interpreters/AsynchronousInsertQueue.cpp:1

The opening brace with a comment on the same line is inconsistent with the codebase style. The comment should be placed before the opening brace on a separate line.

#include <vector>

tests/integration/test_async_insert_parallel_parsing/test.py

Copilot · 2025-11-28T10:19:45Z

src/Interpreters/AsynchronousInsertQueue.cpp

-        const auto & columns = metadata_snapshot->getColumns();
-        if (columns.hasDefaults())
-            adding_defaults_transform = std::make_shared<AddingDefaultsTransform>(std::make_shared<const Block>(header), columns, *format, insert_context);
+        num_threads = parse_pool_ptr->getMaxThreads();


The number of threads is retrieved from the thread pool after it's already been determined to be non-null. However, there's no check that getMaxThreads() returns a non-zero value before the check on line 1238-1240. If the pool was initialized with zero threads, this check would catch it, but the initialization logic should prevent this. Consider adding an assertion in the constructor where parse_pool_ptr is created to ensure parse_pool_size_ is non-zero when creating the pool.

Copilot · 2025-11-28T10:19:45Z

src/Interpreters/AsynchronousInsertQueue.cpp

+        num_threads = parse_pool_ptr->getMaxThreads();
+        if (!num_threads)
+            throw Exception(
+                ErrorCodes::LOGICAL_ERROR, "zero number of threads");


The error message "zero number of threads" is unclear. It should specify which threads and where the configuration should be fixed. Consider changing to "Invalid configuration: async_insert_parse_threads must be greater than zero when parallel parsing is enabled".

Suggested change

ErrorCodes::LOGICAL_ERROR, "zero number of threads");

ErrorCodes::LOGICAL_ERROR, "Invalid configuration: async_insert_parse_threads must be greater than zero when parallel parsing is enabled");

I actually agree with Copilot on this one. The current message can be improved.

This is an impossible situation: if the pool is created as

    if (parse_pool_size_)
        parse_pool_ptr = std::make_shared<ThreadPool>(
            CurrentMetrics::AsynchronousInsertThreads,
            CurrentMetrics::AsynchronousInsertThreadsActive,
            CurrentMetrics::AsynchronousInsertThreadsScheduled,
            parse_pool_size_);

then getMaxThreads cannot return zero.
Maybe an assert is better than throwing LOGICAL_ERROR, but it is not a configuration error.

I don't think a LOGICAL_ERROR is a bad choice, but the message can be polished

Copilot · 2025-11-28T10:19:45Z

src/Interpreters/AsynchronousInsertQueue.cpp

+    for (size_t executors_num = 0; executors_num < num_threads; ++executors_num)
+    {
+        auto format = getInputFormatFromASTInsertQuery(key.query, false, header, insert_context, nullptr);
+        /// it seems that we cannot share format between threads


The comment "it seems that we cannot share format between threads" is uncertain and doesn't explain why. Either confirm this limitation with a definitive statement or explain the underlying reason (e.g., thread-safety issues, mutable state in format objects).

Suggested change

/// it seems that we cannot share format between threads

/// The format object returned by getInputFormatFromASTInsertQuery is not thread-safe and maintains internal mutable state.

/// Therefore, each thread must have its own instance and format objects must not be shared between threads.
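
The one-instance-per-thread rule in the suggestion above can be illustrated outside ClickHouse. What follows is a minimal Python sketch (the language of this PR's test file), not the PR's C++ code: `LineParser` is a hypothetical stand-in for an input format that keeps mutable state, and `parse_slice` and `parse_in_parallel` are illustrative helpers. Each task builds its own parser and the results come back as one chunk per slice.

```python
from concurrent.futures import ThreadPoolExecutor


class LineParser:
    """Hypothetical stand-in for an input format with mutable parsing state."""

    def __init__(self):
        self.rows_parsed = 0  # mutable state that would race if the object were shared

    def parse(self, raw):
        self.rows_parsed += 1
        return [int(x) for x in raw.split(",")]


def parse_slice(entries):
    parser = LineParser()  # one instance per task, never shared between threads
    return [parser.parse(entry) for entry in entries]


def parse_in_parallel(entries, num_threads):
    if num_threads <= 0:
        raise ValueError("num_threads must be positive")
    # Split the entries into contiguous slices, at most one per worker.
    step = max(1, -(-len(entries) // num_threads))  # ceiling division
    slices = [entries[i:i + step] for i in range(0, len(entries), step)]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() preserves the order of the slices, so chunk i holds slice i.
        return list(pool.map(parse_slice, slices))
```

Note that the result is a list of chunks rather than a single chunk, which is the property the later review comments about the number of inserted parts are concerned with.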

Copilot · 2025-11-28T10:19:46Z

src/Interpreters/AsynchronousInsertQueue.cpp

+    if constexpr (IS_PARALLEL)
+    {
+        parse_pool_ptr->wait();
+        chunks.resize(num_threads);


The chunks.resize(num_threads) on line 1377 is redundant because chunks is already initialized with size num_threads on line 1111. This resize could be removed to avoid confusion.

Suggested change

chunks.resize(num_threads);

Agree with Copilot on this one

CheSema · 2025-11-28T14:52:56Z

src/Interpreters/AsynchronousInsertQueue.cpp

-        size_t num_bytes = chunk.bytes();
+        if (parse_pool_ptr)
+        {
+            auto source = std::make_unique<SourceFromChunks>(header, std::move(chunks));


This has a big impact.
Before, there was one chunk; now there are many.
Before, it was inserted as one part in the landing table;
now there could be as many parts as there are chunks here.

That's a very good concern, thanks.
I'll recheck whether it is true, and if it is, roll back to a previous version where I joined chunks explicitly.

Well, this is true, except for SquashingTransform, which is controlled by Setting::min_insert_block_size_rows & Co.
(see InterpreterInsertQuery::buildInsertPipeline).

I plan to forcibly set these parameters to the maximum.

That seems slightly better than doing basically the same in AsynchronousInsertQueue::processData.

WDYT?

It is not good to pin any settings in the code.
I understand that those chunks could be squashed together later. But this is not guaranteed. This is my concern. It was only one chunk from async insert before. Now it could be several. And most likely there are no tests in CI for the cases with several chunks.
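
The dependence on thresholds described above can be made concrete with a toy model. This is a simplified Python sketch that considers row counts only; it is not ClickHouse's SquashingTransform, and `squash` and `min_rows` are hypothetical names chosen for illustration. It shows that the same chunks collapse into one block or stay as several depending on the threshold.

```python
def squash(chunks, min_rows):
    """Merge consecutive chunks until each emitted block has at least min_rows rows."""
    blocks, current = [], []
    for chunk in chunks:
        current.extend(chunk)
        if len(current) >= min_rows:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)  # trailing remainder, may be smaller than min_rows
    return blocks


chunks = [[1, 2], [3], [4, 5, 6]]
one_block = squash(chunks, min_rows=100)  # large threshold: everything is merged
two_blocks = squash(chunks, min_rows=2)   # small threshold: several blocks remain
```

Under this model, a single block is only guaranteed when the threshold is at least the total number of rows, which is why the outcome depends on the settings in effect.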

What is wrong with AsynchronousInsertQueue::processData?

> What is wrong with AsynchronousInsertQueue::processData?

Do you suggest joining parts in AsynchronousInsertQueue::processData?
Could you provide more reasons/hints?

Honestly, I don't see why it is a good idea to duplicate this code. We use the INSERT pipeline, and SquashingTransform is a feature of it.

> most likely there are no tests in CI for the cases with several chunks.

I am adding validation of the number of parts to the test.

Parsing data during the async insert itself is rather tricky: we cannot afford to create a format object every time, so we would have to maintain a pool.

I don't have a clear opinion towards this approach. It is natural and tempting.
Actually it is what I started from, but then I switched to what we currently have in the branch because it solves the problem (reduces latency) and just works.

> we cannot afford creating format object every time

I do not understand this.

I am used to thinking that constructors for some format objects are expensive; that is why creating a format for every async insert is not an option.

Does it make sense?

I do not understand, why is it expensive?

> I do not understand, why is it expensive?

It is expensive for AvroConfluentRowInputFormat because it requires network interactions, and it is expensive for protobuf because the proto file must be read (I think).

CheSema · 2025-11-28T14:57:40Z

Why do we delay parsing?
Maybe we should parse the data within the async insertion scope and store already parsed data in the queue? Each insert query would parse its data with its own resources. This would give us scalable parallelism: more inserts, more resources for their parsing. Also, the code would be a little simpler.
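
The alternative proposed above, parsing at insert time and queueing parsed data, can be sketched roughly as follows. This is a minimal Python illustration of the idea only, not ClickHouse code; `ParsedInsertQueue`, `push`, and `flush` are hypothetical names, and the comma-separated integer format is an assumption made for the example.

```python
import threading


class ParsedInsertQueue:
    """Hypothetical queue that stores rows already parsed by the inserting thread."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rows = []

    def push(self, raw):
        # Parsing happens in the caller's thread, outside the lock,
        # so concurrent inserts parse in parallel with their own resources.
        rows = [int(x) for x in raw.split(",")]
        with self._lock:
            self._rows.extend(rows)

    def flush(self):
        # The flush only swaps out the accumulated rows; no parsing is left to do.
        with self._lock:
            rows, self._rows = self._rows, []
        return rows


queue = ParsedInsertQueue()
threads = [threading.Thread(target=queue.push, args=(raw,)) for raw in ("1,2", "3", "4,5")]
for t in threads:
    t.start()
for t in threads:
    t.join()
flushed = sorted(queue.flush())
```

In this shape the degree of parallelism follows the number of concurrent inserts rather than a dedicated pool size; the sketch does not address the cost of constructing a format object per insert, which is discussed elsewhere in this thread.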

fm4v · 2025-12-02T12:22:47Z

1. Please randomize the async_insert_parse_threads setting in clickhouse-test to ensure the Stateless tests (AsyncInsert) job runs with both async inserts and the new setting enabled.
2. If this improves performance, please set a reasonable default value and add it to SettingsChangesHistory.cpp

ilejn · 2025-12-02T15:57:03Z

> 2. If this improves performance, please set a reasonable default value and add it to SettingsChangesHistory.cpp

Honestly, I feel that it is a little premature to enable parallel parsing by default, because some clients don't care about latency here.

pamarcos · 2025-12-03T16:25:50Z

> Honestly I feel that this is a little bit premature to enable parallel parsing by default because some clients don't care about latency here.

It'd be nice to have at least some clickhouse-benchmarks comparison with different values to see how it improves

ilejn · 2025-12-03T16:42:10Z

> It'd be nice to have at least some clickhouse-benchmarks comparison with different values to see how it improves

I gave this link https://pastila.nl/?007f67ed/3ce4613dcb3ed6f2f8cfc00d6e8bc906#en58FLI4Ilg9Nk2Vp6b8XQ==

Should I create a benchmark that is a part of CI based on it?

pamarcos · 2025-11-28T10:51:18Z

tests/integration/test_async_insert_parallel_parsing/test.py

@@ -0,0 +1,95 @@
+import logging


logging, timeit and floor are unused, can be removed

pamarcos · 2025-11-28T10:55:25Z

tests/integration/test_async_insert_parallel_parsing/test.py

+
+
+def test_parallel_parsing_multithread():
+    thread_num = 15


thread_num is never used

pamarcos · 2025-11-28T10:57:20Z

tests/integration/test_async_insert_parallel_parsing/test.py

+        cluster.shutdown()
+
+def _generate_values(size, min_int, max_int, array_size_range):
+    gen_tuple = lambda _min_int, _max_int, _array_size_range: (


Why a lambda here instead of defining a new function (even if it's within _generate_values)?

This code is borrowed from another test, specifically test_async_insert_adaptive_busy_timeout

I can improve it though if you think that it makes sense.

pamarcos · 2025-11-28T10:57:47Z

tests/integration/test_async_insert_parallel_parsing/test.py

+    finally:
+        cluster.shutdown()
+
+def _generate_values(size, min_int, max_int, array_size_range):


Please add 2 blank lines before every function definition

pamarcos · 2025-11-28T11:03:35Z

tests/integration/test_async_insert_parallel_parsing/test.py

+
+def _insert_query(table_name, settings, *args, **kwargs):
+    settings_s = ", ".join("{}={}".format(k, settings[k]) for k in settings)
+    INSERT_QUERY = "INSERT INTO {} SETTINGS {} VALUES {}"


Why not use an f-string instead of format?

> Why not use an f-string instead of format?

Yes, switching to f-string.

I am still struggling with joining parts, will push all changes together.
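
For reference, the f-string form of the query construction discussed above could look like the following. This is only a sketch: the rest of `_insert_query` is not shown in this thread, so `build_insert_query` and its `values` parameter are hypothetical and cover just the string building.

```python
def build_insert_query(table_name, settings, values):
    # Render the settings dict as "key=value, key=value".
    settings_s = ", ".join(f"{k}={v}" for k, v in settings.items())
    return f"INSERT INTO {table_name} SETTINGS {settings_s} VALUES {values}"


query = build_insert_query(
    "t", {"async_insert": 1, "wait_for_async_insert": 1}, "(1, 'a')"
)
```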

pamarcos · 2025-12-03T16:39:39Z

src/Interpreters/AsynchronousInsertQueue.cpp


-    Chunk chunk(std::move(result_columns), total_rows);
+    assert(chunks.size() == 1);
+    auto & chunk = chunks[0];


What's the end purpose of initializing this here if it's assigned afterwards?

This is effectively the same as assigning to chunks[0].
To me, the explicit reference 'chunk' adds some verbosity, though I can get rid of it if you think that it is worth doing.

pamarcos · 2025-12-03T16:41:24Z

src/Interpreters/AsynchronousInsertQueue.cpp

+        num_threads = parse_pool_ptr->getMaxThreads();
+        if (!num_threads)
+            throw Exception(
+                ErrorCodes::LOGICAL_ERROR, "zero number of threads");


I don't think a LOGICAL_ERROR is a bad choice, but the message can be polished

pamarcos · 2025-12-03T16:46:08Z

src/Interpreters/AsynchronousInsertQueue.h

    /// Dump the data only inside this pool.
    ThreadPool pool;

+    std::shared_ptr<ThreadPool> parse_pool_ptr;


Why does this need to be a shared_ptr?

pamarcos · 2025-12-03T17:02:51Z

src/Interpreters/AsynchronousInsertQueue.cpp

+        }
+        else
+        {
+            auto source = std::make_unique<SourceFromSingleChunk>(header, std::move(chunks[0]));


Is it ok to assume here chunks[0] exists? Is it because this is past the num_rows == 0 check?

pamarcos · 2025-12-03T17:05:21Z

src/Interpreters/AsynchronousInsertQueue.cpp

+    if constexpr (IS_PARALLEL)
+    {
+        parse_pool_ptr->wait();
+        chunks.resize(num_threads);


Agree with Copilot on this one

pamarcos · 2025-12-03T17:13:50Z

> I gave this link https://pastila.nl/?007f67ed/3ce4613dcb3ed6f2f8cfc00d6e8bc906#en58FLI4Ilg9Nk2Vp6b8XQ==
>
> Should I create a benchmark that is a part of CI based on it?

No need, but please add everything to the description of the PR so that people have all the related information in one place in the future, without having to dig through the comments.

CheSema

I have a strong opinion against this changes.
Until I'm convinced in a good profit here I stand against this PR.

The main point has already been made here:

> Honestly I feel that this is a little bit premature to enable parallel parsing by default because some clients don't care about latency here.

Indeed what is the point to consume more resources to save a bit of time in a process which has been asleep a moment before?

ilejn · 2025-12-08T11:14:20Z

> Indeed what is the point to consume more resources to save a bit of time in a process which has been asleep a moment before?

@CheSema, the point is to reduce latency.
It is impossible (but desired) for some customers to use wait_for_async_insert without this improvement.

I don't believe that it is a common problem, though, which is why I am not sure that it should be the default setting.

Maybe I should provide additional information? BTW, does https://pastila.nl/?007f67ed/3ce4613dcb3ed6f2f8cfc00d6e8bc906#en58FLI4Ilg9Nk2Vp6b8XQ== make sense?

What kind of confirmation "in a good profit" is possible?

ilejn · 2025-12-08T11:22:42Z

> I have a strong opinion against this changes.

My initial feeling was that this is an unnecessary complication.
Surprisingly it is really helpful.

CheSema · 2025-12-09T11:51:35Z

> > Indeed what is the point to consume more resources to save a bit of time in a process which has been asleep a moment before?
>
> Sema Checherinda, the point is to reduce latency. It is impossible (but desired) for some customers to use wait_for_async_insert without this improvement.
>
> I don't believe that it is a common problem though, that is why I am not sure that it should be a default setting.
>
> May be I should provide additional information? BTW, does https://pastila.nl/?007f67ed/3ce4613dcb3ed6f2f8cfc00d6e8bc906#en58FLI4Ilg9Nk2Vp6b8XQ== make sense?

This shows that ProfileEvents are lost: they are not collected in the thread pool tasks.

> What kind of confirmation "in a good profit" is possible?

It is not a question which I help to answer.

ilejn · 2025-12-09T11:57:46Z

> This shows that ProfilesEvents are lost and are not collected in the thread pools task.

To me, the query_duration_ms column clearly shows reduced time.
Does it make sense?

> > What kind of confirmation "in a good profit" is possible?
>
> It is not a question which I help to answer.

Hm... OK, I'll do my best.

nikitamikhaylov added the can be tested Allows running workflows for external contributors label Apr 24, 2025

clickhouse-gh bot added the pr-performance Pull request with some performance improvements label Apr 24, 2025

ilejn added 4 commits May 7, 2025 21:53

async_parallel_parsing: initial

8a9afc9

async_parallel_parsing: some improvements

be3474b

async_parallel_parsing: small refacoring - ExecutorInfo

8e0e927

async_parallel_parsing: small refacoring - SourceFromChunks

7354d4a

ilejn added 2 commits July 8, 2025 09:39

Merge remote-tracking branch 'origin/master' into async_parallel_parsing

41da956

async_parallel_parsing: minor fixes/improvements

5c074b4

ilejn added 7 commits October 2, 2025 22:55

Merge remote-tracking branch 'origin/master' into async_parallel_parsing

5bd314a

async_parallel_parsing: merge collision

8b5f002

Merge remote-tracking branch 'origin/master' into async_parallel_parsing

70c51c9

Merge remote-tracking branch 'origin/master' into async_parallel_parsing

523bfd8

Merge remote-tracking branch 'origin/master' into async_parallel_parsing

d2c3dde

async_parallel_parsing: refactoring, processEntriesWithAsyncParsingImpl

ceafe0e

async_parallel_parsing: further improvements, dedicated test

a225ee4

ilejn marked this pull request as ready for review October 17, 2025 22:22

Merge remote-tracking branch 'origin/master' into async_parallel_parsing

f677f27

ilejn force-pushed the async_parallel_parsing branch from b26d3ef to f677f27 Compare October 19, 2025 21:44

ilejn added 2 commits November 17, 2025 10:38

Merge remote-tracking branch 'origin/master' into async_parallel_parsing

d15e170

async_parallel_parsing: make style check happy

0cf8248

ilejn mentioned this pull request Nov 27, 2025

Add const modifier to kvp extractor and update docs #90744

Merged

pamarcos self-assigned this Nov 27, 2025

pamarcos requested a review from Copilot November 28, 2025 10:18

Copilot AI reviewed Nov 28, 2025

CheSema reviewed Nov 28, 2025

CheSema self-assigned this Dec 1, 2025

pamarcos reviewed Dec 3, 2025

CheSema requested changes Dec 4, 2025

