Native format reader example with go #384

blackrez · 2025-09-16T10:40:13Z

Hello,

This is an example of how we can leverage the Native format. I use Go, but the Native format is well supported by all clients. I think it could be faster and use less memory (I haven't yet found how to use LZ4 with chDB).

This PR is more for the discussion than the code itself.

This is an example of using the low-level Native format reader of ClickHouse. I didn't run a benchmark, but I think it could help optimize queries (and memory usage, if I find out how to use LZ4 on results).

CLAassistant · 2025-09-16T10:40:19Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

auxten · 2025-09-17T02:07:17Z

@blackrez Let me understand your point. Your suggestion is to use the Native output format as the intermediate format between the ClickHouse engine and language bindings of chDB to speed up query performance.
Currently, this approach may not be very useful for chDB Python. We plan to support direct read (done) and write of pandas DataFrame and Arrow Table in the Python binding, which eliminates one serialization and deserialization step compared to the Native format. Theoretically, this will be faster than Native.
However, this is a great proposal for the current language bindings of ClickHouse. Using the Native format can seamlessly embed chDB into the existing ClickHouse language drivers at a low cost.
I suggest we first try this in the Java binding of chDB, and then in ch-go. @wudidapaopao @kafka1991 @s0und0fs1lence What do you think?

kafka1991 · 2025-09-17T07:30:26Z

I think this is very meaningful. Just for the Go binding, it's much more efficient than using the Parquet format (at least the server doesn't need an extra encoding step from the Native format to Parquet), and you get better compatibility with the native clickhouse-go client (sometimes the data types in the Parquet format can't fully express the data types in ClickHouse).

wudidapaopao · 2025-09-17T08:07:30Z

By leveraging the Native format and ch-go (or similar tools in other programming languages), this also helps chDB users more conveniently obtain the type and content of data at specific rows and columns directly from query results.
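
To make the point about reading types and values per column concrete, here is a minimal stdlib-only sketch of decoding one Native block. It assumes the plain `FORMAT Native` layout (varint column count, varint row count, then for each column a length-prefixed name, a length-prefixed type, and the column data) and handles only `UInt64` and `String`. The names `Column`, `DecodeBlock`, and `sampleBlock` are hypothetical and are not part of chDB or ch-go; a real binding would more likely reuse an existing decoder such as the one in ch-go.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// Column is a hypothetical container for one decoded column.
type Column struct {
	Name   string
	Type   string
	Values []any
}

// readString reads a varint length followed by that many bytes.
func readString(r *bytes.Reader) (string, error) {
	n, err := binary.ReadUvarint(r)
	if err != nil {
		return "", err
	}
	buf := make([]byte, n)
	if _, err := io.ReadFull(r, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

// DecodeBlock decodes a single block under the assumed layout.
// Only UInt64 and String columns are supported in this sketch.
func DecodeBlock(data []byte) ([]Column, error) {
	r := bytes.NewReader(data)
	numCols, err := binary.ReadUvarint(r)
	if err != nil {
		return nil, err
	}
	numRows, err := binary.ReadUvarint(r)
	if err != nil {
		return nil, err
	}
	var cols []Column
	for i := uint64(0); i < numCols; i++ {
		name, err := readString(r)
		if err != nil {
			return nil, err
		}
		typ, err := readString(r)
		if err != nil {
			return nil, err
		}
		col := Column{Name: name, Type: typ}
		for j := uint64(0); j < numRows; j++ {
			switch typ {
			case "UInt64": // fixed-width, little-endian
				var v uint64
				if err := binary.Read(r, binary.LittleEndian, &v); err != nil {
					return nil, err
				}
				col.Values = append(col.Values, v)
			case "String": // varint length + bytes per row
				s, err := readString(r)
				if err != nil {
					return nil, err
				}
				col.Values = append(col.Values, s)
			default:
				return nil, fmt.Errorf("unsupported type %q", typ)
			}
		}
		cols = append(cols, col)
	}
	return cols, nil
}

func appendString(b []byte, s string) []byte {
	b = binary.AppendUvarint(b, uint64(len(s)))
	return append(b, s...)
}

// sampleBlock hand-builds a 2-column, 2-row block for illustration.
func sampleBlock() []byte {
	var b []byte
	b = binary.AppendUvarint(b, 2) // columns
	b = binary.AppendUvarint(b, 2) // rows
	b = appendString(b, "id")
	b = appendString(b, "UInt64")
	b = binary.LittleEndian.AppendUint64(b, 1)
	b = binary.LittleEndian.AppendUint64(b, 2)
	b = appendString(b, "name")
	b = appendString(b, "String")
	b = appendString(b, "a")
	b = appendString(b, "b")
	return b
}

func main() {
	cols, err := DecodeBlock(sampleBlock())
	if err != nil {
		panic(err)
	}
	for _, c := range cols {
		fmt.Printf("%s (%s): %v\n", c.Name, c.Type, c.Values)
	}
}
```

Because the data is stored column by column with the type name next to it, a caller can index `cols[i].Values[j]` and inspect `cols[i].Type` without going through an intermediate format such as Parquet.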

blackrez · 2025-09-17T15:36:36Z

Also, there are some use cases where I don't need or want Arrow or pandas dependencies in my applications and would use the Native format instead.
Case 1: serverless applications, where embedding Arrow or pandas could be painful or take extra space that impacts performance and price.
Case 2: a hellish Python environment with old or strange dependencies that conflict with Arrow and pandas; chDB with Arrow and pandas could break the environment and limit its usage.

In my opinion, Arrow and pandas are important, but they should be optional.

s0und0fs1lence · 2025-09-18T11:41:54Z

Using the Native format for the Go bindings could indeed improve performance, but it does not change much in terms of serializing/deserializing the values from ClickHouse into process memory.
I'll work on it for the Go bindings.

blackrez added 2 commits September 16, 2025 12:34

add example of native reader with chdb_go

2096513

This is an example of using the low-level Native format reader of ClickHouse. I didn't run a benchmark, but I think it could help optimize queries (and memory usage, if I find out how to use LZ4 on results).

Update go.mod and use golang last supported version

ed9da1f

blackrez marked this pull request as draft September 16, 2025 10:40
