blackrez

Hello,

This is an example of how we can leverage the Native format. I use Go, but the Native format is well supported by all clients. I think it could make queries faster and use less memory (I haven't yet found how to use LZ4 with chDB).

This PR is more for the discussion than the code itself.

This is an example of using the low-level Native format reader for ClickHouse. I haven't run benchmarks, but I think it could help optimize queries (and memory usage, if I find how to use LZ4 on results).

@blackrez blackrez marked this pull request as draft September 16, 2025 10:40
@auxten
Member

auxten commented Sep 17, 2025

@blackrez Let me understand your point. Your suggestion is to use the Native output format as the intermediate format between the ClickHouse engine and language bindings of chDB to speed up query performance.
Currently, this approach may not be very useful for chDB Python. We plan to support direct read (already done) and write of pandas DataFrames and Arrow Tables in the Python binding, which eliminates one serialization and deserialization step compared to the Native format. Theoretically, this will be faster than Native.
However, this is a great proposal for the current language bindings of ClickHouse. Using the Native format can seamlessly embed chDB into the existing ClickHouse language drivers at a low cost.
I suggest we first try this in the Java binding of chDB, and then in ch-go. @wudidapaopao @kafka1991 @s0und0fs1lence What do you think?

@kafka1991
Contributor

I think this is very meaningful. For the Go binding alone, it's much more efficient than using the Parquet format (at least the server doesn't need an extra encoding step from the Native format to Parquet), and you get better compatibility with the native clickhouse-go client (sometimes the data types in the Parquet format can't fully express the data types in ClickHouse).

@wudidapaopao
Contributor

By leveraging the Native format and ch-go (or similar tools in other programming languages), this also lets chDB users more conveniently read the type and content of data at specific rows and columns directly from query results.
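
To make the point about reading types and values at a given row and column more concrete, here is a minimal, standard-library-only Go sketch. It assumes the Native block layout is: column count and row count as unsigned LEB128 varints, then for each column a length-prefixed name, a length-prefixed type string, and the column values stored contiguously. It handles only `UInt64` columns, and the names `decodeBlock`, `sampleBlock`, and `column` are hypothetical helpers for illustration, not part of chDB or ch-go. A real binding would more likely rely on ch-go's own decoder than on hand-written parsing like this.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// column holds one decoded column: its name, its ClickHouse type string,
// and its values. Only UInt64 values are supported in this sketch.
type column struct {
	Name   string
	Type   string
	Values []uint64
}

// readString reads a length-prefixed string: an unsigned LEB128 length
// followed by that many bytes.
func readString(r *bytes.Reader) (string, error) {
	n, err := binary.ReadUvarint(r)
	if err != nil {
		return "", err
	}
	buf := make([]byte, n)
	if _, err := io.ReadFull(r, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

// decodeBlock decodes a single block under the layout assumed above.
// Any column type other than UInt64 is rejected.
func decodeBlock(data []byte) ([]column, error) {
	r := bytes.NewReader(data)
	numCols, err := binary.ReadUvarint(r)
	if err != nil {
		return nil, err
	}
	numRows, err := binary.ReadUvarint(r)
	if err != nil {
		return nil, err
	}
	cols := make([]column, 0, numCols)
	for i := uint64(0); i < numCols; i++ {
		name, err := readString(r)
		if err != nil {
			return nil, err
		}
		typ, err := readString(r)
		if err != nil {
			return nil, err
		}
		if typ != "UInt64" {
			return nil, fmt.Errorf("unsupported type %q in this sketch", typ)
		}
		// Fixed-width values are stored back to back in little-endian order.
		vals := make([]uint64, numRows)
		if err := binary.Read(r, binary.LittleEndian, vals); err != nil {
			return nil, err
		}
		cols = append(cols, column{Name: name, Type: typ, Values: vals})
	}
	return cols, nil
}

// sampleBlock hand-builds one block with a single UInt64 column "n"
// holding the values 1, 2, 3. It stands in for a real query result.
func sampleBlock() []byte {
	var b bytes.Buffer
	b.WriteByte(1) // number of columns
	b.WriteByte(3) // number of rows
	b.WriteByte(1) // length of the column name
	b.WriteString("n")
	b.WriteByte(6) // length of the type name
	b.WriteString("UInt64")
	for _, v := range []uint64{1, 2, 3} {
		binary.Write(&b, binary.LittleEndian, v)
	}
	return b.Bytes()
}

func main() {
	cols, err := decodeBlock(sampleBlock())
	if err != nil {
		panic(err)
	}
	c := cols[0]
	// Column 0, row 1: the type string and the value are both directly available.
	fmt.Printf("%s %s row 1 = %d\n", c.Name, c.Type, c.Values[1]) // prints "n UInt64 row 1 = 2"
}
```

Because the data is columnar and typed, looking up a single cell is an index into a typed slice, with no text parsing or intermediate row objects.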

@blackrez
Author

Also, there are use cases where I don't need or want Arrow or pandas dependencies in my applications and would rather use the Native format.
Case 1: serverless applications, where embedding Arrow or pandas can be painful or take extra space, which could impact performance and price.
Case 2: a hellish Python environment with old or strange dependencies on Arrow and pandas, where chDB with Arrow and pandas could break the environment and limit its usage.

In my opinion, Arrow and pandas are important, but they should be optional.

@s0und0fs1lence

Using the Native format for the Go bindings could indeed improve performance, but it does not change much in terms of serializing/deserializing the values from ClickHouse into process memory.
I'll work on it for the Go bindings.
