SimpleAggregateFunction data type #4629
The implementation is different from what I initially proposed in #3852. Wrapping an existing data type (as is done for Nullable and LowCardinality) was too complex because there are many places in the code where the type is tested directly (dynamic_cast<TypeDataNullable *>...). Moreover, the new data type doesn't change the behavior of the wrapped type at all. So my current solution is to use a DataTypeDomain as a way to attach extra information to an existing DataType. To be compatible with the use of another DataTypeDomain for IPv4 and IPv6, I introduced a "chain" of domains.
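To make the "chain" idea concrete, here is a minimal sketch, not the code from this PR. The `Domain` and `DataType` structs and the `appendDomain` / `domainNames` helpers are hypothetical names invented for illustration; they only show how several pieces of extra information could be attached to one underlying type without changing the type itself.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a domain: extra information attached to a type
// that does not change how the underlying type behaves.
struct Domain
{
    std::string name;
    std::unique_ptr<Domain> next;  // next domain in the chain, if any
};

// Hypothetical stand-in for a data type that carries a chain of domains.
struct DataType
{
    std::string base_name;
    std::unique_ptr<Domain> domain;

    // Append a domain at the end of the chain so that domains attached
    // earlier are preserved.
    void appendDomain(std::string name)
    {
        auto node = std::make_unique<Domain>();
        node->name = std::move(name);

        std::unique_ptr<Domain> * slot = &domain;
        while (*slot)
            slot = &(*slot)->next;
        *slot = std::move(node);
    }

    // Collect the domain names in the order they were attached.
    std::vector<std::string> domainNames() const
    {
        std::vector<std::string> names;
        for (const Domain * d = domain.get(); d; d = d->next.get())
            names.push_back(d->name);
        return names;
    }
};
```

In this sketch the base type stays untouched, so code that tests the concrete type directly keeps working, which is the property the comment above is after.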
There are 2 failing tests. Are they really related to my PR?
The failing test is not related to your changes.
@alexey-milovidov I've rebased the pull request because there was a conflict. It is ready for review.
I've reviewed this PR. Sorry for the delay. The code is mostly OK; I have some minor complaints:
I also have some considerations: The class hierarchy looks more complex than needed. I cannot grasp it at first glance. If we use only the "Simple" variants of "IDataTypeCustom", do we need anything else? Why are customization of text serialization and of the name handled separately? This was probably already an issue with the previous implementation of DataTypeDomain. It would be nice if you tried to remove as much code as possible. Don't hesitate to cut out and throw away the unneeded code in DataTypeDomain.
Thank you for your review. Regarding your points:
I think it is, however, possible to simplify the class hierarchy by getting rid of the SimpleTextSerialization level. I will look at the other points and try to improve the code.
SimpleAggregateFunction does not pass the arena to the add_function -> getAddressOfAddFunction(), hence the following crash happens:

(gdb) bt
#0 DB::Arena::alloc (size=64, this=0x0) at ../dbms/src/Common/Arena.h:124
#1 DB::SingleValueDataString::changeImpl (this=0x7f97424a27d8, value=..., arena=0x0) at ../dbms/src/AggregateFunctions/AggregateFunctionMinMaxAny.h:274
#2 0x0000000005ea5319 in DB::AggregateFunctionNullUnary<true>::add (arena=<optimized out>, row_num=<optimized out>, columns=<optimized out>, place=<optimized out>, this=<optimized out>) at ../dbms/src/AggregateFunctions/AggregateFunctionNull.h:43
#3 DB::IAggregateFunctionHelper<DB::AggregateFunctionNullUnary<true> >::addFree (that=<optimized out>, place=<optimized out>, columns=<optimized out>, row_num=<optimized out>, arena=<optimized out>) at ../dbms/src/AggregateFunctions/IAggregateFunction.h:131
#4 0x000000000679772f in DB::AggregatingSortedBlockInputStream::addRow (this=this@entry=0x7f982de19c00, cursor=...) at ../dbms/src/Common/AlignedBuffer.h:31
#5 0x0000000006797faa in DB::AggregatingSortedBlockInputStream::merge (this=this@entry=0x7f982de19c00, merged_columns=..., queue=...) at ../dbms/src/DataStreams/AggregatingSortedBlockInputStream.cpp:140
#6 0x0000000006798979 in DB::AggregatingSortedBlockInputStream::readImpl (this=0x7f982de19c00) at ../dbms/src/DataStreams/AggregatingSortedBlockInputStream.cpp:78
#7 0x000000000622db55 in DB::IBlockInputStream::read (this=0x7f982de19c00) at ../dbms/src/DataStreams/IBlockInputStream.cpp:56
#8 0x0000000006613bee in DB::MergeTreeDataMergerMutator::mergePartsToTemporaryPart (this=this@entry=0x7f97ec65e1a0, future_part=..., merge_entry=..., time_of_merge=<optimized out>, disk_reservation=<optimized out>, deduplicate=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1018
#9 0x000000000658f7a4 in DB::StorageReplicatedMergeTree::tryExecuteMerge (this=0x7f97ec65b810, entry=...) at /usr/include/c++/8/bits/unique_ptr.h:342
#10 0x00000000065940ab in DB::StorageReplicatedMergeTree::executeLogEntry (this=0x7f97ec65b810, entry=...) at ../dbms/src/Storages/StorageReplicatedMergeTree.cpp:910
<snip>
(gdb) f 1
(gdb) p MAX_SMALL_STRING_SIZE
$1 = 48
(gdb) p capacity
$2 = 64
(gdb) p value
$3 = {data = 0x7f97242fcbd0 "HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH", size = 61}

v2: avoid leaking memory allocated by the Arena on the intermediate step

Fixes: 8f8d2c0 ("Merge pull request ClickHouse#4629 from bgranvea/simple_aggregate_function")
Please note that there is no clickhouse release that contains this merge, hence this should not be merged right now, otherwise this will break travis-ci tests (at least). Refs: ClickHouse/ClickHouse#4629
Requires: clickhouse-server 19.8.3.8 Refs: ClickHouse/ClickHouse#4629
Someone may not want to use something that is not documented (what a crazy caution). Follow-up for: ClickHouse#4629
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
For changelog. Remove if this is non-significant change.
Category (leave one):
Short description:
This new data type allows columns with lightweight aggregation in an AggregatingMergeTree. It can only be used with simple functions such as any, anyLast, sum, min, max.
Detailed description:
A column of type SimpleAggregateFunction(anyLast,Float64) in an AggregatingMergeTree behaves like a standard Float64 column, but when rows are merged, the aggregate function anyLast will be applied and the Float64 result will be stored.
This allows better performance than the standard approach (Null engine table + AggregatingMergeTree materialized view + aggregation finalization view) in this simple case.
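The merge behavior described for SimpleAggregateFunction(anyLast,Float64) can be illustrated with a small standalone sketch. This is not ClickHouse code: `Row` and `mergeAnyLast` are hypothetical names, and the sketch assumes that rows sharing the same sorting key are collapsed in insertion order. It only shows that the stored result is a plain Float64 value rather than an aggregate-function state.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical row: a sorting key plus one value column that plays the role
// of SimpleAggregateFunction(anyLast, Float64).
struct Row
{
    std::string key;
    double value;
};

// Collapse rows that share a key, keeping the last value seen for each key
// (anyLast semantics). The result holds plain doubles, so no finalization
// step is needed when reading.
std::map<std::string, double> mergeAnyLast(const std::vector<Row> & rows)
{
    std::map<std::string, double> merged;
    for (const auto & row : rows)
        merged[row.key] = row.value;  // later rows overwrite earlier ones
    return merged;
}
```

Swapping the assignment for `+=`, or for a min or max comparison, would model sum, min, and max in the same way, since each of these functions can combine two plain values into another plain value of the same type.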
See #3852 for a complete description of our use case.