
@bgranvea
Contributor

@bgranvea bgranvea commented Mar 8, 2019

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

For changelog. Remove if this is non-significant change.

Category (leave one):

  • New Feature

Short description:
This new data type allows columns with lightweight aggregation in an AggregatingMergeTree. It can only be used with simple functions such as any, anyLast, sum, min and max.

Detailed description:
A column of type SimpleAggregateFunction(anyLast,Float64) in an AggregatingMergeTree behaves like a standard Float64 column, but when rows are merged, the aggregate function anyLast will be applied and the Float64 result will be stored.
In this simple case, this performs better than the standard approach of a Null engine table, plus an AggregatingMergeTree materialized view, plus a view that finalizes the aggregation.
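To make the merge behavior described above concrete, here is a toy C++ model. It is not ClickHouse code: the Row struct and mergeSorted function are hypothetical. It assumes rows arrive sorted by the table's sorting key, as they do when parts are merged, and that "last" simply means last in that input order. The point is that each column keeps a plain finalized value (a double, a counter) rather than an aggregate function state.

```cpp
// Toy model (not ClickHouse code) of an AggregatingMergeTree merge for columns
// declared as SimpleAggregateFunction(anyLast, Float64) and
// SimpleAggregateFunction(sum, UInt64): rows with the same sorting key are
// collapsed, and each column stores a plain value, not an aggregation state.
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct Row {
    std::string key;      // sorting key of the table
    double last_value;    // behaves like SimpleAggregateFunction(anyLast, Float64)
    std::uint64_t total;  // behaves like SimpleAggregateFunction(sum, UInt64)
};

// Merge rows that are already sorted by key.
std::vector<Row> mergeSorted(const std::vector<Row>& rows) {
    std::vector<Row> out;
    for (const Row& r : rows) {
        if (!out.empty() && out.back().key == r.key) {
            out.back().last_value = r.last_value;  // anyLast: keep the later value
            out.back().total += r.total;           // sum: add the values together
        } else {
            out.push_back(r);  // first row for this key is stored as is
        }
    }
    return out;
}
```

Because the stored value is already final, a plain SELECT can read it directly, without the finalization view.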

See #3852 for a complete description of our use case.

@bgranvea
Contributor Author

bgranvea commented Mar 8, 2019

The implementation is different from what I initially proposed in #3852.

Wrapping an existing data type (as is done for Nullable and LowCardinality) was too complex because there are many places in the code where the type is tested directly (dynamic_cast<DataTypeNullable *>...). Moreover, the new data type doesn't change the behavior of the wrapped type at all.

So my current solution is to use a DataTypeDomain as a way to attach extra information to an existing DataType. To stay compatible with the existing use of DataTypeDomain for IPv4 and IPv6, I introduced a "chain" of domains.
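A minimal sketch of why direct type checks make wrapping awkward, and how attaching information to the existing type avoids the problem. All classes and functions below (IDataType, DataTypeFloat64, DataTypeWrapper, isFloat64, makeSimpleAggregateFloat64) are simplified, hypothetical stand-ins, not ClickHouse's real data type hierarchy.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, heavily simplified stand-ins for data type classes.
struct IDataType {
    virtual ~IDataType() = default;
    virtual std::string getName() const = 0;
};

struct DataTypeFloat64 : IDataType {
    std::string custom_name;  // empty unless extra information is attached
    std::string getName() const override {
        return custom_name.empty() ? "Float64" : custom_name;
    }
};

// Option A: wrap the nested type, in the style of Nullable / LowCardinality.
struct DataTypeWrapper : IDataType {
    std::shared_ptr<const IDataType> nested;
    explicit DataTypeWrapper(std::shared_ptr<const IDataType> n) : nested(std::move(n)) {}
    std::string getName() const override {
        return "SimpleAggregateFunction(anyLast, " + nested->getName() + ")";
    }
};

// The kind of check that appears in many places: it fails for a wrapper.
bool isFloat64(const IDataType& type) {
    return dynamic_cast<const DataTypeFloat64*>(&type) != nullptr;
}

// Option B: keep the concrete type and attach the extra information to it.
std::shared_ptr<const IDataType> makeSimpleAggregateFloat64(const std::string& function) {
    auto type = std::make_shared<DataTypeFloat64>();
    type->custom_name = "SimpleAggregateFunction(" + function + ", Float64)";
    return type;
}
```

With option B, every existing `dynamic_cast` check keeps working, while the name still reports the SimpleAggregateFunction information.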

@bgranvea
Contributor Author

There are 2 failing tests; are they really related to my PR?

@bgranvea bgranvea force-pushed the simple_aggregate_function branch from e496fa4 to 160c333 Compare March 18, 2019 10:04
@alexey-milovidov alexey-milovidov self-requested a review March 25, 2019 22:03
@alexey-milovidov
Member

The failing test is not related to your changes.

@bgranvea bgranvea force-pushed the simple_aggregate_function branch from ec5e8c5 to 42b07c5 Compare April 9, 2019 05:26
@bgranvea
Contributor Author

bgranvea commented Apr 9, 2019

@alexey-milovidov I've rebased the pull request because there was a conflict. It is ready for review.

@alexey-milovidov alexey-milovidov merged commit 8f8d2c0 into ClickHouse:master May 9, 2019
@alexey-milovidov
Member

alexey-milovidov commented May 9, 2019

I've reviewed this PR. Sorry for the delay.

The code is mostly OK; I have some minor complaints:

void setCustomization(DataTypeCustomDescPtr custom_desc_) const;
It would be better to return a customized data type instead of modifying a mutable member.
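One way to read this suggestion, sketched with simplified, hypothetical classes (DataType, CustomDesc and withCustomization are illustrative names, not the ClickHouse API): type objects stay immutable, and the customization step returns a new type object, so a shared type instance is never modified through a mutable member.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical descriptor holding the customization (here only a display name).
struct CustomDesc {
    std::string name;
};
using CustomDescPtr = std::shared_ptr<const CustomDesc>;

class DataType {
public:
    explicit DataType(std::string base_name, CustomDescPtr custom = nullptr)
        : base_name_(std::move(base_name)), custom_(std::move(custom)) {}

    std::string getName() const { return custom_ ? custom_->name : base_name_; }

    // Instead of `void setCustomization(...) const` writing to a `mutable`
    // member, build and return a new, independent type object.
    std::shared_ptr<const DataType> withCustomization(CustomDescPtr custom) const {
        return std::make_shared<DataType>(base_name_, std::move(custom));
    }

private:
    const std::string base_name_;
    const CustomDescPtr custom_;
};
```

The original object is untouched, which matters when type objects are shared between columns or cached.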

static const std::vector<String> supported_functions{"any", "anyLast", "min", "max", "sum"};

  • it would be better to have this as a method in IAggregateFunction
  • heavy static initializers

boost::algorithm::join(supported_functions, ","),

  • missing space after the comma (min,max -> min, max)
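A hedged sketch of how the three points above could be addressed. IAggregateFunction here is a simplified stand-in, and isSimpleAggregateCompatible is a hypothetical method name chosen for illustration. If a list of names is still wanted for the error message, a constexpr std::array of std::string_view needs no dynamic initialization at startup, unlike a static std::vector<String>. With the original container, boost::algorithm::join(supported_functions, ", ") would produce the requested separator.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Simplified stand-in: each function reports whether it can be used inside
// SimpleAggregateFunction, so no central list is needed for the check itself.
struct IAggregateFunction {
    virtual ~IAggregateFunction() = default;
    virtual std::string_view getName() const = 0;
    // Hypothetical method name; the default is "not supported".
    virtual bool isSimpleAggregateCompatible() const { return false; }
};

struct AggregateFunctionSum : IAggregateFunction {
    std::string_view getName() const override { return "sum"; }
    bool isSimpleAggregateCompatible() const override { return true; }
};

struct AggregateFunctionUniq : IAggregateFunction {
    std::string_view getName() const override { return "uniq"; }
};

// Initialized at compile time: no heap allocation during static initialization.
constexpr std::array<std::string_view, 5> supported_functions{"any", "anyLast", "min", "max", "sum"};

// Join the names with ", " when building an error message.
std::string joinSupportedFunctions() {
    std::string out;
    for (std::size_t i = 0; i < supported_functions.size(); ++i) {
        if (i != 0)
            out += ", ";
        out += supported_functions[i];
    }
    return out;
}
```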

Also I have some considerations:

The class hierarchy looks more complex than needed; I cannot grasp it at first glance. If we use only the "Simple" variants of "IDataTypeCustom", do we need anything else? Why do the customizations of text serialization and of the name come separately? This was probably already an issue with the previous implementation of DataTypeDomain. It would be nice if you tried to remove as much code as possible. Don't hesitate to cut and throw away the unneeded code in DataTypeDomain.

@abyss7 abyss7 added the pr-feature Pull request with new product feature label May 13, 2019
@bgranvea
Contributor Author

Thank you for your review. Regarding your points:

  • I'm not very happy with the mutable member in IDataType either. But to get rid of it, I would have to change the creators of all existing data types to take extra parameters, because any of them can be wrapped in a SimpleAggregateFunction. That would be a very invasive change.

  • I've separated the customization of text serialization and name in order to support data types like SimpleAggregateFunction('anyLast', IPv4): in this case, the base datatype is a UInt32 with custom text serialization functions (because of IPv4) and with a custom name containing SimpleAggregateFunction info.

However, I think it is possible to simplify the class hierarchy by getting rid of the SimpleTextSerialization level.
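A minimal sketch of why name and text serialization are customized separately, using the SimpleAggregateFunction('anyLast', IPv4) case from the reply above. TypeInfo, makeUInt32, applyIPv4 and applySimpleAggregate are hypothetical illustrations, not ClickHouse classes: storage stays UInt32, the IPv4 layer changes both name and text format, and the SimpleAggregateFunction layer changes only the name, so the IPv4 text format survives.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>

// Hypothetical description of a type: a display name and a text serializer
// for values stored as UInt32.
struct TypeInfo {
    std::string name;
    std::function<std::string(std::uint32_t)> serializeText;
};

TypeInfo makeUInt32() {
    return {"UInt32", [](std::uint32_t v) { return std::to_string(v); }};
}

// IPv4 layer: same UInt32 storage, custom name and dotted-quad text format.
TypeInfo applyIPv4(TypeInfo t) {
    t.name = "IPv4";
    t.serializeText = [](std::uint32_t v) {
        return std::to_string(v >> 24) + "." + std::to_string((v >> 16) & 0xFF) + "." +
               std::to_string((v >> 8) & 0xFF) + "." + std::to_string(v & 0xFF);
    };
    return t;
}

// SimpleAggregateFunction layer: only the name changes; serialization is kept.
TypeInfo applySimpleAggregate(TypeInfo t, const std::string& function) {
    t.name = "SimpleAggregateFunction(" + function + ", " + t.name + ")";
    return t;
}
```

If the two customizations were a single unit, the SimpleAggregateFunction layer would have to replace the IPv4 serializer along with the name.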

I'll look at the other points and try to improve the code.

azat added a commit to azat/ClickHouse that referenced this pull request May 19, 2019
SimpleAggregateFunction does not pass an arena to
add_function -> getAddressOfAddFunction(), hence the following crash happens:
  (gdb) bt
  #0  DB::Arena::alloc (size=64, this=0x0) at ../dbms/src/Common/Arena.h:124
  #1  DB::SingleValueDataString::changeImpl (this=0x7f97424a27d8, value=..., arena=0x0) at ../dbms/src/AggregateFunctions/AggregateFunctionMinMaxAny.h:274
  #2  0x0000000005ea5319 in DB::AggregateFunctionNullUnary<true>::add (arena=<optimized out>, row_num=<optimized out>, columns=<optimized out>, place=<optimized out>, this=<optimized out>) at ../dbms/src/AggregateFunctions/AggregateFunctionNull.h:43
  #3  DB::IAggregateFunctionHelper<DB::AggregateFunctionNullUnary<true> >::addFree (that=<optimized out>, place=<optimized out>, columns=<optimized out>, row_num=<optimized out>, arena=<optimized out>) at ../dbms/src/AggregateFunctions/IAggregateFunction.h:131
  #4  0x000000000679772f in DB::AggregatingSortedBlockInputStream::addRow (this=this@entry=0x7f982de19c00, cursor=...) at ../dbms/src/Common/AlignedBuffer.h:31
  #5  0x0000000006797faa in DB::AggregatingSortedBlockInputStream::merge (this=this@entry=0x7f982de19c00, merged_columns=..., queue=...) at ../dbms/src/DataStreams/AggregatingSortedBlockInputStream.cpp:140
  #6  0x0000000006798979 in DB::AggregatingSortedBlockInputStream::readImpl (this=0x7f982de19c00) at ../dbms/src/DataStreams/AggregatingSortedBlockInputStream.cpp:78
  #7  0x000000000622db55 in DB::IBlockInputStream::read (this=0x7f982de19c00) at ../dbms/src/DataStreams/IBlockInputStream.cpp:56
  #8  0x0000000006613bee in DB::MergeTreeDataMergerMutator::mergePartsToTemporaryPart (this=this@entry=0x7f97ec65e1a0, future_part=..., merge_entry=..., time_of_merge=<optimized out>, disk_reservation=<optimized out>, deduplicate=<optimized out>) at /usr/include/c++/8/bits/shared_ptr_base.h:1018
  #9  0x000000000658f7a4 in DB::StorageReplicatedMergeTree::tryExecuteMerge (this=0x7f97ec65b810, entry=...) at /usr/include/c++/8/bits/unique_ptr.h:342
  #10 0x00000000065940ab in DB::StorageReplicatedMergeTree::executeLogEntry (this=0x7f97ec65b810, entry=...) at ../dbms/src/Storages/StorageReplicatedMergeTree.cpp:910
  <snip>

  (gdb) f 1
  (gdb) p MAX_SMALL_STRING_SIZE
  $1 = 48
  (gdb) p capacity
  $2 = 64
  (gdb) p value
  $3 = {data = 0x7f97242fcbd0 "HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHH", size = 61}

v2: avoid leaking Arena-allocated memory on the intermediate step

Fixes: 8f8d2c0 ("Merge pull request ClickHouse#4629 from bgranvea/simple_aggregate_function")
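The backtrace shows Arena::alloc called with this=0x0 from SingleValueDataString::changeImpl with arena=0x0, for a 61-byte value while MAX_SMALL_STRING_SIZE is 48. The following is a simplified, hypothetical model of that state, not the real code from AggregateFunctionMinMaxAny.h. Short strings fit in an inline buffer, while longer ones need memory from an arena supplied by the caller. A caller that passes a null arena therefore only fails once a long enough value arrives. Where the real code dereferences the null arena, this sketch throws.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// Toy arena: hands out buffers and frees them all when it is destroyed.
class Arena {
public:
    char* alloc(std::size_t size) {
        chunks_.push_back(std::make_unique<char[]>(size));
        return chunks_.back().get();
    }

private:
    std::vector<std::unique_ptr<char[]>> chunks_;
};

// Simplified "single value" string state: short values are stored inline,
// longer ones need memory from the arena passed in by the caller.
struct SingleValueString {
    static constexpr std::size_t MAX_SMALL_STRING_SIZE = 48;  // value seen in the gdb session
    char small[MAX_SMALL_STRING_SIZE];
    const char* data = nullptr;
    std::size_t size = 0;

    void change(std::string_view value, Arena* arena) {
        if (value.size() <= MAX_SMALL_STRING_SIZE) {
            std::memcpy(small, value.data(), value.size());
            data = small;
        } else {
            // With arena == nullptr, calling arena->alloc() here is the null
            // dereference from frame #0; this sketch throws instead.
            if (arena == nullptr)
                throw std::logic_error("long value requires an arena");
            char* buf = arena->alloc(value.size());
            std::memcpy(buf, value.data(), value.size());
            data = buf;
        }
        size = value.size();
    }

    std::string_view get() const { return {data, size}; }
};
```

Passing a real arena through the add function path, as the commit does, lets long values be stored.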
azat added a commit to azat-ch/clickhouse-driver that referenced this pull request May 21, 2019
azat added a commit to azat-ch/clickhouse-driver that referenced this pull request May 21, 2019
Please note that there is no clickhouse release that contains this
merge, hence this should not be merged right now, otherwise this will
break travis-ci tests (at least).

Refs: ClickHouse/ClickHouse#4629
azat added a commit to azat-ch/clickhouse-driver that referenced this pull request Jun 13, 2019
Requires: clickhouse-server 19.8.3.8
Refs: ClickHouse/ClickHouse#4629
@bgranvea bgranvea deleted the simple_aggregate_function branch June 19, 2019 15:18
kua added a commit to kua/clickhouse that referenced this pull request Dec 30, 2019
azat added a commit to azat/ClickHouse that referenced this pull request Apr 10, 2020
Someone may not want to use something that is not documented (what a
crazy caution).

Follow-up for: ClickHouse#4629
farbodsalimi pushed a commit to segmentio/clickhouse-go that referenced this pull request May 18, 2022