Semi-Join/Anti-Join does not support the inequality condition #4950

@Hzc492

What happens?

I'm testing DuckDB's performance on TPC-H at large scale factors (SF > 100). On Q21, DuckDB runs very slowly and consumes a lot of memory. I believe the reason is that the current Semi-Join/Anti-Join implementation does not support inequality conditions.

To Reproduce

The TPC-H schema is used here; alternatively, you can create a simplified table like this:

create table lineitem (l_orderkey int, l_suppkey int);
insert into lineitem values (1,1),(1,2),(3,3),(4,5),(5,5),(6,5);

Consider an oversimplified version of Q21:

select * from lineitem l1 where exists (
    select * from lineitem l2 
    where 
        l2.l_orderkey = l1.l_orderkey
);

DuckDB generates a query plan like this:
[image: query plan screenshot]
That's cool and everything works well.

However, if I add an inequality condition (i.e. l2.l_suppkey <> l1.l_suppkey), like the one in TPC-H Q21:

select * from lineitem l1 where exists (
    select * from lineitem l2 
    where 
        l2.l_orderkey = l1.l_orderkey 
        and l2.l_suppkey <> l1.l_suppkey
);

DuckDB generates a terrible query plan:
[image: query plan screenshot]
As you can see, there is an INNER JOIN between the two lineitem tables! This makes the query run slowly and consume a lot of memory.

It seems feasible to extend the existing Semi-Join/Anti-Join implementation to support inequality conditions as well. For example, could the join evaluate the inequality condition as an additional check on every match produced by the equality condition?
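To make the suggestion concrete, here is a minimal Python sketch of a hash-based semi/anti-join that checks an extra predicate on each equality match. It is only an illustration of the idea, not DuckDB's actual implementation: the function `hash_semi_join` and its `key`, `residual`, and `anti` parameters are hypothetical names, and NULL handling is ignored.

```python
from collections import defaultdict


def hash_semi_join(probe_rows, build_rows, key, residual, anti=False):
    """Return probe rows that have (semi) or lack (anti) a qualifying match.

    key(row) extracts the equality-join key; residual(probe, build) is the
    extra non-equality predicate evaluated on each equality match.
    All names here are hypothetical and only serve the illustration.
    """
    # Build phase: bucket the build side by its equality key.
    table = defaultdict(list)
    for b in build_rows:
        table[key(b)].append(b)

    out = []
    for p in probe_rows:
        # Probe phase: only rows in the same bucket can match. any() stops
        # at the first row that also satisfies the residual predicate.
        matched = any(residual(p, b) for b in table.get(key(p), ()))
        if matched != anti:
            out.append(p)
    return out


# Sample data from the simplified lineitem table: (l_orderkey, l_suppkey).
lineitem = [(1, 1), (1, 2), (3, 3), (4, 5), (5, 5), (6, 5)]

by_orderkey = lambda row: row[0]
other_supplier = lambda l1, l2: l2[1] != l1[1]

semi = hash_semi_join(lineitem, lineitem, by_orderkey, other_supplier)
anti = hash_semi_join(lineitem, lineitem, by_orderkey, other_supplier, anti=True)

print(semi)  # [(1, 1), (1, 2)]
print(anti)  # [(3, 3), (4, 5), (5, 5), (6, 5)]
```

In this sketch each probe row is emitted at most once and no joined tuples are materialized, which is the property the inner-join plan lacks.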

OS:

Linux

DuckDB Version:

0.5.1

DuckDB Client:

Shell

Full Name:

Aqua

Affiliation:

CAS

Have you tried this on the latest master branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree
