Codestin Search App

findinpath · 2022-02-25T14:48:21Z

Problem description

Assume that we have two tables which contain a column with the same name:

    CREATE TABLE test.table_same_column_name_a (some_id string)
    CREATE TABLE test.table_same_column_name_b (some_id string)

Assume also that we have a view doing a select from the result of these table joined together and the name of the common column is used in a different case (some_id, SOME_ID):

   CREATE VIEW IF NOT EXISTS test.view_column_name_case_insensitive AS
       SELECT a.some_id
       FROM test.table_same_column_name_a a
       LEFT JOIN (SELECT trim(some_id) AS SOME_ID FROM test.table_same_column_name_b) b
             ON a.some_id = b.some_id
       WHERE a.some_id != ''

When translating in Coral from Hive, the following result is being obtained:

SELECT some_id
FROM (SELECT table_same_column_name_a.some_id, t.SOME_ID
FROM default.table_same_column_name_a
LEFT JOIN (SELECT TRIM(some_id) SOME_ID, CAST(TRIM(some_id) AS STRING) $f1
FROM default.table_same_column_name_b) t ON table_same_column_name_a.some_id = t.$f1) t0
WHERE t0.some_id <> ''

As it can be seen from the query above this leads to having an ambiguous column, because there is both "some_id" and "SOME_ID" which in Hive point to the same column.

Solution description

Pointing out to coral that Hive is a case insensitive system, solves the problem by creating unique names for the columns in the selection.

...
SELECT table_same_column_name_a.some_id, t.SOME_ID SOME_ID0
...

When joining tables that have the same column name
,irrespective of their case, make sure that the
names of the columns have unique names to avoid
the situation where the SQL statement created
contains ambiguous column names.

findepi · 2022-02-25T15:03:51Z

cc @wmoustafa

wmoustafa

Could you update the PR description with an example query (preferably simplest one), and the behavior before and after the patch?

Can we add a Hive to Rel conversion test?

wmoustafa · 2022-02-26T08:45:48Z

coral-spark/src/test/java/com/linkedin/coral/spark/TestUtils.java

+    run(driver, "create table table_same_column_name_a (some_id string)");
+    run(driver, "create table table_same_column_name_b (some_id string)");


Could you please follow the same casing convention as the rest of the test cases? (I see some are lowercase, but these seem to have slipped).

wmoustafa · 2022-02-26T08:54:34Z

coral-spark/src/test/java/com/linkedin/coral/spark/TestUtils.java

+    run(driver, "create table table_same_column_name_a (some_id string)");
+    run(driver, "create table table_same_column_name_b (some_id string)");
+    run(driver,
+        "create view if not exists view_column_name_case_insensitive as\n" + "    select a.some_id\n"


Could you simplify the view to the minimal one that satisfies the test case?

create table hive.test.table_a ( some_id string ); create table hive.test.table_b ( some_id string );

These 2 views will have the same issue:

-- Hive view version 1 create or replace view test.view_ab as select a.some_id from test.table_a a left join ( select trim(some_id) AS SOME_ID from test.table_b ) b on a.some_id = b.some_id where a.some_id != ''

-- Hive view version 2 create or replace view test.view_ab2 as select a.some_id from test.table_a a left join ( select some_id AS SOME_ID from test.table_b ) b on a.some_id = trim(b.some_id) where a.some_id != ''

The WHERE clause is needed in order to enforce the usage of the inner select. I don't have at hand a sample view easier to grasp than the ones above.

wmoustafa · 2022-02-26T08:59:23Z

coral-spark/src/test/java/com/linkedin/coral/spark/TestUtils.java

    run(driver, String.join("\n", "", "CREATE VIEW IF NOT EXISTS named_struct_view", "AS",
        "SELECT named_struct('abc', 123, 'def', 'xyz') AS named_struc", "FROM bar"));

+    run(driver, "create table table_same_column_name_a (some_id string)");


table_same_column_name_a -> duplicate_column_name.tableA (same for the other table)

I did the update for the spark tests.
However the other tests follow a slightly different naming scheme for the test tables. Do you want me to adjust the naming for the other tests (hive/trino) ?

ljfgem

@findinpath Thanks for this PR!
I have kicked off the i-test and will update once it finishes.
Could you please run ./gradlew spotlessApply to fix the code format issue? Please ensure that ./gradlew clean build succeeds before pushing.

ljfgem

LGTM and all the i-tests passed.
Thanks @findinpath @findepi !
@wmoustafa @funcheetah do you want to take another look?

In the translation of views, when joining tables that have the namesake column names ,irrespective of their case, make sure that the resulting relation is using names of the columns which have unique names in order to avoid the situation where the SQL statement created contains ambiguous column names.

funcheetah

Thanks @findinpath for the PR. LGTM.

coral-spark/src/test/java/com/linkedin/coral/spark/TestUtils.java

Fix typo in SQL statement

0f55d25

findepi approved these changes Feb 25, 2022

View reviewed changes

wmoustafa reviewed Feb 26, 2022

View reviewed changes

findinpath force-pushed the hive-case-insensitive branch 3 times, most recently from 66733bd to 1baab42 Compare February 28, 2022 11:21

findinpath requested a review from wmoustafa February 28, 2022 11:23

ljfgem reviewed Feb 28, 2022

View reviewed changes

findinpath force-pushed the hive-case-insensitive branch from 1baab42 to f1527ef Compare February 28, 2022 15:49

ljfgem approved these changes Feb 28, 2022

View reviewed changes

findinpath force-pushed the hive-case-insensitive branch from f1527ef to 6ca047f Compare March 1, 2022 06:08

findinpath mentioned this pull request Mar 1, 2022

Test translation of view with namesake column aliases which are differing in case trinodb/trino#11240

Merged

funcheetah approved these changes Mar 1, 2022

View reviewed changes

coral-spark/src/test/java/com/linkedin/coral/spark/TestUtils.java Show resolved Hide resolved

ljfgem merged commit a33149e into linkedin:master Mar 4, 2022

findinpath mentioned this pull request Mar 4, 2022

Bump Coral dependency version to 2.0.55 trinodb/trino#11159

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Set the schema case sensitivity to false for Hive#234

Set the schema case sensitivity to false for Hive#234
ljfgem merged 2 commits intolinkedin:masterfrom
findinpath:hive-case-insensitive

findinpath commented Feb 25, 2022 •

edited

Loading

Uh oh!

findepi commented Feb 25, 2022

Uh oh!

wmoustafa left a comment

Uh oh!

wmoustafa Feb 26, 2022

Uh oh!

wmoustafa Feb 26, 2022

Uh oh!

findinpath Feb 28, 2022

Uh oh!

wmoustafa Feb 26, 2022

Uh oh!

findinpath Feb 28, 2022

Uh oh!

ljfgem left a comment

Uh oh!

ljfgem left a comment

Uh oh!

funcheetah left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

		run(driver, "create table table_same_column_name_a (some_id string)");
		run(driver, "create table table_same_column_name_b (some_id string)");

Conversation

findinpath commented Feb 25, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem description

Solution description

Uh oh!

findepi commented Feb 25, 2022

Uh oh!

wmoustafa left a comment

Choose a reason for hiding this comment

Uh oh!

wmoustafa Feb 26, 2022

Choose a reason for hiding this comment

Uh oh!

wmoustafa Feb 26, 2022

Choose a reason for hiding this comment

Uh oh!

findinpath Feb 28, 2022

Choose a reason for hiding this comment

Uh oh!

wmoustafa Feb 26, 2022

Choose a reason for hiding this comment

Uh oh!

findinpath Feb 28, 2022

Choose a reason for hiding this comment

Uh oh!

ljfgem left a comment

Choose a reason for hiding this comment

Uh oh!

ljfgem left a comment

Choose a reason for hiding this comment

Uh oh!

funcheetah left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

findinpath commented Feb 25, 2022 •

edited

Loading