Conversation

kuriancheeramelil
Contributor

@kuriancheeramelil kuriancheeramelil commented May 2, 2018

Auto-generated sum/avg metrics in Superset render the sum/avg expression as

AVG(`dataset`.`table`.`column`)
SUM(`dataset`.`table`.`column`)

which is syntactically invalid in BigQuery.

To fix this, I have overridden the visit_column method of SQLCompiler in BigQueryCompiler to set the include_table parameter, which is True by default, to False. With this change, the sum expression becomes

SUM(`column`), which is correct.

Could you please check whether the fix is good and merge it?
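
To make the idea concrete, here is a minimal sketch in plain Python rather than SQLAlchemy. ToyCompiler and NoTableCompiler are hypothetical stand-ins for SQLCompiler and BigQueryCompiler, and their method signatures are simplified assumptions; only the visit_column and include_table names come from the description above.

```python
class ToyCompiler:
    """Hypothetical stand-in for SQLCompiler; not the real SQLAlchemy API."""

    def quote(self, name):
        # BigQuery-style backtick quoting of a single identifier.
        return "`" + name + "`"

    def visit_column(self, table, column, include_table=True):
        name = self.quote(column)
        if include_table and table:
            # Quote each dotted part separately, as in the faulty output.
            prefix = ".".join(self.quote(part) for part in table.split("."))
            return prefix + "." + name
        return name

    def aggregate(self, func, table, column):
        # Render an aggregate such as SUM(...) around the column.
        return func + "(" + self.visit_column(table, column) + ")"


class NoTableCompiler(ToyCompiler):
    """Hypothetical stand-in for the overridden BigQueryCompiler."""

    def visit_column(self, table, column, include_table=True):
        # Ignore the caller's flag and always drop the table prefix.
        return super().visit_column(table, column, include_table=False)


default_sql = ToyCompiler().aggregate("SUM", "dataset.table", "column")
fixed_sql = NoTableCompiler().aggregate("SUM", "dataset.table", "column")
print(default_sql)
print(fixed_sql)
```

Because the override drops the prefix for every column, not just those inside aggregates, it would also affect any query in which two tables share a column name.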

kuriancheeramelil and others added 2 commits May 2, 2018 16:10
@mxmzdlv
Contributor

mxmzdlv commented May 3, 2018

Thank you @kuriancheeramelil
Unfortunately, the joins test (https://github.com/mxmzdlv/pybigquery/blob/master/test/test_sqlalchemy_bigquery.py#L199) fails with these changes.

The error:

E   sqlalchemy.exc.DatabaseError: (google.cloud.bigquery.dbapi.exceptions.DatabaseError) [{'reason': 'invalidQuery', 'location': 'query', 'message': 'Column name string is ambiguous at [2:72]'}] [SQL: 'SELECT `string` AS `test_pybigquery_sample_string`, count(`integer`) AS `count_1` \nFROM `test_pybigquery.sample` JOIN `test_pybigquery.sample_one_row` ON `string` = `string` GROUP BY `string`'] (Background on this error at: http://sqlalche.me/e/4xp6)

For joins, we need to prefix the column name with the table name. I'm not sure what the best way to deal with this is.
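
As a rough illustration of why the unqualified names fail, the sketch below checks which referenced column names exist in more than one joined table. The ambiguous_columns helper and the column lists are hypothetical; only the two table names and the string and integer column names appear in the error message above.

```python
def ambiguous_columns(referenced, schemas):
    """Return unqualified column names found in more than one joined table."""
    ambiguous = []
    for name in referenced:
        owners = [table for table, cols in schemas.items() if name in cols]
        if len(owners) > 1 and name not in ambiguous:
            ambiguous.append(name)
    return ambiguous


# Assumed column lists for the two joined tables (hypothetical).
schemas = {
    "test_pybigquery.sample": ["string", "integer"],
    "test_pybigquery.sample_one_row": ["string", "integer"],
}

# The ON and GROUP BY clauses in the failing query reference `string`
# without a table prefix.
result = ambiguous_columns(["string"], schemas)
print(result)
```

Any name this check returns needs a table prefix in the generated SQL, so a fix cannot drop the prefix unconditionally.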

@kuriancheeramelil
Contributor Author

Thank you @mxmzdlv for looking into this.
Will check if this can be done without breaking joins.

@kuriancheeramelil
Contributor Author

@mxmzdlv, could you please check if the alternate approach is good?

@mxmzdlv
Contributor

mxmzdlv commented May 13, 2018

Awesome, this approach passes all the tests! Have you tried it with Superset — does it work without any issues?

@kuriancheeramelil
Contributor Author

It's working with Superset. The auto-generated metrics are now syntactically correct as well.

@mxmzdlv mxmzdlv merged commit 94f39a9 into googleapis:master May 14, 2018
@mxmzdlv
Contributor

mxmzdlv commented May 14, 2018

Thanks!

@jimfulton
Contributor

Hi @kuriancheeramelil

I'm working on getting 100% unit-test coverage and am wondering if this change is still needed.

I tried to test this in Superset.

What I did in Superset:

  • Added a dataset (a names dataset with name, gender, and count)
  • Grouped by gender
  • Added an average on count.

I tried both removing the lines and adding a breakpoint in the if. Removing the lines has no impact. The breakpoint never triggers, suggesting that Superset never calls this with a Column. That, or I don't know how to reproduce the problem. :)

Can you explain how you triggered the problem in Superset?

Is this still needed?

@kuriancheeramelil
Contributor Author

Hi @jimfulton,

This fix was done for one of our customers at that time.
The sum/avg metrics were not being generated for any of the BigQuery datasets. If you no longer see the issue without this fix, then it is no longer required.
Sorry, I am not able to personally confirm it since I don't have access to a GCP account at the moment to use BigQuery.

Thanks,
Kurian

@jimfulton
Contributor

@kuriancheeramelil Thanks. Do you recall how you reproduced the issue at the time?

@kuriancheeramelil
Contributor Author

kuriancheeramelil commented May 7, 2021

@jimfulton ,

At that time, the issue occurred for all datasets pointing to BigQuery. It may have been due to the specific version of google-cloud-bigquery we used then, which was 0.28.0.

I am also pasting below the Dockerfile we used to build the Superset Docker image. If you build the image and test with it, you might be able to reproduce the issue.

FROM centos:7

# Superset version
ARG SUPERSET_VERSION=0.22.1

# Configure environment
ENV PYTHONPATH=/etc/superset/conf:$PYTHONPATH \
    SUPERSET_VERSION=${SUPERSET_VERSION} \
    PYBIGQUERY_VERSION=0.2.5 \
    BIGQUERY_VERSION=0.28.0 \
    SUPERSET_HOME=/home/superset

# Create superset user & install dependencies
RUN useradd -U -m superset && \
    yum upgrade -y python-setuptools && \
    yum install -y gcc gcc-c++ libffi-devel python-devel python-wheel openssl-devel libsasl2-devel openldap-devel mariadb-devel curl epel-release && \
    yum install -y python2-pip && \
    find / -name '*pip*' && \
    pip install --upgrade setuptools pip && \
    pip install superset==${SUPERSET_VERSION} pybigquery==${PYBIGQUERY_VERSION} mysqlclient flask_oauthlib google-cloud-bigquery==${BIGQUERY_VERSION} requests && \
    pip install pyasn1 pyasn1-modules --upgrade

COPY db_engine_specs.py /usr/lib/python2.7/site-packages/superset/db_engine_specs.py
COPY sqlalchemy_bigquery.py /usr/lib/python2.7/site-packages/pybigquery/sqlalchemy_bigquery.py

# Configure Filesystem
WORKDIR /home/superset

# Deploy application
EXPOSE 8088
HEALTHCHECK CMD ["curl", "-f", "http://localhost:8088/health"]
ENTRYPOINT ["superset"]
CMD ["runserver","-t","120","-w","14"]
USER superset

I am also attaching the files that the Dockerfile references.

Archive.zip

Thanks
Kurian

@jimfulton
Contributor

jimfulton commented May 7, 2021 via email
