Conversation

@fabriziodemaria
Member

@fabriziodemaria fabriziodemaria commented Nov 9, 2016

Description

This PR adds support for specifying the regional location of BigQuery (BQ) datasets.
More information about regional locations here: https://cloud.google.com/bigquery/docs/managing_jobs_datasets_projects

The intended behaviour is as follows:

  • If a new BQ dataset is created when uploading a BQ table, the user can specify a regional location for that dataset (note that all tables within the dataset inherit the dataset's location). BQ datasets can also have an unspecified regional location.
  • If the target BQ dataset already exists when uploading a new BQ table, the optional location parameter set by the user must match that of the remote dataset, or the upload fails. If the remote dataset has an unspecified regional location, the user cannot enforce a location parameter. If the client does not specify a regional location, the location of the existing dataset is adopted. (A sketch of this check follows below.)
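A minimal sketch of how this existing-dataset check might look; the helper names (dataset_exists, get_dataset_location, make_dataset) are illustrative assumptions, not luigi's actual client API:

    # Illustrative sketch only; helper names are assumptions, not luigi's real API.
    def ensure_dataset(client, dataset, location=None):
        """Create the dataset if needed, enforcing a consistent regional location."""
        if client.dataset_exists(dataset):
            remote_location = client.get_dataset_location(dataset)  # None if unspecified
            if location is not None and location != remote_location:
                raise Exception('Dataset %s is in location %r, but %r was requested'
                                % (dataset, remote_location, location))
            return remote_location  # adopt the existing dataset's location
        client.make_dataset(dataset, location=location)
        return location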

Motivation and Context

Users should have control over the regional location for BigQuery datasets.

Have you tested this? If so, how?

I have tested this by uploading tables to a BigQuery project set up for testing.

TODO

  • Tests

@mention-bot

@fabriziodemaria, thanks for your PR! By analyzing the history of the files in this pull request, we identified @mikekap, @DeaconDesperado and @mbruggmann to be potential reviewers.

@mrunesson

👍 As you say, tests would be good to add. I suggest one test with a location set, and also double-checking that the no-location case is covered properly by the existing tests.

tox.ini Outdated
cdh,hdp: hdfs>=2.0.4,<3.0.0
postgres: psycopg2<3.0
gcloud: google-api-python-client>=1.4.0,<2.0
gcloud: testfixtures
Contributor

Can you add an upper bound?
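
For example, the pin could look like this (the <5.0.0 cap is just an illustrative placeholder, not a vetted version):

    gcloud: testfixtures<5.0.0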

if dataset.location is not None:
    fetched_location = response.get('location', '')
    if not fetched_location:
        fetched_location = 'undefined'
Contributor

If we want 'undefined' as the default value then we should specify that instead of ''...

Contributor

Using proper Python None would seem even simpler.
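
A rough sketch of the simpler variant being suggested, keeping None as the "unspecified" value (the mismatch handling here is assumed from the PR description, not taken from the actual diff):

    # Sketch: keep Python None for "unspecified location" instead of the '' / 'undefined' sentinels.
    if dataset.location is not None:
        fetched_location = response.get('location')  # None when the remote dataset has no location set
        if dataset.location != fetched_location:
            raise Exception('Dataset already exists with regional location %r' % fetched_location)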

from contrib import gcs_test
from nose.plugins.attrib import attr

from testfixtures import should_raise
Contributor

We can avoid the new testfixtures dependency - nose has assertRaises
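
For example, something along these lines should work without testfixtures (the test name and task invocation below are placeholders, not the PR's actual test code):

    from nose.tools import assert_raises

    def test_location_mismatch_fails(self):
        # Placeholder body: expect the upload to fail when the requested
        # location differs from the existing dataset's location.
        with assert_raises(Exception):
            self.run_upload(dataset='existing_dataset', location='EU')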


PROJECT_ID = gcs_test.PROJECT_ID
# In order to run this test, you should set your GCS/BigQuery project/bucket.
# Unfortunately there's no mock
Contributor

It would be exciting to have automatically generated mocks to speed up local testing... In the manner of http://martinfowler.com/bliki/SelfInitializingFake.html or similar...
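
A very rough sketch of what such a self-initializing fake could look like for the BigQuery client (entirely hypothetical, just to illustrate the record-and-replay idea):

    import json
    import os

    class SelfInitializingFakeClient(object):
        """Records responses from the real client on the first run, replays them afterwards."""

        def __init__(self, real_client, cassette='bigquery_responses.json'):
            self.real_client = real_client
            self.cassette = cassette
            self.responses = json.load(open(cassette)) if os.path.exists(cassette) else {}

        def call(self, method, **kwargs):
            key = '%s:%s' % (method, json.dumps(kwargs, sort_keys=True))
            if key not in self.responses:
                # Cache miss: hit the real service once and persist the response.
                self.responses[key] = getattr(self.real_client, method)(**kwargs)
                with open(self.cassette, 'w') as f:
                    json.dump(self.responses, f)
            return self.responses[key]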

@ulzha ulzha merged commit 28b7266 into spotify:master Nov 15, 2016
@Tarrasch
Contributor

@ulzha, did you consider using squash-merge to keep the history cleaner? In particular I think merge-commits like 9d567a4 have a pretty low valuable-history to messiness ratio.

@Tarrasch
Contributor

@fabriziodemaria @mrunesson @ulzha, have you checked that the build still works after this was merged? I see some failures in the py27-gcloud tox environment (run tox -e py27-gcloud, but it requires some setup).

https://travis-ci.org/spotify/luigi/builds

Tarrasch added a commit that referenced this pull request Nov 24, 2016
I'm not sure, but I believe things started to break after #1917 got merged.
@ulzha
Contributor

ulzha commented Nov 25, 2016

Yes, I saw a green status. Something maybe nondeterministic? Going to have to inspect...

Yes we do run py27-gcloud locally (in a Dockerized easily usable setup... which I hope we get to opensource soon...)

@Tarrasch
Contributor

@ulzha, you're right about the green status. I can explain why: Fabrizio opened this pull request from his own fabriziodemaria/luigi repo, so Travis didn't decrypt the gcs-credentials as it would have done if the PR had been sent from the spotify/luigi repository. I implemented this complicated machinery some months before leaving Spotify. See 375a470 :)

Also, kudos on setting up the dockerized stuff! Making luigi builds more stable would be so awesome!

Tarrasch added a commit that referenced this pull request Nov 29, 2016
I'm not sure, but I believe things started to break after #1917 got merged.
xoob added a commit to xoob/luigi that referenced this pull request Jul 3, 2017
Restore the gcloud tests that were disabled in spotify#1917.
During local execution, py27-gcloud succeeds, while py34-gcloud fails.

To run locally:

    export GCS_TEST_PROJECT_ID=macro-mile-158613 \
      GCS_TEST_BUCKET=macro-mile-158613 \
      DATAPROC_TEST_PROJECT_ID=macro-mile-158613
    tox -e py27-gcloud