feat(spare/tableau): Ingest, flatten, and upload Spare resources to tableau #547
126f959 to 82e3697
…rom s3, extracts the schema, and then renames fields that are invalid for tableau, and returns that schema
…_schema" that operates on schemas and not tables. this might not be used...or be possible. wip
…rform operations like flatten. this is generic, can do anything
…hat dont upload to tableau, but need to address
…tableau_schema method
…ment all the column types to handle
a95c7be to 25ef982
break out lamp jobs from spare jobs to stay organized
skyqrose left a comment
I haven't reviewed the code in depth, but based on talking to you out of band and a spot check of the data in Tableau, 👍.
Thanks for taking care of this, I would not have been able to do it myself.
Prototype import spare to dev
Import all spare to dev/prod
Related Devops/Terraform:
https://github.com/mbta/terraform_modules/pull/287
https://github.com/mbta/terraform_modules/pull/289
https://github.com/mbta/terraform_modules/pull/295
https://github.com/mbta/devops/pull/3042
Why do we need changes?
Spare/Paratransit tech is reusing the existing Tableau architecture developed by LAMP to upload data to Tableau for analysts. All Spare data is in the /spare partition of the same dataplatform buckets and requires specific permissions to access (as implemented in the devops tickets above).
What changes does this PR propose?
The core work here is to generically convert and flatten the files into a Tableau Hyper API-compliant shape. A simple way to do this is to convert everything to strings, but here we added a utility to flatten/explode/convert all fields into a fully flat structure. Based on comments, implemented just string flattening as a reasonable default. This is reusable and generic for other Tableau uploading tasks.
We also autogenerated the input/output and processing methods for each of these inputs, so adding a new source is simply adding it to a whitelist, and it will get processed and uploaded.
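For reviewers who want a concrete picture, here is a minimal sketch of one reading of "string flattening" plus whitelist-driven job generation. It is not the code in this PR: it works on plain Python dicts rather than the real table types, and `SPARE_RESOURCES`, `to_flat_strings`, and `make_job` are hypothetical names invented for illustration.

```python
import json
from typing import Any, Callable, Dict, List, Optional

# Hypothetical resource names; the PR's real whitelist is not reproduced here.
SPARE_RESOURCES = ("rides", "vehicles")

Row = Dict[str, Any]
FlatRow = Dict[str, Optional[str]]


def to_flat_strings(record: Row) -> FlatRow:
    """Convert every field to a string; nested values become JSON text."""
    flat: FlatRow = {}
    for name, value in record.items():
        if value is None:
            flat[name] = None  # keep nulls as nulls
        elif isinstance(value, (dict, list)):
            # Assumption: nested structs/lists are serialized, not exploded.
            flat[name] = json.dumps(value, sort_keys=True)
        else:
            flat[name] = str(value)
    return flat


def make_job(resource: str) -> Callable[[List[Row]], List[FlatRow]]:
    """Build the processing function for one whitelisted resource."""

    def job(rows: List[Row]) -> List[FlatRow]:
        return [to_flat_strings(row) for row in rows]

    job.__name__ = f"convert_{resource}"
    return job


# Adding a new source means adding one name to SPARE_RESOURCES.
JOBS = {name: make_job(name) for name in SPARE_RESOURCES}
```

In this sketch, `JOBS["rides"](rows)` returns rows whose values are all strings or nulls, which is the kind of flat shape a Hyper extract can take without per-resource schema work.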
How were these changes validated?
Tested locally, tested main conversion method, tested in staging (since there is no environment to test in via dev...)
What questions should reviewers consider?
Limitations:
We removed the ability to do "custom" conversions for now as they were not necessary, but an improvement in the future may be to add includes/excludes, or as requested leave structs as just a "string" if it causes analysis issues by being too exploded/deep.
Another thing to watch is the runtime of this, as we've added ~46 new resources to upload, and this all runs in the Tableau cron job - which runs every hour.
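As a starting point for the includes/excludes idea in the first limitation, a possible future shape is sketched below. This is an assumption about what such an option could look like, not something implemented in this PR: `explode`, `max_depth`, `exclude`, and the `_` column separator are all hypothetical, and the sketch uses plain dicts instead of the real table types.

```python
import json
from typing import Any, Dict, FrozenSet


def explode(
    record: Dict[str, Any],
    max_depth: int = 1,
    exclude: FrozenSet[str] = frozenset(),
    _prefix: str = "",
    _depth: int = 0,
) -> Dict[str, Any]:
    """Explode nested dicts into columns down to max_depth.

    Anything nested deeper than max_depth is left as a JSON string,
    and any column named in `exclude` is dropped.
    """
    flat: Dict[str, Any] = {}
    for name, value in record.items():
        column = f"{_prefix}{name}"
        if column in exclude:
            continue  # skip excluded columns and everything beneath them
        if isinstance(value, dict) and _depth < max_depth:
            # Still within the depth budget: recurse into the struct.
            flat.update(explode(value, max_depth, exclude, f"{column}_", _depth + 1))
        elif isinstance(value, (dict, list)):
            # Too deep (or a list): keep it as a single string column.
            flat[column] = json.dumps(value, sort_keys=True)
        else:
            flat[column] = value
    return flat
```

With `max_depth=0` every struct stays a single string column, which matches the "leave structs as just a string" request, while larger values trade wider tables for easier filtering in Tableau.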