Bigquery

Uploaded by

SECE20A39MRUNAL VAIDYA


Bigquery

Legacy vs Standard SQL

Legacy SQL – table names are wrapped in square brackets [], and “:” separates the project from the dataset. UDFs are available in the web console.

Standard SQL – table names are wrapped in backticks, and “.” is the separator. It does not support TABLE_DATE_RANGE and TABLE_QUERY; wildcard tables with _TABLE_SUFFIX cover the same use cases. It supports querying nested and repeated data.
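The difference in table references can be sketched in Python. Both helpers are hypothetical and exist only for illustration, as are the project, dataset and table names; the second one builds the kind of wildcard query that would stand in for a legacy TABLE_DATE_RANGE call, assuming date-sharded tables named with a YYYYMMDD suffix.

```python
import re

# Hypothetical helper: rewrite a legacy SQL table reference such as
# [project:dataset.table] into the Standard SQL form `project.dataset.table`.
_LEGACY_REF = re.compile(r"\[(?:([^\[\]:]+):)?([^\[\].]+)\.([^\[\]]+)\]")


def legacy_to_standard(ref: str) -> str:
    match = _LEGACY_REF.fullmatch(ref.strip())
    if match is None:
        raise ValueError(f"not a legacy table reference: {ref!r}")
    project, dataset, table = match.groups()
    # The project part is optional in a legacy reference.
    parts = [p for p in (project, dataset, table) if p]
    return "`" + ".".join(parts) + "`"


def wildcard_query(table_prefix: str, start: str, end: str) -> str:
    # Hypothetical helper: build a Standard SQL wildcard query that limits
    # the scanned shards with _TABLE_SUFFIX. Plain string formatting is used
    # for illustration only; it is not safe for untrusted input.
    return (
        f"SELECT COUNT(*) AS row_count FROM `{table_prefix}*` "
        f"WHERE _TABLE_SUFFIX BETWEEN '{start}' AND '{end}'"
    )


standard_ref = legacy_to_standard("[my-project:my_dataset.events]")
sharded_query = wildcard_query("my-project.my_dataset.events_", "20240101", "20240107")
print(standard_ref)
print(sharded_query)
```

The generated query text is meant to be read, not executed here; running it would require a BigQuery project with matching tables.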

Standard SQL advantages:

▪ Composability using WITH clauses and SQL functions.

▪ Subqueries in the SELECT list and WHERE clause.

▪ Correlated subqueries

▪ ARRAY and STRUCT data types (legacy SQL had REPEATED and RECORD types)

▪ Inserts, updates, and deletes (DML)

▪ COUNT(DISTINCT <expr>) is exact and scalable, providing the accuracy of EXACT_COUNT_DISTINCT without its limitations

▪ Automatic predicate push-down through JOINs

▪ Complex JOIN predicates, including arbitrary expressions

▪ Table wildcards with _TABLE_SUFFIX

▪ Stricter timestamp checking
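A few of the composability features listed above (a WITH clause, a correlated subquery in the SELECT list, and a subquery in the WHERE clause) can be sketched in one query. This is a minimal illustration that runs on SQLite through Python's sqlite3 module as a local stand-in, not on BigQuery; the orders table and its columns are made up for the example.

```python
import sqlite3

# In-memory SQLite database used as a local stand-in for a query engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount INTEGER);
    INSERT INTO orders VALUES ('a', 10), ('a', 30), ('b', 5);
""")

query = """
WITH totals AS (              -- named, reusable intermediate result
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
)
SELECT
    t.customer,
    t.total,
    (SELECT MAX(o.amount)     -- correlated subquery in the SELECT list
     FROM orders AS o
     WHERE o.customer = t.customer) AS largest_order
FROM totals AS t
WHERE t.total > (SELECT AVG(amount) FROM orders)  -- subquery in WHERE
ORDER BY t.customer
"""

rows = conn.execute(query).fetchall()
print(rows)
```

Only customer 'a' has a total above the average order amount, so the result contains a single row for that customer.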

Best practices/Performance

▪ Avoid self-joins, use window function instead

▪ If data is skewed (for example, some partitions are much larger than others), filter early. Use APPROX_TOP_COUNT to detect skew

▪ Avoid joins that produce more output rows than input rows

▪ Avoid point-specific DML (statements that touch one row at a time). Batch the DML statements instead

▪ Sub-queries are more efficient than joins

▪ Select only the columns that are needed (avoid SELECT *)

▪ Filter with a WHERE clause so that later stages process as few rows as possible

▪ With joins, do bigger joins first. The left side of the join should be the bigger table

▪ Low-cardinality GROUP BY keys are faster. Low cardinality means that the column contains many repeated values

▪ LIMIT doesn't reduce cost, as it only controls how many rows are returned

▪ Built-in functions are faster than JavaScript UDFs

▪ Exact functions are slower than approximate built-in functions; use the approximate version when an approximate result is acceptable. For example, use APPROX_COUNT_DISTINCT() instead of COUNT(DISTINCT)

▪ Apply ORDER BY in the outermost query, not in inner queries. The outer query is performed last, so put complex operations at the end, after all filtering is done.

▪ Wildcards – be more specific if possible

▪ Performance – query time is split between stages; this can also be seen using Stackdriver.

▪ Each stage – wait, read, write, compute

▪ Tail skew – the maximum time spent is significantly more than the average, because some partitions are much bigger than others. Tail skew can be detected with an approximate aggregate function such as APPROX_TOP_COUNT

▪ Avoid tail skew – filter as early as possible

▪ Batch load is free, streaming has a cost. Unless data is needed in real-time, use batch when
possible.

▪ Denormalize when possible. Still use structs and arrays.

▪ External data sources are slow; use them only when needed.

▪ Monitor query performance using the “details” page, which shows whether there is read, compute or write latency. The query plan shows the different stages and the breakdown of time between the different activities in a stage
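The advice above to replace self-joins with window functions can be sketched as follows. This is an illustration that runs on SQLite through Python's sqlite3 module as a local stand-in (it assumes a bundled SQLite of version 3.25 or later, which added window functions); the daily_sales table is made up for the example. Both queries compute the day-over-day change, but the window version reads the table once instead of joining it to itself.

```python
import sqlite3

# In-memory SQLite database used as a local stand-in for a query engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE daily_sales (day INTEGER, amount INTEGER);
    INSERT INTO daily_sales VALUES (1, 100), (2, 120), (3, 90);
""")

# Self-join: every row is matched against the previous day's row.
self_join = """
SELECT cur.day, cur.amount - prev.amount AS delta
FROM daily_sales AS cur
JOIN daily_sales AS prev ON prev.day = cur.day - 1
ORDER BY cur.day
"""

# Window function: LAG looks at the previous row in a single pass.
window = """
SELECT day, delta FROM (
    SELECT day, amount - LAG(amount) OVER (ORDER BY day) AS delta
    FROM daily_sales
)
WHERE delta IS NOT NULL
ORDER BY day
"""

self_join_rows = conn.execute(self_join).fetchall()
window_rows = conn.execute(window).fetchall()
print(self_join_rows)
print(window_rows)
```

The first day has no predecessor, so the window version filters out its NULL result to match the inner join's output.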
