
Snowflake JOINS QueryOptimization

The document provides an introduction to Snowflake SQL, covering various types of joins, subquerying, common table expressions (CTEs), and query optimization techniques. It also discusses handling semi-structured data, specifically JSON, and includes examples of querying and manipulating this data type in Snowflake. Additionally, it highlights the importance of optimizing queries for performance and cost efficiency.


Joining in Snowflake

INTRODUCTION TO SNOWFLAKE SQL

George Boorman
Senior Curriculum Manager, DataCamp
JOINS
INNER JOIN
OUTER JOINS
LEFT OUTER JOIN or LEFT JOIN

RIGHT OUTER JOIN or RIGHT JOIN

FULL OUTER JOIN or FULL JOIN

CROSS JOINS

SELF JOINS

NATURAL JOIN

LATERAL JOIN



Pizza dataset



NATURAL JOIN
NATURAL JOIN automatically matches columns that share the same name in both tables and removes the duplicated columns from the output

Syntax:

SELECT ...
FROM <table_one>
NATURAL [ { LEFT | RIGHT | FULL } [ OUTER ] ] JOIN <table_two>
[ ... ]



NATURAL JOIN
Without NATURAL JOIN

SELECT *
FROM pizzas AS p
JOIN pizza_type AS t
ON t.pizza_type_id = p.pizza_type_id

With NATURAL JOIN

SELECT *
FROM pizzas AS p
NATURAL JOIN pizza_type AS t



NATURAL JOIN

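NOT ALLOWED: an ON clause, because NATURAL JOIN already implies the join condition

SELECT *
FROM pizzas AS p
NATURAL JOIN pizza_type AS t
ON t.pizza_type_id = p.pizza_type_id -- Invalid: ON cannot be combined with NATURAL JOIN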



NATURAL JOIN

ALLOWED

WHERE clause

SELECT *
FROM pizzas AS p
NATURAL JOIN pizza_type AS t
WHERE pizza_type_id = 'bbq_ckn'
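
The NATURAL JOIN behaviour described above can be tried locally. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Snowflake (an assumption: SQLite also supports NATURAL JOIN, but it is a different engine). The table contents are hypothetical sample rows, not the course dataset.

```python
import sqlite3

# Hypothetical miniature versions of the pizzas and pizza_type tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pizzas (pizza_id TEXT, pizza_type_id TEXT, price REAL);
    CREATE TABLE pizza_type (pizza_type_id TEXT, name TEXT, category TEXT);
    INSERT INTO pizzas VALUES ('bbq_ckn_s', 'bbq_ckn', 12.75),
                              ('bbq_ckn_l', 'bbq_ckn', 20.75),
                              ('hawaiian_m', 'hawaiian', 13.25);
    INSERT INTO pizza_type VALUES ('bbq_ckn', 'The Barbecue Chicken Pizza', 'Chicken'),
                                  ('hawaiian', 'The Hawaiian Pizza', 'Classic');
""")

# Explicit join: the shared column shows up once per table.
explicit = conn.execute("""
    SELECT * FROM pizzas AS p
    JOIN pizza_type AS t ON t.pizza_type_id = p.pizza_type_id
""")
explicit_columns = [col[0] for col in explicit.description]

# NATURAL JOIN: matched on pizza_type_id, which is returned only once.
natural = conn.execute("""
    SELECT * FROM pizzas AS p
    NATURAL JOIN pizza_type AS t
""")
natural_columns = [col[0] for col in natural.description]
natural_rows = natural.fetchall()

# A WHERE clause is still allowed to filter the joined rows.
filtered_rows = conn.execute("""
    SELECT * FROM pizzas AS p
    NATURAL JOIN pizza_type AS t
    WHERE pizza_type_id = 'bbq_ckn'
""").fetchall()

print(explicit_columns)
print(natural_columns)
print(len(natural_rows), len(filtered_rows))
```

The explicit join lists pizza_type_id twice, while the NATURAL JOIN result lists it once.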



LATERAL JOIN
LATERAL JOIN : lets a subquery in FROM reference columns from preceding tables or views.

Syntax:

SELECT ...
FROM <left_hand_expression>,
LATERAL (<right_hand_expression>)

left_hand_expression - Table, view, or subquery

right_hand_expression - Inline view or subquery



LATERAL JOIN with a subquery
SELECT
p.pizza_id,
lat.name,
lat.category
FROM pizzas AS p,
LATERAL -- Keyword LATERAL
( SELECT *
FROM pizza_type AS t
-- Referencing outer query column: p.pizza_type_id
WHERE p.pizza_type_id = t.pizza_type_id
) AS lat



Why LATERAL JOIN?
SELECT
*
FROM orders AS o,
LATERAL (
-- Subquery calculating total_spent
SELECT
SUM(p.price * od.quantity) AS total_spent
FROM order_details AS od
JOIN pizzas AS p
ON od.pizza_id = p.pizza_id
WHERE o.order_id = od.order_id
) AS t
ORDER BY o.order_id
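
One way to picture what LATERAL does: the right-hand subquery is evaluated once per left-hand row and may read that row's columns. The sketch below models this in plain Python rather than SQL. The sample rows and the helper names total_spent_for and lateral_join are hypothetical, and the model is simplified (for an order with no details, SQL's SUM would return NULL, whereas this sketch returns 0).

```python
# Hypothetical sample rows standing in for the orders, order_details and pizzas tables.
orders = [{"order_id": 1}, {"order_id": 2}]
order_details = [
    {"order_id": 1, "pizza_id": "bbq_ckn_s", "quantity": 2},
    {"order_id": 1, "pizza_id": "hawaiian_m", "quantity": 1},
    {"order_id": 2, "pizza_id": "hawaiian_m", "quantity": 3},
]
pizzas = {"bbq_ckn_s": 12.75, "hawaiian_m": 13.25}  # pizza_id -> price


def total_spent_for(order):
    """Right-hand expression: it can see the current left-hand row (order)."""
    total = sum(
        pizzas[od["pizza_id"]] * od["quantity"]
        for od in order_details
        if od["order_id"] == order["order_id"]  # WHERE o.order_id = od.order_id
    )
    return [{"total_spent": total}]  # the aggregate yields one row per order


def lateral_join(left_rows, right_for_row):
    """For every left row, evaluate the right side and combine the results."""
    return [
        {**left, **right}
        for left in left_rows
        for right in right_for_row(left)
    ]


result = sorted(lateral_join(orders, total_spent_for), key=lambda r: r["order_id"])
print(result)
```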



Subquerying and Common Table Expressions
Subquerying
Nested queries
Used in FROM, WHERE, HAVING, or SELECT clauses

Example:

SELECT column1
FROM table1
WHERE column1 = (SELECT column2 FROM table2 WHERE condition)

Types: Correlated and uncorrelated subqueries



Uncorrelated subquery
-- Main query returns pizzas priced at the maximum value found in the subquery
SELECT pizza_id
FROM pizzas
-- Uncorrelated subquery that identifies the highest pizza price
WHERE price = (
SELECT MAX(price)
FROM pizzas
)

Subquery doesn't interact with the main query



Correlated subquery
Subquery references columns from the main query

SELECT pt.name,
pz.price,
pt.category
FROM pizzas AS pz
JOIN pizza_type AS pt
ON pz.pizza_type_id = pt.pizza_type_id
WHERE pz.price = (
-- Identifies max price for each pizza type (pizza_type_id)
SELECT MAX(p2.price) -- Max price
FROM pizzas AS p2
WHERE -- Correlated: uses outer query column
p2.pizza_type_id = pz.pizza_type_id
)
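
To see the difference between the two subquery types on actual rows, here is a small sketch that runs both shapes with Python's built-in sqlite3 module (an assumption: SQLite stands in for Snowflake, and the four sample pizzas are hypothetical).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pizzas (pizza_id TEXT, pizza_type_id TEXT, price REAL);
    INSERT INTO pizzas VALUES ('bbq_ckn_s', 'bbq_ckn', 12.75),
                              ('bbq_ckn_l', 'bbq_ckn', 20.75),
                              ('hawaiian_s', 'hawaiian', 10.50),
                              ('hawaiian_l', 'hawaiian', 16.50);
""")

# Uncorrelated: the inner query runs on its own and yields a single value.
overall_top = conn.execute("""
    SELECT pizza_id
    FROM pizzas
    WHERE price = (SELECT MAX(price) FROM pizzas)
""").fetchall()

# Correlated: the inner query depends on pz.pizza_type_id from the outer row.
top_per_type = conn.execute("""
    SELECT pz.pizza_id
    FROM pizzas AS pz
    WHERE pz.price = (
        SELECT MAX(p2.price)
        FROM pizzas AS p2
        WHERE p2.pizza_type_id = pz.pizza_type_id
    )
    ORDER BY pz.pizza_id
""").fetchall()

print(overall_top)
print(top_per_type)
```

The uncorrelated form returns only the single most expensive pizza, while the correlated form returns the most expensive pizza of each type.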



Common Table Expressions
General Syntax:

-- WITH keyword
WITH cte1 AS ( -- CTE name
SELECT col_1, col_2
FROM table1
)
...
SELECT ...
FROM cte1 -- Query CTE
;



Common Table Expressions
WITH max_price AS ( -- CTE called max_price
SELECT pizza_type_id,
MAX(price) AS max_price
FROM pizzas
GROUP BY pizza_type_id
)
-- Main query
SELECT pt.name,
pz.price,
pt.category
FROM pizzas AS pz
JOIN pizza_type AS pt ON pz.pizza_type_id = pt.pizza_type_id
JOIN max_price AS mp -- Joining with CTE max_price
ON pt.pizza_type_id = mp.pizza_type_id
WHERE pz.price < mp.max_price -- Compare the price with max_price CTE column
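
The max_price CTE above can be exercised on a few rows. This sketch runs essentially the same query (with an added ORDER BY for a stable result) through Python's built-in sqlite3 module; SQLite as the engine and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pizzas (pizza_id TEXT, pizza_type_id TEXT, price REAL);
    CREATE TABLE pizza_type (pizza_type_id TEXT, name TEXT, category TEXT);
    INSERT INTO pizzas VALUES ('bbq_ckn_s', 'bbq_ckn', 12.75),
                              ('bbq_ckn_l', 'bbq_ckn', 20.75),
                              ('hawaiian_s', 'hawaiian', 10.50),
                              ('hawaiian_l', 'hawaiian', 16.50);
    INSERT INTO pizza_type VALUES ('bbq_ckn', 'The Barbecue Chicken Pizza', 'Chicken'),
                                  ('hawaiian', 'The Hawaiian Pizza', 'Classic');
""")

cte_rows = conn.execute("""
    WITH max_price AS (              -- one row per pizza type
        SELECT pizza_type_id,
               MAX(price) AS max_price
        FROM pizzas
        GROUP BY pizza_type_id
    )
    SELECT pt.name,
           pz.price
    FROM pizzas AS pz
    JOIN pizza_type AS pt ON pz.pizza_type_id = pt.pizza_type_id
    JOIN max_price AS mp ON pt.pizza_type_id = mp.pizza_type_id
    WHERE pz.price < mp.max_price    -- keep pizzas below their type's maximum
    ORDER BY pz.price
""").fetchall()

print(cte_rows)
```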



Multiple CTEs
-- Define multiple CTEs separated by commas
WITH cte1 AS (
SELECT ...
FROM ...
),
cte2 AS (
SELECT ...
FROM ...
)
-- Main query combining both CTEs
SELECT ...
FROM cte1
JOIN cte2 ON ...
WHERE ...



Why Use CTEs?
Managing complex operations
Modular

Readable

Reusable



Snowflake Query Optimization
Why Optimize Queries in Snowflake?
Achieve faster results
Cost efficiency
Shorter query times consume fewer resources such as CPU and memory.



Common query problems
Exploding Joins: Be cautious!
Incorrect

SELECT *
FROM order_details AS od
JOIN pizzas AS p -- Missing ON condition leading to exploding joins



Common query problems
Exploding Joins: Be cautious!
Correct

SELECT *
FROM order_details AS od
JOIN pizzas AS p
ON od.pizza_id = p.pizza_id
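
The size of the "explosion" is easy to measure on a toy dataset. The sketch below counts rows with and without the ON condition using Python's built-in sqlite3 module (an assumption: SQLite stands in for Snowflake, and the three-row tables are hypothetical).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_details (order_details_id INTEGER, pizza_id TEXT, quantity INTEGER);
    CREATE TABLE pizzas (pizza_id TEXT, price REAL);
    INSERT INTO order_details VALUES (1, 'bbq_ckn_s', 1),
                                     (2, 'hawaiian_m', 2),
                                     (3, 'veggie_l', 1);
    INSERT INTO pizzas VALUES ('bbq_ckn_s', 12.75),
                              ('hawaiian_m', 13.25),
                              ('veggie_l', 20.25);
""")

# Missing ON condition: every order line is paired with every pizza.
exploded_count = conn.execute("""
    SELECT COUNT(*)
    FROM order_details AS od
    JOIN pizzas AS p
""").fetchone()[0]

# With the ON condition: each order line matches exactly one pizza.
matched_count = conn.execute("""
    SELECT COUNT(*)
    FROM order_details AS od
    JOIN pizzas AS p
        ON od.pizza_id = p.pizza_id
""").fetchone()[0]

print(exploded_count, matched_count)
```

With three rows per table the unconditioned join already returns nine rows instead of three; the gap grows multiplicatively with table size.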



Common query problems
UNION or UNION ALL: Know the difference
UNION removes duplicate rows, which adds work and slows down the query

UNION ALL keeps every row and is faster; prefer it when the inputs contain no duplicates
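
A quick way to see the difference in results is sketched here with Python's built-in sqlite3 module. SQLite as the engine and the online_orders and store_orders tables are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables that share one value ('hawaiian_m').
    CREATE TABLE online_orders (pizza_id TEXT);
    CREATE TABLE store_orders (pizza_id TEXT);
    INSERT INTO online_orders VALUES ('bbq_ckn_s'), ('hawaiian_m');
    INSERT INTO store_orders VALUES ('hawaiian_m'), ('veggie_l');
""")

# UNION: the duplicate 'hawaiian_m' row is removed.
union_rows = conn.execute("""
    SELECT pizza_id FROM online_orders
    UNION
    SELECT pizza_id FROM store_orders
    ORDER BY pizza_id
""").fetchall()

# UNION ALL: rows are simply appended, duplicates included.
union_all_rows = conn.execute("""
    SELECT pizza_id FROM online_orders
    UNION ALL
    SELECT pizza_id FROM store_orders
    ORDER BY pizza_id
""").fetchall()

print(len(union_rows), len(union_all_rows))
```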

Handling big data


Use filters to narrow down data

Apply limits for quicker results



How to optimize queries?
SELECT * without LIMIT

SELECT *
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF100.ORDERS

SELECT * with LIMIT 10 ⚡

SELECT *
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF100.ORDERS
LIMIT 10



How to optimize queries?
Using SELECT *

SELECT *
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF100.ORDERS

Avoid SELECT * ⚡

SELECT o_orderdate,
o_orderstatus
FROM SNOWFLAKE_SAMPLE_DATA.TPCH_SF100.ORDERS



How to optimize queries?
Filter Early
Use WHERE Clause Early On

Apply filters before JOINs


JOIN will process fewer rows



Without early filtering
SELECT orders.order_id,
orders.order_date,
pizza_type.name,
pizzas.pizza_size
FROM orders
JOIN order_details
ON orders.order_id = order_details.order_id
JOIN pizzas
ON order_details.pizza_id = pizzas.pizza_id
JOIN pizza_type
ON pizzas.pizza_type_id = pizza_type.pizza_type_id
WHERE orders.order_date = '2015-01-01'; -- Filtering after JOIN



With early filtering
WITH filtered_orders AS (
SELECT *
FROM orders
WHERE order_date = '2015-01-01' -- Filtering in CTE before JOIN
)
SELECT filtered_orders.order_id,
filtered_orders.order_date,
pizza_type.name,
pizzas.pizza_size
FROM filtered_orders -- Joining with CTE
JOIN order_details
ON filtered_orders.order_id = order_details.order_id
JOIN pizzas
ON order_details.pizza_id = pizzas.pizza_id
JOIN pizza_type
ON pizzas.pizza_type_id = pizza_type.pizza_type_id
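
Both query shapes should return the same rows; only the point at which the date filter is applied differs. The sketch below checks that equivalence on a tiny dataset with Python's built-in sqlite3 module. SQLite as the engine, the sample rows, and the shortened column list are assumptions, and the sketch says nothing about how Snowflake's optimizer actually plans either form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER, order_date TEXT);
    CREATE TABLE order_details (order_id INTEGER, pizza_id TEXT);
    CREATE TABLE pizzas (pizza_id TEXT, pizza_size TEXT);
    INSERT INTO orders VALUES (1, '2015-01-01'), (2, '2015-01-02');
    INSERT INTO order_details VALUES (1, 'bbq_ckn_s'), (1, 'hawaiian_m'), (2, 'bbq_ckn_s');
    INSERT INTO pizzas VALUES ('bbq_ckn_s', 'S'), ('hawaiian_m', 'M');
""")

# Filter applied after the joins.
late_rows = conn.execute("""
    SELECT orders.order_id, pizzas.pizza_size
    FROM orders
    JOIN order_details ON orders.order_id = order_details.order_id
    JOIN pizzas ON order_details.pizza_id = pizzas.pizza_id
    WHERE orders.order_date = '2015-01-01'
    ORDER BY pizzas.pizza_size
""").fetchall()

# Filter applied in a CTE before the joins.
early_rows = conn.execute("""
    WITH filtered_orders AS (
        SELECT * FROM orders WHERE order_date = '2015-01-01'
    )
    SELECT filtered_orders.order_id, pizzas.pizza_size
    FROM filtered_orders
    JOIN order_details ON filtered_orders.order_id = order_details.order_id
    JOIN pizzas ON order_details.pizza_id = pizzas.pizza_id
    ORDER BY pizzas.pizza_size
""").fetchall()

print(late_rows == early_rows, early_rows)
```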



Query history
Query History view: snowflake.account_usage.query_history

Query History provides metrics such as execution time

SELECT query_text, start_time, end_time, execution_time
FROM snowflake.account_usage.query_history
WHERE query_text ILIKE '%order_details%'

ILIKE: Case-insensitive string matching



Query history
Spot slow or frequently running queries

SELECT query_text,
start_time,
end_time,
execution_time
FROM
snowflake.account_usage.query_history
WHERE
execution_time > 1000 -- execution_time is reported in milliseconds



Handling semi-structured data
Structured versus semi-structured
Example of structured data

| cust_id | cust_name | cust_age | cust_email         |
|---------|-----------|----------|--------------------|
| 1       | cust1     | 40       | cust1***@gmail.com |
| 2       | cust2     | 35       | cust2***@gmail.com |
| 3       | cust3     | 42       | cust3***@gmail.com |

[Figure: example of semi-structured data]



Introducing JSON
JavaScript Object Notation

Common use cases: Web APIs and Config files

JSON data structure:


Key-Value Pairs, e.g., cust_id: 1



JSON in Snowflake
Native JSON support

Flexible for evolving schemas

Comparisons:

Postgres: Uses JSONB

Snowflake: Uses VARIANT



How Snowflake stores JSON data
VARIANT supports OBJECT and ARRAY data types
OBJECT: { "key": "value"}

ARRAY: ["list", "of", "values"]

Creating a Snowflake Table to handle JSON data

CREATE TABLE cust_info_json_data (
customer_id INT,
customer_info VARIANT -- VARIANT data type
);



Semi-structured data functions
PARSE_JSON
expr: JSON data in string format

Returns: VARIANT type, valid JSON object



PARSE_JSON
Example:

SELECT PARSE_JSON(
-- Enclosed in strings
'{
"cust_id": 1,
"cust_name": "cust1",
"cust_age": 40,
"cust_email":"cust1***@gmail.com"
}
'-- Enclosed in strings
) AS customer_info_json



OBJECT_CONSTRUCT
Syntax: OBJECT_CONSTRUCT( [<key1>, <value1> [, <keyN>, <valueN> ...]] )

Returns: JSON object

SELECT OBJECT_CONSTRUCT(
-- Comma separated values rather than : notation
'cust_id', 1,
'cust_name', 'cust1',
'cust_age', 40,
'cust_email', 'cust1***@gmail.com'
)
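
PARSE_JSON starts from a JSON string, while OBJECT_CONSTRUCT starts from alternating keys and values; both end up with the same kind of object. The sketch below mirrors that idea in plain Python. The helpers parse_json and object_construct are hypothetical analogues written for this example, not Snowflake APIs.

```python
import json


def parse_json(expr):
    """Rough analogue of PARSE_JSON: turn a JSON string into a nested value."""
    return json.loads(expr)


def object_construct(*args):
    """Rough analogue of OBJECT_CONSTRUCT: pair up alternating keys and values."""
    if len(args) % 2 != 0:
        raise ValueError("expected an even number of arguments: key, value, ...")
    return dict(zip(args[0::2], args[1::2]))


# Built from a JSON string, using ':' between key and value.
parsed = parse_json('{"cust_id": 1, "cust_name": "cust1", "cust_age": 40}')

# Built from comma-separated keys and values.
constructed = object_construct("cust_id", 1, "cust_name", "cust1", "cust_age", 40)

print(parsed == constructed)
```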



Querying JSON data in Snowflake
Simple JSON

SELECT
customer_info:cust_age, -- Use colon to access cust_age from column
customer_info:cust_name,
customer_info:cust_email
FROM
cust_info_json_data;



Querying nested JSON Data in Snowflake
Example of nested JSON

Colon: :

Dot: .



Querying nested JSON using colon/dot notations
Accessing values using colon notation

<column>:<level1_element>:<level2_element>:<level3_element>

SELECT
customer_info:address:street AS street_name
FROM
cust_info_json_data

Accessing values using dot notation

<column>:<level1_element>.<level2_element>.<level3_element>

SELECT
customer_info:address.street AS street_name
FROM
cust_info_json_data



Wrap-up
Chapter 1: Snowflake SQL and key concepts
Connecting to Snowflake

WEB UI

Drivers & Connectors


Snowflake CLI



Chapter 1: Snowflake SQL and key concepts
Data types: VARCHAR, NUMBER, TIMESTAMP_LTZ

Data type conversion - What, Why, How?
Conversion functions: TO_VARCHAR, TO_DATE

STRING functions: CONCAT, INITCAP

DATE & TIME functions: CURRENT_DATE, CURRENT_TIME, EXTRACT

GROUP BY ALL



Chapter 2: Advanced Snowflake SQL Concepts
JOINS

NATURAL JOIN

LATERAL JOIN

Subquerying

CTEs



Chapter 2: Advanced Snowflake SQL Concepts
Snowflake Query Optimization

Common query problems: Exploding Joins, UNION vs UNION ALL

Rewriting queries: TOP, LIMIT, early filtering, avoid SELECT *


Semi-structured data

PARSE_JSON , OBJECT_CONSTRUCT

Querying JSON data in Snowflake



Is this all?
Much more to unfold
Not addressed

Setting context
Roles, Users

Setting up Virtual Warehouses

Window functions

Query profiling

Materialized Views

Clustering
...



Useful resources
Snowflake documentation: https://docs.snowflake.com/
Snowflake forums: https://community.snowflake.com/s/forum

Introduction to Data Modeling in Snowflake

Snowflake Tutorial



This is just the
beginning!