Introduction
POSTGRESQL SUMMARY STATS AND WINDOW FUNCTIONS
Michel Semaan
Data Scientist
Motivation
USA total and running total of Summer Olympics gold medals since 2004
| Year | Medals | Medals_RT |
|------|--------|-----------|
| 2004 | 116 | 116 |
| 2008 | 125 | 241 |
| 2012 | 147 | 388 |
Discus throw reigning champion status
| Year | Champion | Last_Champion | Reigning_Champion |
|------|----------|---------------|-------------------|
| 1996 | GER | null | false |
| 2000 | LTU | GER | false |
| 2004 | LTU | LTU | true |
| 2008 | EST | LTU | false |
| 2012 | GER | EST | false |
Course outline
1. Introduction to window functions
2. Fetching, ranking, and paging
3. Aggregate window functions and frames
4. Beyond window functions
Summer Olympics dataset
Each row represents a medal awarded in the Summer Olympic Games
Columns
Year, City
Sport, Discipline, Event
Athlete, Country, Gender
Medal
Window functions
Perform an operation across a set of rows that are somehow related to the current row
Similar to GROUP BY aggregate functions, but all rows remain in the output
Uses
Fetching values from preceding or following rows (e.g. fetching the previous row's value)
Determining reigning champion status
Calculating growth over time
Assigning ordinal ranks (1st, 2nd, etc.) to rows based on their values' positions in a sorted list
Running totals, moving averages
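To make the GROUP BY comparison concrete, here is a minimal sketch. It runs the SQL through Python's built-in sqlite3 module as a stand-in for PostgreSQL (assuming an SQLite build of 3.25 or later, which supports window functions). The four rows are made up for illustration and are not taken from the real dataset.

```python
import sqlite3

# Made-up miniature of Summer_Medals (assumption: not real data).
rows = [
    (2008, "USA", "Gold"),
    (2008, "USA", "Gold"),
    (2008, "CHN", "Gold"),
    (2012, "USA", "Gold"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Summer_Medals (Year INTEGER, Country TEXT, Medal TEXT)")
conn.executemany("INSERT INTO Summer_Medals VALUES (?, ?, ?)", rows)

# A plain aggregate collapses the whole table into a single output row.
aggregated = conn.execute("SELECT COUNT(*) AS Medals FROM Summer_Medals").fetchall()

# The same aggregate used as a window function keeps every input row
# and attaches the count to each of them.
windowed = conn.execute(
    """
    SELECT Year, Country, COUNT(*) OVER () AS Medals
    FROM Summer_Medals
    ORDER BY Year, Country
    """
).fetchall()

print(aggregated)
for row in windowed:
    print(row)
```

The aggregate query returns one row, while the window query returns all four rows, each carrying the same total.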
Row numbers
Query
SELECT
  Year, Event, Country
FROM Summer_Medals
WHERE
  Medal = 'Gold';
Result
| Year | Event | Country |
|------|----------------------------|---------|
| 1896 | 100M Freestyle | HUN |
| 1896 | 100M Freestyle For Sailors | GRE |
| 1896 | 1200M Freestyle | HUN |
| ... | ... | ... |
Enter ROW_NUMBER
Query
SELECT
  Year, Event, Country,
  ROW_NUMBER() OVER () AS Row_N
FROM Summer_Medals
WHERE
  Medal = 'Gold';
Result
| Year | Event | Country | Row_N |
|------|----------------------------|---------|-------|
| 1896 | 100M Freestyle | HUN | 1 |
| 1896 | 100M Freestyle For Sailors | GRE | 2 |
| 1896 | 1200M Freestyle | HUN | 3 |
| ... | ... | ... | ... |
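A minimal runnable sketch of the ROW_NUMBER query, using Python's sqlite3 module as a stand-in for PostgreSQL (assuming SQLite 3.25 or later). The gold-medal rows are copied from the result table; the silver row is invented so that the WHERE filter has something to exclude.

```python
import sqlite3

# Sample rows: the three gold rows come from the slide, the silver row is made up.
medals = [
    (1896, "100M Freestyle", "HUN", "Gold"),
    (1896, "100M Freestyle", "AUT", "Silver"),
    (1896, "100M Freestyle For Sailors", "GRE", "Gold"),
    (1896, "1200M Freestyle", "HUN", "Gold"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Summer_Medals (Year INTEGER, Event TEXT, Country TEXT, Medal TEXT)"
)
conn.executemany("INSERT INTO Summer_Medals VALUES (?, ?, ?, ?)", medals)

# WHERE runs before the window function, so only gold rows get numbered.
result = conn.execute(
    """
    SELECT
      Year, Event, Country,
      ROW_NUMBER() OVER () AS Row_N
    FROM Summer_Medals
    WHERE
      Medal = 'Gold'
    """
).fetchall()

row_numbers = sorted(row[3] for row in result)
for row in result:
    print(row)
```

With an empty OVER () the numbers always run from 1 to the number of filtered rows, but which row receives which number is not guaranteed. That is the gap ORDER BY inside OVER fills.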
Anatomy of a window function
Query
SELECT
Year, Event, Country,
ROW_NUMBER() OVER () AS Row_N
FROM Summer_Medals
WHERE
Medal = 'Gold';
FUNCTION_NAME() OVER (...)
Optional subclauses inside OVER (...):
ORDER BY
PARTITION BY
ROWS/RANGE PRECEDING/FOLLOWING/UNBOUNDED
Let's practice!
ORDER BY
Row numbers
Query
SELECT
  Year, Event, Country,
  ROW_NUMBER() OVER () AS Row_N
FROM Summer_Medals
WHERE
  Medal = 'Gold';
Result
| Year | Event | Country | Row_N |
|------|----------------------------|---------|-------|
| 1896 | 100M Freestyle | HUN | 1 |
| 1896 | 100M Freestyle For Sailors | GRE | 2 |
| 1896 | 1200M Freestyle | HUN | 3 |
| ... | ... | ... | ... |
Enter ORDER BY
ORDER BY in OVER orders the rows related to the current row
Example: ordering by Year in descending order in ROW_NUMBER's OVER clause numbers the rows starting from the most recent year, so row 1 is one of the most recent year's rows
Ordering by Year in descending order
Query
SELECT
  Year, Event, Country,
  ROW_NUMBER() OVER (ORDER BY Year DESC) AS Row_N
FROM Summer_Medals
WHERE
  Medal = 'Gold';
Result
| Year | Event | Country | Row_N |
|------|---------------|---------|-------|
| 2012 | Wg 96 KG | IRI | 1 |
| 2012 | 4X100M Medley | USA | 2 |
| 2012 | Wg 84 KG | RUS | 3 |
| ... | ... | ... | ... |
| 2008 | 50M Freestyle | BRA | 637 |
| 2008 | 96 - 120KG | CUB | 638 |
| ... | ... | ... | ... |
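The effect of ORDER BY Year DESC inside OVER can be checked with a small sketch. It uses Python's sqlite3 module as a stand-in for PostgreSQL (assuming SQLite 3.25 or later) and five rows copied from the result table above, all treated as gold medals.

```python
import sqlite3

# Five gold-medal rows taken from the slide's result table.
medals = [
    (2008, "50M Freestyle", "BRA", "Gold"),
    (2012, "Wg 96 KG", "IRI", "Gold"),
    (2008, "96 - 120KG", "CUB", "Gold"),
    (2012, "4X100M Medley", "USA", "Gold"),
    (2012, "Wg 84 KG", "RUS", "Gold"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Summer_Medals (Year INTEGER, Event TEXT, Country TEXT, Medal TEXT)"
)
conn.executemany("INSERT INTO Summer_Medals VALUES (?, ?, ?, ?)", medals)

result = conn.execute(
    """
    SELECT
      Year, Event, Country,
      ROW_NUMBER() OVER (ORDER BY Year DESC) AS Row_N
    FROM Summer_Medals
    WHERE
      Medal = 'Gold'
    """
).fetchall()

# Collect the row numbers assigned to each year.
numbers_by_year = {}
for year, event, country, row_n in result:
    numbers_by_year.setdefault(year, set()).add(row_n)

print(numbers_by_year)
```

The 2012 rows receive the numbers 1 to 3 and the 2008 rows receive 4 and 5. Rows that tie on Year can be numbered in any order among themselves, which is why a second sort column is useful.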
Ordering by multiple columns
Query
SELECT
  Year, Event, Country,
  ROW_NUMBER() OVER
    (ORDER BY Year DESC, Event ASC) AS Row_N
FROM Summer_Medals
WHERE
  Medal = 'Gold';
Result
| Year | Event | Country | Row_N |
|------|---------|---------|-------|
| 2012 | + 100KG | FRA | 1 |
| 2012 | +67KG | SRB | 2 |
| 2012 | + 78KG | CUB | 3 |
| ... | ... | ... | ... |
Ordering in- and outside OVER
Query
SELECT
  Year, Event, Country,
  ROW_NUMBER() OVER
    (ORDER BY Year DESC, Event ASC) AS Row_N
FROM Summer_Medals
WHERE
  Medal = 'Gold'
ORDER BY Country ASC, Row_N ASC;
Result
| Year | Event | Country | Row_N |
|------|-------|---------|-------|
| 2012 | 1500M | ALG | 36 |
| 2000 | 1500M | ALG | 1998 |
| 1996 | 1500M | ALG | 2662 |
| ... | ... | ... | ... |
ORDER BY inside OVER takes effect before ORDER BY outside OVER
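The order of evaluation can be verified with a short sketch, again using Python's sqlite3 module as a stand-in for PostgreSQL (assuming SQLite 3.25 or later). The five sample rows are taken from earlier result tables and treated as the whole table.

```python
import sqlite3

# Five gold-medal rows taken from earlier result tables in these slides.
medals = [
    (2008, "50M Freestyle", "BRA", "Gold"),
    (2012, "Wg 96 KG", "IRI", "Gold"),
    (2008, "96 - 120KG", "CUB", "Gold"),
    (2012, "4X100M Medley", "USA", "Gold"),
    (2012, "Wg 84 KG", "RUS", "Gold"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Summer_Medals (Year INTEGER, Event TEXT, Country TEXT, Medal TEXT)"
)
conn.executemany("INSERT INTO Summer_Medals VALUES (?, ?, ?, ?)", medals)

# Row numbers are assigned by the ORDER BY inside OVER first;
# the outer ORDER BY then only rearranges the already-numbered rows.
result = conn.execute(
    """
    SELECT
      Year, Event, Country,
      ROW_NUMBER() OVER
        (ORDER BY Year DESC, Event ASC) AS Row_N
    FROM Summer_Medals
    WHERE
      Medal = 'Gold'
    ORDER BY Country ASC, Row_N ASC
    """
).fetchall()

countries = [row[2] for row in result]
row_numbers = [row[3] for row in result]
for row in result:
    print(row)
```

The output is sorted by Country, yet the Row_N column is out of sequence (4, 5, 3, 2, 1), showing that the numbering was fixed before the outer sort ran.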
Reigning champion
A reigning champion is a champion who's won both the previous and current years'
competitions
The previous and current year's champions need to be in the same row (in two different
columns)
Enter LAG
LAG(column, n) OVER (...) returns column's value at the row n rows before the current row
LAG(column, 1) OVER (...) returns the previous row's value
Current champions
Query
SELECT
  Year, Country AS Champion
FROM Summer_Medals
WHERE
  Year IN (1996, 2000, 2004, 2008, 2012)
  AND Gender = 'Men' AND Medal = 'Gold'
  AND Event = 'Discus Throw';
Result
| Year | Champion |
|------|----------|
| 1996 | GER |
| 2000 | LTU |
| 2004 | LTU |
| 2008 | EST |
| 2012 | GER |
Current and last champions
Query
WITH Discus_Gold AS (
  SELECT
    Year, Country AS Champion
  FROM Summer_Medals
  WHERE
    Year IN (1996, 2000, 2004, 2008, 2012)
    AND Gender = 'Men' AND Medal = 'Gold'
    AND Event = 'Discus Throw')
SELECT
  Year, Champion,
  LAG(Champion, 1) OVER
    (ORDER BY Year ASC) AS Last_Champion
FROM Discus_Gold
ORDER BY Year ASC;
Result
| Year | Champion | Last_Champion |
|------|----------|---------------|
| 1996 | GER | null |
| 2000 | LTU | GER |
| 2004 | LTU | LTU |
| 2008 | EST | LTU |
| 2012 | GER | EST |
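Once the current and last champions sit in the same row, reigning champion status is a comparison between the two columns. The sketch below is one possible way to write it, not necessarily the course's own solution. It uses Python's sqlite3 module as a stand-in for PostgreSQL (assuming SQLite 3.25 or later), loads the five champions from the result table directly into a Discus_Gold table, and introduces a hypothetical CTE name, Discus_Lagged. The CASE expression returns 1 or 0 rather than a boolean so that it behaves the same in both databases.

```python
import sqlite3

# Men's discus throw champions, copied from the result table in these slides.
champions = [
    (1996, "GER"),
    (2000, "LTU"),
    (2004, "LTU"),
    (2008, "EST"),
    (2012, "GER"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Discus_Gold (Year INTEGER, Champion TEXT)")
conn.executemany("INSERT INTO Discus_Gold VALUES (?, ?)", champions)

# Discus_Lagged is a hypothetical name introduced for this sketch.
# Comparing against a NULL Last_Champion is not true, so the first year
# falls through to the ELSE branch and is flagged 0.
result = conn.execute(
    """
    WITH Discus_Lagged AS (
      SELECT
        Year, Champion,
        LAG(Champion, 1) OVER
          (ORDER BY Year ASC) AS Last_Champion
      FROM Discus_Gold)
    SELECT
      Year, Champion, Last_Champion,
      CASE WHEN Champion = Last_Champion THEN 1 ELSE 0 END AS Reigning_Champion
    FROM Discus_Lagged
    ORDER BY Year ASC
    """
).fetchall()

for row in result:
    print(row)
```

Only 2004 is flagged, because LTU won in both 2000 and 2004.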
Let's practice!
PARTITION BY
Motivation
Query
WITH Discus_Gold AS (
  SELECT
    Year, Event, Country AS Champion
  FROM Summer_Medals
  WHERE
    Year IN (2004, 2008, 2012)
    AND Gender = 'Men' AND Medal = 'Gold'
    AND Event IN ('Discus Throw', 'Triple Jump'))
SELECT
  Year, Event, Champion,
  LAG(Champion) OVER
    (ORDER BY Event ASC, Year ASC) AS Last_Champion
FROM Discus_Gold
ORDER BY Event ASC, Year ASC;
Result
| Year | Event | Champion | Last_Champion |
|------|--------------|----------|---------------|
| 2004 | Discus Throw | LTU | null |
| 2008 | Discus Throw | EST | LTU |
| 2012 | Discus Throw | GER | EST |
| 2004 | Triple Jump | SWE | GER |
| 2008 | Triple Jump | POR | SWE |
| 2012 | Triple Jump | USA | POR |
When Event changes from Discus Throw to Triple Jump, LAG fetches Discus Throw's last champion (GER) instead of a null
Enter PARTITION BY
PARTITION BY splits the table into partitions based on a column's unique values
Unlike GROUP BY, the rows of a partition aren't rolled into a single row
Each partition is operated on separately by the window function
ROW_NUMBER will reset for each partition
LAG will only fetch a row's previous value if its previous row is in the same partition
Partitioning by one column
Query
WITH Discus_Gold AS (...)
SELECT
  Year, Event, Champion,
  LAG(Champion) OVER
    (PARTITION BY Event
     ORDER BY Event ASC, Year ASC) AS Last_Champion
FROM Discus_Gold
ORDER BY Event ASC, Year ASC;
Result
| Year | Event | Champion | Last_Champion |
|------|--------------|----------|---------------|
| 2004 | Discus Throw | LTU | null |
| 2008 | Discus Throw | EST | LTU |
| 2012 | Discus Throw | GER | EST |
| 2004 | Triple Jump | SWE | null |
| 2008 | Triple Jump | POR | SWE |
| 2012 | Triple Jump | USA | POR |
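The difference PARTITION BY makes to LAG can be reproduced side by side. The sketch uses Python's sqlite3 module as a stand-in for PostgreSQL (assuming SQLite 3.25 or later) and loads the six champions from the result table directly into a Discus_Gold table instead of filtering Summer_Medals.

```python
import sqlite3

# Champions copied from the result table in these slides.
champions = [
    (2004, "Discus Throw", "LTU"),
    (2008, "Discus Throw", "EST"),
    (2012, "Discus Throw", "GER"),
    (2004, "Triple Jump", "SWE"),
    (2008, "Triple Jump", "POR"),
    (2012, "Triple Jump", "USA"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Discus_Gold (Year INTEGER, Event TEXT, Champion TEXT)")
conn.executemany("INSERT INTO Discus_Gold VALUES (?, ?, ?)", champions)

# Without PARTITION BY, LAG reaches across the event boundary.
unpartitioned = conn.execute(
    """
    SELECT
      Year, Event, Champion,
      LAG(Champion) OVER
        (ORDER BY Event ASC, Year ASC) AS Last_Champion
    FROM Discus_Gold
    ORDER BY Event ASC, Year ASC
    """
).fetchall()

# With PARTITION BY Event, each event is processed on its own,
# so the first row of every event has no previous row.
partitioned = conn.execute(
    """
    SELECT
      Year, Event, Champion,
      LAG(Champion) OVER
        (PARTITION BY Event
         ORDER BY Year ASC) AS Last_Champion
    FROM Discus_Gold
    ORDER BY Event ASC, Year ASC
    """
).fetchall()

print(unpartitioned[3])
print(partitioned[3])
```

The fourth row (Triple Jump, 2004) carries GER in the unpartitioned result and a null in the partitioned one. Inside a partition on Event, ordering by Year alone is enough, because every row in the partition already shares the same Event.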
More complex partitioning
| Year | Country | Event | Row_N |
|------|---------|----------------------|-------|
| 2008 | CHN | + 78KG (Heavyweight) | 1 |
| 2008 | CHN | -49KG | 2 |
| ... | ... | ... | ... |
| 2008 | JPN | 48 - 55KG | 27 |
| 2008 | JPN | 48 - 55KG | 28 |
| ... | ... | ... | ... |
| 2012 | CHN | +75KG | 32 |
| 2012 | CHN | -49KG | 33 |
| ... | ... | ... | ... |
| 2012 | JPN | +75KG | 51 |
| 2012 | JPN | -49KG | 52 |
| ... | ... | ... | ... |
Row number should reset per Year and Country
Partitioning by multiple columns
Query
WITH Country_Gold AS (
  SELECT
    DISTINCT Year, Country, Event
  FROM Summer_Medals
  WHERE
    Year IN (2008, 2012)
    AND Country IN ('CHN', 'JPN')
    AND Gender = 'Women' AND Medal = 'Gold')
SELECT
  Year, Country, Event,
  ROW_NUMBER() OVER (PARTITION BY Year, Country) AS Row_N
FROM Country_Gold;
Result
| Year | Country | Event | Row_N |
|------|---------|----------------------|-------|
| 2008 | CHN | + 78KG (Heavyweight) | 1 |
| 2008 | CHN | -49KG | 2 |
| ... | ... | ... | ... |
| 2008 | JPN | 48 - 55KG | 1 |
| 2008 | JPN | 48 - 55KG | 2 |
| ... | ... | ... | ... |
| 2012 | CHN | +75KG | 1 |
| 2012 | CHN | -49KG | 2 |
| ... | ... | ... | ... |
| 2012 | JPN | +75KG | 1 |
| 2012 | JPN | -49KG | 2 |
| ... | ... | ... | ... |
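A small sketch of the per-Year-and-Country reset, using Python's sqlite3 module as a stand-in for PostgreSQL (assuming SQLite 3.25 or later). The rows are a simplified sample loosely based on the result table and loaded straight into a Country_Gold table. One assumption differs from the query above: ORDER BY Event is added inside OVER so that the numbering within each partition is deterministic.

```python
import sqlite3

# Simplified sample rows loosely based on the slide (assumption: not the full data).
country_gold = [
    (2008, "CHN", "+78KG"),
    (2008, "CHN", "-49KG"),
    (2008, "JPN", "48 - 55KG"),
    (2012, "CHN", "+75KG"),
    (2012, "CHN", "-49KG"),
    (2012, "JPN", "+75KG"),
    (2012, "JPN", "-49KG"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Country_Gold (Year INTEGER, Country TEXT, Event TEXT)")
conn.executemany("INSERT INTO Country_Gold VALUES (?, ?, ?)", country_gold)

# Each distinct (Year, Country) pair forms its own partition,
# and ROW_NUMBER restarts at 1 in every partition.
result = conn.execute(
    """
    SELECT
      Year, Country, Event,
      ROW_NUMBER() OVER
        (PARTITION BY Year, Country
         ORDER BY Event ASC) AS Row_N
    FROM Country_Gold
    ORDER BY Year ASC, Country ASC, Row_N ASC
    """
).fetchall()

row_numbers = [row[3] for row in result]
for row in result:
    print(row)
```

The Row_N column reads 1, 2, 1, 1, 2, 1, 2, restarting whenever either Year or Country changes.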
Let's practice!