===================== TODAY'S TOPICS ==============================================
1. What is a Data Model.
2. How to understand a data model with 200+ tables
3. How to spot the tables relevant to my requirements (when there are 200+ tables in my environment)
4. Types of Tables (Transaction, Log, Error, Audit, Metadata, Stale/Outdated, History, Temp)
5. Types of Columns ( PK, FK, Audit, Transaction, Stale/Outdated )
6. How to detect the Data Flow in an environment
7. How constraints link the tables
8. What is a soft constraint (a constraint which is not enforced)
9. How to construct the FROM clause, WHERE clause and GROUP BY clause
10. Impact of the columns chosen in the SELECT clause
11. How to achieve a GROUP BY without reducing the number of records
12. What is the difference between OLTP and OLAP
13. What is 3NF and Dimensional modelling
14. Why Dimensional models are named (Star Schema and Snowflake Schema) and 3NF is not
15. Why we need to copy the OLTP data into a DWH, and why we can't generate reports directly from the OLTP systems
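For topics 7 and 8, a minimal sketch of what "not enforced" can look like in practice. It uses Python's built-in sqlite3 module with foreign key enforcement switched off; the customers and orders tables and their columns are hypothetical, made up only for this illustration. The declared link between the tables still documents how they join, but nothing stops an orphan row from being inserted, so the query at the end is one way to check the link yourself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Switch enforcement off explicitly so the declared foreign key acts as a soft constraint.
conn.execute("PRAGMA foreign_keys = OFF")

# Hypothetical tables: orders.customer_id is declared as a link to customers.
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER REFERENCES customers (customer_id))"
)

conn.execute("INSERT INTO customers VALUES (1, 'alice')")
# Order 11 points at customer 99, which does not exist. The insert still succeeds.
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(10, 1), (11, 99)])

# Detect rows that violate the soft constraint (orders without a matching customer).
orphans = conn.execute(
    """
    SELECT o.order_id
    FROM orders o
    LEFT JOIN customers c ON c.customer_id = o.customer_id
    WHERE c.customer_id IS NULL
    """
).fetchall()
print(orphans)
```

In a real environment the same kind of check is useful before trusting a join that relies on a link the database does not enforce.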
=========== SQL BEST PRACTICE ==============================================
Avoid using string manipulation as much as possible
For aggregation requirements, use window functions (aggregates with OVER (PARTITION BY ...)) as much as possible. A regular GROUP BY reduces the number of records and often requires writing more complicated queries.
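A small sketch of the PARTITION BY tip above, run through Python's built-in sqlite3 module (this assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added). The orders table and its data are hypothetical. The first query collapses the rows to one per customer; the second keeps every order and puts the customer total next to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample table, invented for this illustration.
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (order_id, customer, amount) VALUES (?, ?, ?)",
    [(1, "alice", 10.0), (2, "alice", 30.0), (3, "bob", 5.0)],
)

# Regular GROUP BY: one row per customer, the order-level detail is gone.
grouped = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

# Window function: all three order rows survive, each carrying its customer's total.
windowed = conn.execute(
    """
    SELECT order_id, customer, amount,
           SUM(amount) OVER (PARTITION BY customer) AS customer_total
    FROM orders
    ORDER BY order_id
    """
).fetchall()

print(len(grouped), "rows from GROUP BY")
print(len(windowed), "rows from the window function")
```

To get the same result with GROUP BY alone, the aggregated rows would have to be joined back to the orders table, which is the extra complexity the tip refers to.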
Avoid using function-based filters. In such cases, create a new column in the base table to hold the derived value.
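One way to picture the derived-column tip, again as a sketch with Python's sqlite3 module. The orders table, the order_year column and the index name are all hypothetical. Both filters return the same rows; the difference is that the derived column can be indexed and compared directly, while the function has to be evaluated against each row. The query plans are printed rather than asserted because their wording differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical base table with a timestamp stored as text.
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_ts TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, "2023-12-31 23:59:00", 10.0),
        (2, "2024-01-01 00:01:00", 20.0),
        (3, "2024-06-15 12:00:00", 30.0),
    ],
)

# Function-based filter: strftime() is applied to order_ts inside the WHERE clause.
func_filter = (
    "SELECT order_id FROM orders "
    "WHERE strftime('%Y', order_ts) = '2024' ORDER BY order_id"
)

# Derived column: computed once, stored in the base table, and indexed.
conn.execute("ALTER TABLE orders ADD COLUMN order_year INTEGER")
conn.execute("UPDATE orders SET order_year = CAST(strftime('%Y', order_ts) AS INTEGER)")
conn.execute("CREATE INDEX idx_orders_year ON orders (order_year)")
col_filter = "SELECT order_id FROM orders WHERE order_year = 2024 ORDER BY order_id"

ids_func = [row[0] for row in conn.execute(func_filter)]
ids_col = [row[0] for row in conn.execute(col_filter)]

# Show how SQLite plans each query (last field of each plan row is the description).
for label, sql in (("function filter", func_filter), ("derived column", col_filter)):
    plan = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    print(label, [row[-1] for row in plan])
```

The trade-off is that the derived column has to be kept in sync whenever order_ts changes, typically in the load job that populates the table.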
Avoid using date / timestamp columns in the join clause
Avoid converting date / timestamp values into numbers, and avoid truncating timestamps back and forth
In a lengthy query, avoid referencing the same table more than once
Filter the results in the innermost query, instead of bringing them to the outer layer and then applying the filter
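A sketch of the inner-filter tip using Python's sqlite3 module and a hypothetical sales table. Both queries return the same answer. In the first, every region is aggregated and all but one are then thrown away; in the second, only the rows that are needed ever reach the aggregation. Some optimizers push the filter down on their own, so the gain depends on the database, but writing the filter at the innermost level does not rely on that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, invented for this illustration.
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("east", 15.0), ("west", 7.0), ("north", 3.0)],
)

# Filter applied late: all regions are aggregated, then the outer query discards most of them.
late = conn.execute(
    """
    SELECT region, total
    FROM (SELECT region, SUM(amount) AS total FROM sales GROUP BY region)
    WHERE region = 'east'
    """
).fetchall()

# Filter applied early: only the 'east' rows are aggregated in the inner query.
early = conn.execute(
    """
    SELECT region, total
    FROM (SELECT region, SUM(amount) AS total
          FROM sales
          WHERE region = 'east'
          GROUP BY region)
    """
).fetchall()

print(late == early)
```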
Avoid outer joins
Handle the NULL values carefully
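The two tips above are related: outer joins are a common source of NULLs, and NULLs behave differently from ordinary values in comparisons and counts. A minimal sketch with Python's sqlite3 module follows; the customers and orders tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables: bob's order has no status yet, carol has no order at all.
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "alice"), (2, "bob"), (3, "carol")])
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(10, 1, "shipped"), (11, 2, None)])

join_sql = (
    "SELECT c.name FROM customers c "
    "LEFT JOIN orders o ON o.customer_id = c.customer_id "
)

# Naive filter: NULL <> 'shipped' is unknown, so bob and carol are silently dropped.
naive = [r[0] for r in conn.execute(join_sql + "WHERE o.status <> 'shipped' ORDER BY c.name")]

# NULL-aware filter: rows with a missing status are kept explicitly.
safe = [
    r[0]
    for r in conn.execute(
        join_sql + "WHERE o.status IS NULL OR o.status <> 'shipped' ORDER BY c.name"
    )
]

# COUNT(*) counts rows, COUNT(status) skips rows where status is NULL.
row_count, status_count = conn.execute("SELECT COUNT(*), COUNT(status) FROM orders").fetchone()

print(naive, safe, row_count, status_count)
```

Neither query raises an error, which is why NULL problems tend to show up as quietly wrong row counts rather than failures.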
In most cases, the performance of a query is proportional to the size of the query: the smaller the query, the quicker the execution, unless the table sizes differ greatly.
The best way to tune an existing query is to rearrange the inner and outer queries properly.
===================== UPCOMING SIMILAR SINGLE SITTING FREE TRAINING ======================================
GUIDELINES TO SET UP AND LEARN THE FOLLOWING:
Set up Matillion/StreamSets to EXTRACT data from OLTP and populate the data into Snowflake
Set up dbt (a transformation tool used in ELT) and do transformations
Share a sample complex dbt project.