Introduction
• Snowflake was founded in 2012 in California
• It was founded by Benoit Dageville and Thierry Cruanes, who previously worked as data architects at Oracle Corporation
• The Snowflake Data Warehouse was publicly launched in 2014
[Photos: Benoit Dageville and Thierry Cruanes]
Data
• Data is information that describes an object
• Data is one of the key assets in today's world
• No business can run without data
• Data can be numbers, characters, symbols, images, etc.
ETL
• ETL stands for Extract, Transform, and Load
• It extracts data from different sources, transforms it according to the business logic, and loads it into another database (typically a data warehouse)
• ETL Tools:
• Informatica PowerCenter
• Talend
• Oracle Data Integrator (ODI)
• DataStage
• SSIS
• Ab Initio
• Pentaho
• Big Data
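The three ETL steps can be sketched in plain Python. This is a minimal illustration, not how any of the tools above work internally. It assumes two hypothetical in-memory sources (a CSV export of orders and a list of CRM customer records) and uses SQLite as a stand-in for the target database. The table name customer_sales, the column names, and the "business logic" (cleaning names and totaling sales per customer) are all invented for the example.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export from an order system
ORDERS_CSV = """order_id,customer_id,amount
1,C1,120.50
2,C2,80.00
3,C1,42.25
"""

# Hypothetical source 2: customer records from a CRM
CRM_CUSTOMERS = [
    {"id": "C1", "name": " alice "},
    {"id": "C2", "name": "BOB"},
]


def extract():
    """Extract: read raw rows from each source."""
    orders = list(csv.DictReader(io.StringIO(ORDERS_CSV)))
    return orders, CRM_CUSTOMERS


def transform(orders, customers):
    """Transform: clean names, cast amounts, and total sales per customer
    (a stand-in for real business logic)."""
    names = {c["id"]: c["name"].strip().title() for c in customers}
    totals = {}
    for row in orders:
        cid = row["customer_id"]
        totals[cid] = totals.get(cid, 0.0) + float(row["amount"])
    return [(cid, names[cid], round(total, 2)) for cid, total in sorted(totals.items())]


def load(rows, conn):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_sales "
        "(customer_id TEXT PRIMARY KEY, name TEXT, total_amount REAL)"
    )
    conn.executemany("INSERT INTO customer_sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run extract -> transform -> load, then return the loaded rows."""
    orders, customers = extract()
    load(transform(orders, customers), conn)
    return conn.execute("SELECT * FROM customer_sales ORDER BY customer_id").fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(run_etl(conn))  # [('C1', 'Alice', 162.75), ('C2', 'Bob', 80.0)]
```

Keeping each step in its own function mirrors the E, T, and L stages. Dedicated ETL tools add scheduling, error handling, and connectors for many source systems on top of this basic flow.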
Database
• A database is an organized collection of information
• The place where we store our data is called a database
• Without storing data in a database, we cannot reuse it again and again
• Databases:
• Oracle
• SQL Server
• DB2
• Teradata
• MongoDB
• Snowflake
• Ingres
• MySQL
Database vs Data Warehouse
Database
• A database is a collection of data or information
• Databases are used for Online Transaction Processing (OLTP), which means they handle day-to-day transactions on current data
• Normalized architecture, which avoids data redundancy (junk data, duplicates)
Data Warehouse
• A data warehouse is a system that stores highly structured information from various sources
• Data warehouses are used for Online Analytical Processing (OLAP), which means they can keep years of historical data for analysis
• Denormalized architecture, which combines data into fewer, wider tables (accepting some redundancy) so that analytical queries need fewer joins
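The normalized vs denormalized contrast can be made concrete with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for both kinds of system. The tables (customers, orders, sales_wide) and their data are hypothetical. In the normalized, OLTP-style design, each customer's city is stored once. In the denormalized, warehouse-style table, the city is repeated on every order row, so an analytical query can aggregate without a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Normalized (OLTP-style): customer details live in one place only.
conn.executescript("""
CREATE TABLE customers (customer_id TEXT PRIMARY KEY, city TEXT);
CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id TEXT REFERENCES customers(customer_id),
    amount REAL
);
INSERT INTO customers VALUES ('C1', 'Austin'), ('C2', 'Boston');
INSERT INTO orders VALUES (1, 'C1', 100.0), (2, 'C1', 50.0), (3, 'C2', 75.0);
""")

# Denormalized (OLAP-style): join once and store a wide table in which
# the city is repeated on every order row.
conn.execute("""
CREATE TABLE sales_wide AS
SELECT o.order_id, o.customer_id, c.city, o.amount
FROM orders o JOIN customers c ON o.customer_id = c.customer_id
""")

# An analytical query can now aggregate without any join.
sales_by_city = conn.execute(
    "SELECT city, SUM(amount) FROM sales_wide GROUP BY city ORDER BY city"
).fetchall()
print(sales_by_city)  # [('Austin', 150.0), ('Boston', 75.0)]
```

The trade-off is visible in sales_wide. If a customer moves, every one of their order rows must be updated. That is acceptable for historical analysis but is exactly the redundancy a normalized OLTP design avoids.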
Data Warehouse Architecture
[Figure: Source systems (DB2, Oracle, CRM; OLTP databases) → ETL → Data Warehouse (OLAP) → BI (Reporting, Visualization)]
Generations of Data Warehouses
• 1st Gen (On-Premises): Oracle, SQL Server, MySQL
• 2nd Gen (On-Premises): Teradata, Vertica
• 3rd Gen: Big Data
• 4th Gen (Platform-as-a-Service): Redshift
• 5th Gen (Software-as-a-Service): Snowflake
Why Snowflake?
What is Snowflake?
• Snowflake is a cloud-based data warehousing solution
• Snowflake offers data storage and analytics services.
• Snowflake does not have its own infrastructure.
• It runs on Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP).
• Snowflake runs completely on cloud infrastructure.
• Available as Software-as-a-Service.
Why Snowflake?
• Pay-for-what-you-use pricing model.
• As a cloud platform, it requires no upfront infrastructure cost.
• Snowflake is more than a data warehouse.
• It also supports data transformations, data pipes (e.g., Snowpipe for continuous loading), visual dashboards, etc.
• High scalability.
• Data recovery, backup, sharing, and masking.
• Can analyze data stored in external files (e.g., through external tables).
• Easy integration with data visualization/reporting tools.
Traditional WH Vs Snowflake
Feature | Traditional WH | Snowflake
Infrastructure cost | Yes | No infrastructure cost
Handling semi-structured data | Needs ETL tools | Processed natively (e.g., JSON stored in the VARIANT data type)
Data loading and unloading | Needs ETL tools | Can be done using the “COPY INTO” command
Scalability | Not an easy task | Highly scalable (supports scale-up and scale-out)
Database administration | Highly required | Built-in performance optimization through micro-partitions and clustering keys
Data backup | Needs additional storage | Easy with zero-copy “Cloning” (no additional storage cost when the clone is created)
Data recovery | Difficult | Very easy with “Time Travel”
Data sharing | Difficult | Easy with the Secure Data Sharing feature
Change Data Capture | Needs ETL tools | Can be done using “Streams”
Scheduling | External scheduling tools required | Can schedule using “Tasks”
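Change data capture (CDC) means detecting which rows were inserted, updated, or deleted between two points in time. The sketch below illustrates that idea; it does not show how Snowflake Streams are implemented. It is a minimal snapshot-diff approach in Python that compares two hypothetical snapshots of a customers table, keyed by primary key, and reports INSERT, UPDATE, and DELETE changes. The function name capture_changes and the sample data are invented for illustration.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def capture_changes(
    before: Dict[int, Row], after: Dict[int, Row]
) -> List[Tuple[str, int, Row]]:
    """Diff two table snapshots keyed by primary key.

    Returns (action, key, row) tuples in key order. For deletes, the row
    is the old version; for inserts and updates, it is the new version.
    """
    changes: List[Tuple[str, int, Row]] = []
    for key in sorted(before.keys() | after.keys()):
        if key not in before:
            changes.append(("INSERT", key, after[key]))
        elif key not in after:
            changes.append(("DELETE", key, before[key]))
        elif before[key] != after[key]:
            changes.append(("UPDATE", key, after[key]))
    return changes


# Hypothetical snapshots of a customers table, taken at two points in time.
yesterday = {1: {"name": "Alice", "city": "Austin"}, 2: {"name": "Bob", "city": "Boston"}}
today = {1: {"name": "Alice", "city": "Denver"}, 3: {"name": "Cara", "city": "Chicago"}}

for change in capture_changes(yesterday, today):
    print(change)
# ('UPDATE', 1, {'name': 'Alice', 'city': 'Denver'})
# ('DELETE', 2, {'name': 'Bob', 'city': 'Boston'})
# ('INSERT', 3, {'name': 'Cara', 'city': 'Chicago'})
```

Diffing full snapshots is simple but must reread the whole table on every run. This cost is one reason traditional warehouses lean on ETL tools for CDC. A Snowflake stream instead tracks changes incrementally from an offset on the source table.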