An Introduction To Snowflake - SQLKonferenz

An introduction to Snowflake – the data cloud
Johan Ludvig Brattås
Deloitte
Agenda
• A short history
• Overview
• Snowflake as a DB
• Integrations
• Snowpark
The cloud data warehouse
• Initially a response to challenges faced by traditional RDBMSs
• Massively Parallel Processing (MPP)
• Still a take on the enterprise data warehouse (EDW)

The cloud data platform
Can data lake functionality and the EDW somehow merge?

Suggestions for solving the issues:


• Logical data warehouse
• Cloud data warehouse
• Virtualization

Enter the new cloud data platforms


Definition of a cloud data platform
• No longer just your Dad-a-base…
• Storage supporting diverse data types
• Compute and tools supporting diverse
workloads
• Tooling for CI/CD, encryption, RBAC etc.
• Data management tools
Snowflake
• Established in 2012
• Launched publicly in 2015
• Record IPO in 2020
• Unique architecture with fully separated
storage and compute
• Based on ANSI SQL
• Started as a data warehousing service
Snowflake vs Databricks
• Snowflake comes from EDW world
• Databricks from Spark data science and Python data engineering
• Converging, as both have added new features
Snowflake vs Databricks
• Handbags at dawn
The Snowflake Architecture
• The core Snowflake platform
• Storage
• Compute
• Cloud Services
• Snowgrid
Storage
• Databases for ACID + RDBMS
• Automated partitioning
• Time travel
• Autotuned
• Internal stages for semi-structured & unstructured data
• External stages to on-prem & cloud storage
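Time travel can be made concrete with the SQL it relies on. The sketch below is a hypothetical helper, not part of any Snowflake library. It builds a query that reads a table as it looked some seconds ago, using the `AT(OFFSET => -n)` clause, and an `UNDROP TABLE` statement. How far back you can go depends on the table's `DATA_RETENTION_TIME_IN_DAYS` setting: 1 day by default, and up to 90 days on Enterprise Edition. Table names are placeholders.

```python
import re

# Hypothetical helper, not part of any Snowflake library.
# Accepts plain identifiers, optionally qualified as db.schema.table.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*(\.[A-Za-z_][A-Za-z0-9_$]*){0,2}$")


def _check(table: str) -> None:
    if not _IDENT.match(table):
        raise ValueError(f"unsupported identifier: {table!r}")


def time_travel_query(table: str, seconds_ago: int) -> str:
    """Read `table` as it looked `seconds_ago` seconds in the past."""
    _check(table)
    if seconds_ago <= 0:
        raise ValueError("seconds_ago must be positive")
    # AT(OFFSET => -n): a relative point in time, n seconds before now.
    return f"SELECT * FROM {table} AT(OFFSET => -{seconds_ago})"


def undrop_table(table: str) -> str:
    """Restore a dropped table (only works inside its Time Travel retention period)."""
    _check(table)
    return f"UNDROP TABLE {table}"


print(time_travel_query("sales.public.orders", 300))
# SELECT * FROM sales.public.orders AT(OFFSET => -300)
```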
Storage
• Cloud stages support S3, GCS & ADLS
• On-prem: S3-compatible storage only
• External stages support
• JSON/XML/CSV…
• Avro/Parquet…
• Apache Iceberg
• Delta Lake
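An external stage is a named pointer to a cloud storage location plus a default file format. The sketch below assembles a `CREATE STAGE` statement for the three object stores named above. The stage name, bucket and storage integration are hypothetical placeholders, and the storage integration is assumed to have been created beforehand by an administrator. Iceberg and Delta Lake data are set up differently and are not covered here.

```python
# Hypothetical DDL builder for illustration; names passed in are placeholders.
# File format types accepted by FILE_FORMAT = (TYPE = ...).
_FORMATS = {"CSV", "JSON", "AVRO", "ORC", "PARQUET", "XML"}
# URL schemes for S3, GCS and Azure (ADLS / Blob) locations.
_SCHEMES = ("s3://", "gcs://", "azure://")


def create_external_stage(name: str, url: str, integration: str, file_type: str) -> str:
    """Return CREATE STAGE DDL for a cloud location read through a storage integration."""
    file_type = file_type.upper()
    if file_type not in _FORMATS:
        raise ValueError(f"unsupported file type: {file_type}")
    if not url.startswith(_SCHEMES):
        raise ValueError(f"unsupported URL scheme: {url}")
    return (
        f"CREATE STAGE {name}\n"
        f"  URL = '{url}'\n"
        f"  STORAGE_INTEGRATION = {integration}\n"
        f"  FILE_FORMAT = (TYPE = {file_type});"
    )


print(create_external_stage("raw_stage", "s3://my-bucket/raw/", "s3_int", "parquet"))
```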
Storage
• Create External Tables
• Build materialized views on semi-structured data
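An external table exposes each record of the staged files as a VARIANT column called `VALUE`. A materialized view can then project typed columns out of it with the `value:field::TYPE` path syntax. The sketch below generates both statements. The table, stage path and field names are hypothetical, and materialized views are an Enterprise Edition feature. With `AUTO_REFRESH = FALSE` the file list has to be refreshed manually using `ALTER EXTERNAL TABLE ... REFRESH`.

```python
from __future__ import annotations


def external_table_with_mv(table: str, stage_path: str, fields: dict[str, str]) -> list[str]:
    """Return DDL for an external table over Parquet files and a materialized view on it.

    Hypothetical helper; `fields` maps output column names to Snowflake types.
    """
    ext = (
        f"CREATE EXTERNAL TABLE {table}\n"
        f"  WITH LOCATION = {stage_path}\n"
        f"  FILE_FORMAT = (TYPE = PARQUET)\n"
        f"  AUTO_REFRESH = FALSE;"
    )
    # Each record lives in the VARIANT column VALUE; cast the fields we need.
    projections = ",\n  ".join(
        f"value:{col}::{typ} AS {col}" for col, typ in fields.items()
    )
    mv = f"CREATE MATERIALIZED VIEW {table}_mv AS\nSELECT\n  {projections}\nFROM {table};"
    return [ext, mv]


for stmt in external_table_with_mv(
    "ext_orders", "@raw_stage/orders/", {"id": "NUMBER", "status": "STRING"}
):
    print(stmt)
```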
Compute
• Called warehouses
• Elastic
• From XS -> 6XL
• 2 types
• Normal
• Snowpark (memory) optimized
• Auto-pause + instant restart
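Elasticity and auto-pause matter because compute is billed only while a warehouse runs. For standard warehouses, each size step doubles compute and credits per hour, from 1 credit/hour at XS to 512 at 6XL. Billing is per second, with a 60-second minimum every time the warehouse starts. The sketch below models one resume-run-suspend cycle. Snowpark-optimized warehouses are billed at different rates and are left out.

```python
# Sketch of standard-warehouse credit consumption (Snowpark-optimized not modelled).
SIZES = ["XS", "S", "M", "L", "XL", "2XL", "3XL", "4XL", "5XL", "6XL"]
# Each step up doubles credits per hour: XS = 1, S = 2, ... 6XL = 512.
CREDITS_PER_HOUR = {size: 2 ** i for i, size in enumerate(SIZES)}


def credits_for_run(size: str, seconds: float) -> float:
    """Credits for one resume -> run -> auto-pause cycle of a warehouse.

    Per-second billing, but at least 60 seconds are charged per start.
    """
    billed_seconds = max(seconds, 60)
    return CREDITS_PER_HOUR[size] * billed_seconds / 3600


print(credits_for_run("XS", 3600))  # 1.0
```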
Compute
• Consists of CPU & RAM
• Cache
• Separate warehouses per use case
• Be mindful: auto-pause empties the cache
• Plan around each use case's usage patterns
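One warehouse per use case lets each workload get its own auto-pause setting (`AUTO_SUSPEND`, in seconds, in Snowflake's DDL). A short value saves credits but drops the cache sooner. A longer one keeps repeated dashboard queries warm. The warehouse names, sizes and timeouts below are hypothetical examples of that trade-off, not recommendations.

```python
from dataclasses import dataclass

# Subset of values accepted by WAREHOUSE_SIZE.
_SIZES = {"XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE", "XXLARGE"}


@dataclass(frozen=True)
class WarehouseSpec:
    name: str
    size: str
    auto_suspend_s: int  # idle seconds before auto-pause (the local cache is lost then)
    snowpark_optimized: bool = False

    def ddl(self) -> str:
        if self.size not in _SIZES:
            raise ValueError(f"unknown size: {self.size}")
        if self.auto_suspend_s <= 0:
            raise ValueError("auto_suspend_s must be positive")
        parts = [
            f"CREATE WAREHOUSE IF NOT EXISTS {self.name}",
            f"  WAREHOUSE_SIZE = '{self.size}'",
            f"  AUTO_SUSPEND = {self.auto_suspend_s}",
            "  AUTO_RESUME = TRUE",
        ]
        if self.snowpark_optimized:
            parts.append("  WAREHOUSE_TYPE = 'SNOWPARK-OPTIMIZED'")
        return "\n".join(parts) + ";"


# Hypothetical per-use-case warehouses.
WAREHOUSES = [
    WarehouseSpec("LOAD_WH", "SMALL", 60),        # batch loads: pause quickly
    WarehouseSpec("BI_WH", "MEDIUM", 600),        # dashboards: stay warm for the cache
    WarehouseSpec("ML_WH", "MEDIUM", 300, True),  # Snowpark / ML workloads
]

for wh in WAREHOUSES:
    print(wh.ddl())
```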
Cloud Services
• The central administration and control layer
• 4 pillars
• Maintenance & tuning
• Administration
• Networking & Encryption
• Resource Manager
Cloud Services – 4 pillars
• Maintenance & tuning
• Common metadata repository
• Snowflake is “DBA-free”
• Auto-tuning of queries
• Auto-partitioning
• Auto-indexing/“index-free”
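"Index-free" works because the metadata layer keeps per-column min/max values for every micro-partition. A query's filter can then skip partitions that cannot contain matching rows, with no index to maintain. The toy model below shows that pruning step for a range predicate. Partition names and values are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MicroPartition:
    """Toy stand-in for one micro-partition: min/max of a column, kept as metadata."""
    name: str
    min_value: int
    max_value: int


def prune(partitions: list[MicroPartition], lo: int, hi: int) -> list[MicroPartition]:
    """Keep only partitions whose [min, max] range overlaps lo <= value <= hi."""
    return [p for p in partitions if p.max_value >= lo and p.min_value <= hi]


partitions = [
    MicroPartition("mp1", 1, 100),
    MicroPartition("mp2", 101, 200),
    MicroPartition("mp3", 150, 300),
]
# WHERE order_id BETWEEN 120 AND 140 only needs to scan mp2.
scanned = prune(partitions, 120, 140)
print([p.name for p in scanned])  # ['mp2']
```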
Cloud Services – 4 pillars
• Administration
• Transaction manager
• Security/RBAC
• Authentication & Authorization
• Networking & Encryption
• Intra-cluster
• Cloud connectivity
• Resource Manager
• Cluster management
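In Snowflake's RBAC model, privileges are granted to roles, and roles are granted to users or to other roles (e.g. `GRANT ROLE reader TO ROLE analyst`). A role inherits everything granted to the roles below it. The sketch below resolves a role's effective privileges by walking that hierarchy. The role names and privileges are hypothetical.

```python
from __future__ import annotations


def effective_privileges(
    role: str,
    privileges: dict[str, set[str]],
    granted_roles: dict[str, set[str]],
) -> set[str]:
    """Privileges held by `role` directly or via roles granted to it (depth-first walk)."""
    seen: set[str] = set()
    result: set[str] = set()
    stack = [role]
    while stack:
        current = stack.pop()
        if current in seen:  # guard against revisiting a role
            continue
        seen.add(current)
        result |= privileges.get(current, set())
        stack.extend(granted_roles.get(current, set()))
    return result


# Hypothetical hierarchy: READER is granted to ANALYST, ANALYST to PLATFORM_ADMIN.
PRIVILEGES = {
    "READER": {"SELECT ON sales.public.orders"},
    "ANALYST": {"USAGE ON WAREHOUSE bi_wh"},
    "PLATFORM_ADMIN": {"CREATE DATABASE"},
}
GRANTED_ROLES = {"PLATFORM_ADMIN": {"ANALYST"}, "ANALYST": {"READER"}}

print(sorted(effective_privileges("ANALYST", PRIVILEGES, GRANTED_ROLES)))
```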
The Snowflake Architecture
• Snowgrid
• Global Snowflake internal network
• Cloud-agnostic
Integrations
• Integration
• Stages
• External Tables
• Dynamic Tables
• Snowpipe
• Unistore
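Several of these integration pieces chain together into a small pipeline. A pipe (Snowpipe) runs `COPY INTO` from a stage as new files arrive. A dynamic table then keeps a curated result within a declared `TARGET_LAG`. The sketch below generates both statements. It assumes a hypothetical raw table with a single VARIANT column `v` and JSON files. `AUTO_INGEST = TRUE` also needs cloud event notifications configured on the external stage's bucket.

```python
from __future__ import annotations


def ingestion_ddl(
    stage: str, raw_table: str, curated_table: str, warehouse: str, lag: str
) -> list[str]:
    """Hypothetical helper: Snowpipe into a raw table, dynamic table on top of it."""
    pipe = (
        f"CREATE PIPE {raw_table}_pipe AUTO_INGEST = TRUE AS\n"
        f"  COPY INTO {raw_table} FROM @{stage}\n"
        f"  FILE_FORMAT = (TYPE = JSON);"
    )
    # Assumes raw_table has one VARIANT column named v holding each JSON record.
    dynamic = (
        f"CREATE DYNAMIC TABLE {curated_table}\n"
        f"  TARGET_LAG = '{lag}'\n"
        f"  WAREHOUSE = {warehouse}\n"
        f"AS\n"
        f"  SELECT v:id::NUMBER AS id, v:status::STRING AS status\n"
        f"  FROM {raw_table};"
    )
    return [pipe, dynamic]


for stmt in ingestion_ddl("raw_stage", "raw_events", "events", "load_wh", "5 minutes"):
    print(stmt)
```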
The Snowflake Architecture
• Snowpark
• Streamlit
Snowpark
• Expands Snowflake beyond a traditional RDBMS
• Python – offers traditional dataframe APIs
• Also ML modelling and operations APIs
• Can run inside warehouses
• Can run on containers (Snowpark Container Services)
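Snowpark DataFrames are evaluated lazily. Transformations only build up a plan, which is turned into SQL and executed in a warehouse when an action such as `collect()` is called. The toy class below is hypothetical and is not the Snowpark API. It only illustrates that idea: chained calls record operations, and `to_sql()` emits a single query. For simplicity it applies all filters before the projection.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LazyFrame:
    """Toy lazy dataframe (hypothetical, not Snowpark): records steps, emits SQL on demand."""

    source: str
    columns: tuple[str, ...] = ("*",)
    predicates: tuple[str, ...] = ()

    def filter(self, predicate: str) -> LazyFrame:
        # Nothing executes here; we return a new frame with one more predicate.
        return replace(self, predicates=self.predicates + (predicate,))

    def select(self, *cols: str) -> LazyFrame:
        return replace(self, columns=tuple(cols))

    def to_sql(self) -> str:
        # The point where the recorded plan becomes one SQL statement.
        sql = f"SELECT {', '.join(self.columns)} FROM {self.source}"
        if self.predicates:
            sql += " WHERE " + " AND ".join(f"({p})" for p in self.predicates)
        return sql


orders = (
    LazyFrame("orders")
    .filter("amount > 100")
    .filter("status = 'OPEN'")
    .select("id", "amount")
)
print(orders.to_sql())
# SELECT id, amount FROM orders WHERE (amount > 100) AND (status = 'OPEN')
```

The design choice mirrors why Snowpark can "run inside warehouses": the Python side only describes the work, and the heavy lifting happens where the data lives.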
Streamlit
• Company acquired by Snowflake in 2022
• Build interactive apps with Python that run on Snowflake
• Web apps, widgets – with unique URLs that can be shared
• Still in public preview
The Snowflake Marketplace
• From the consumer side
• Search, discover and sample datasets globally
• Access datasets, some free, some commercial
• No need to run ETL processes to fetch data
• Directly start querying the data inside your own account
• Can combine internal and marketplace data
The Snowflake Marketplace
• From the producer side
• Share data with users outside your organization
• This is done through listings
• Listings can be global or limited to selected users/organizations
• Datasets can be a one-off, an update or a stream
• No special development needed
• Listings can be private, free or paid
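Under the hood, data offered this way is typically packaged as a secure share. A share grants read access to objects without copying them, which is why consumers need no ETL. The sketch below generates the producer-side statements for a direct share. On the consumer side, the shared data is mounted with `CREATE DATABASE ... FROM SHARE <provider_account>.<share>`. The share, database and account names are hypothetical. Marketplace listings add their own setup on top of this, which is not shown.

```python
from __future__ import annotations


def share_ddl(
    share: str, database: str, schema: str, tables: list[str], consumers: list[str]
) -> list[str]:
    """Hypothetical helper: producer-side DDL for a direct secure share."""
    stmts = [
        f"CREATE SHARE {share};",
        f"GRANT USAGE ON DATABASE {database} TO SHARE {share};",
        f"GRANT USAGE ON SCHEMA {database}.{schema} TO SHARE {share};",
    ]
    # Read-only access per table; nothing is copied to the consumer.
    stmts += [
        f"GRANT SELECT ON TABLE {database}.{schema}.{t} TO SHARE {share};" for t in tables
    ]
    if consumers:
        stmts.append(f"ALTER SHARE {share} ADD ACCOUNTS = {', '.join(consumers)};")
    return stmts


for stmt in share_ddl("sales_share", "sales", "public", ["orders", "customers"], ["myorg.partner_acct"]):
    print(stmt)
```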
THANK YOU TO OUR SPONSORS
Johan Ludvig Brattås
Director, Deloitte
Chronic volunteer
Co-organizer – DataSaturday Oslo
President – MDPUG Oslo
Frequent volunteer in general
When not geeking out over new tech:
Teaching coeliacs how to bake gluten-free
Baking
Hiking
Gardening

/johanludvig

@intoleranse

Thank you very much for your attention.
