Snowflake: Definitions & Key Concepts
1. Snowflake
Snowflake is a cloud-based data warehousing platform that provides scalable storage, high-
performance computing, and advanced data analytics. It enables businesses to store, process, and
analyse structured and semi-structured data efficiently across multiple cloud providers (AWS, Azure,
and Google Cloud).
2. Snowflake Architecture
Multi-Cluster Shared Data Architecture: Snowflake separates storage, compute, and cloud
services to enhance scalability and performance.
Three Layers:
1. Storage Layer – Stores structured and semi-structured data in a compressed,
columnar format.
2. Compute Layer – Virtual Warehouses execute queries independently.
3. Cloud Services Layer – Manages authentication, security, metadata, and query
optimization.
3. Virtual Warehouse
A Virtual Warehouse in Snowflake is a compute cluster responsible for processing queries and
performing data operations. It can be resized on demand and suspended, manually or automatically,
when not in use, so compute is billed only while the warehouse is running.
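As a minimal sketch of the behaviour described above, the statements below create, resize, and suspend a warehouse. The warehouse name reporting_wh, the sizes, and the 60-second timeout are illustrative assumptions, not values from these notes.

```sql
-- Hypothetical warehouse; size and timeout are illustrative.
CREATE WAREHOUSE IF NOT EXISTS reporting_wh
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 60            -- suspend after 60 seconds of inactivity
  AUTO_RESUME = TRUE           -- resume automatically when a query arrives
  INITIALLY_SUSPENDED = TRUE;  -- do not start (or bill) until first use

-- Make sure it is running, scale it up for a heavy workload,
-- then suspend it manually once the work is done.
ALTER WAREHOUSE reporting_wh RESUME IF SUSPENDED;
ALTER WAREHOUSE reporting_wh SET WAREHOUSE_SIZE = 'MEDIUM';
ALTER WAREHOUSE reporting_wh SUSPEND;
```

With AUTO_SUSPEND and AUTO_RESUME set, the manual SUSPEND is usually unnecessary; it is shown here only to illustrate the command.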
4. Time Travel
Time Travel in Snowflake allows users to access historical data within a retention period: 1 day by
default, and up to 90 days on Enterprise Edition and higher. It helps in recovering deleted or
modified data.
Uses the AT and BEFORE clauses:
SELECT * FROM orders AT (TIMESTAMP => '2025-02-28 12:00:00'::TIMESTAMP_LTZ);
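Beyond a fixed timestamp, Time Travel can also be addressed by a relative offset or by a statement ID, and dropped objects can be restored. The following is a sketch only: it reuses the orders table from the example above, '<query_id>' is a placeholder to be replaced with a real query ID, and the 30-day retention value assumes an edition that permits more than 1 day.

```sql
-- Table state 30 minutes ago (OFFSET is in seconds; negative means the past).
SELECT * FROM orders AT (OFFSET => -60 * 30);

-- Table state just before a given statement ran.
-- '<query_id>' is a placeholder: substitute an ID from the query history.
SELECT * FROM orders BEFORE (STATEMENT => '<query_id>');

-- Set the retention period for one table (in days).
ALTER TABLE orders SET DATA_RETENTION_TIME_IN_DAYS = 30;

-- Recover a dropped table; this works only after the table has been dropped
-- and while it is still inside its retention period.
UNDROP TABLE orders;
```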
5. Zero-Copy Cloning
Zero-Copy Cloning allows users to create copies of tables, schemas, or databases without
duplicating the data. The clone shares the source's underlying storage, so it is created almost
instantly and consumes extra storage only as either copy is changed.
CREATE TABLE orders_clone CLONE orders;
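Cloning also works at the schema level and can be combined with Time Travel. A short sketch, assuming a hypothetical database sales_db with a schema named public, and the orders table used above:

```sql
-- Clone an entire schema, for example to create a development copy.
CREATE SCHEMA sales_db.public_dev CLONE sales_db.public;

-- Clone a table as it existed one hour ago (cloning + Time Travel).
CREATE TABLE orders_restored CLONE orders AT (OFFSET => -3600);
```

The second statement is a common way to recover from an accidental update: the restored copy can be inspected before anything is swapped back.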
6. Snowflake Stages
A stage in Snowflake is a location that holds data files so they can be loaded into tables or
unloaded from them.
Internal Stages: Storage managed by Snowflake (user, table, and named stages).
External Stages: Reference cloud storage outside Snowflake (AWS S3, Azure Blob, GCS).
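A typical load through a stage might look like the sketch below. All names are hypothetical (orders_stage, orders_ext_stage, my_s3_integration, the bucket URL, and the local file path), and the external stage assumes a storage integration has already been set up by an administrator.

```sql
-- Named internal stage with a default CSV file format.
CREATE STAGE orders_stage
  FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);

-- PUT is a client-side command (run it from a client such as SnowSQL);
-- the local path is illustrative.
PUT file:///tmp/orders.csv @orders_stage;

-- Load the staged file(s) into the target table.
COPY INTO orders FROM @orders_stage;

-- External stage pointing at cloud storage; URL and integration name
-- are placeholders.
CREATE STAGE orders_ext_stage
  URL = 's3://my-bucket/orders/'
  STORAGE_INTEGRATION = my_s3_integration;
```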
7. Data Sharing
Data Sharing in Snowflake enables secure, real-time data sharing between different Snowflake
accounts without data movement.
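As a rough sketch of how a share is set up, the provider grants access to specific objects through a share and the consumer creates a database from it. The object names (sales_share, sales_db, shared_sales) are hypothetical, and <consumer_account> and <provider_account> are placeholders for real account identifiers.

```sql
-- Provider account: create a share and expose one table through it.
CREATE SHARE sales_share;
GRANT USAGE ON DATABASE sales_db TO SHARE sales_share;
GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share;
GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share;

-- Make the share visible to a consumer (placeholder identifier).
ALTER SHARE sales_share ADD ACCOUNTS = <consumer_account>;

-- Consumer account: create a read-only database on top of the share.
CREATE DATABASE shared_sales FROM SHARE <provider_account>.sales_share;
```

The consumer queries shared_sales like any other database, but the data stays in the provider's storage.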
8. Clustering Key
A Clustering Key is a column, set of columns, or expression that Snowflake uses to keep related
rows stored together, so queries that filter on those columns scan less data and run faster.
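Defining and inspecting a clustering key could look like this; the order_date column is an assumption about the orders table, not something stated in these notes.

```sql
-- Define (or change) the clustering key on an existing table.
ALTER TABLE orders CLUSTER BY (order_date);

-- Report how well the table is clustered on that key (returns JSON).
SELECT SYSTEM$CLUSTERING_INFORMATION('orders', '(order_date)');
```

Clustering keys tend to pay off on very large tables that are filtered on the same columns most of the time; on small tables the maintenance cost may outweigh the benefit.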
9. Materialized Views
A Materialized View is a precomputed, stored result of a query that Snowflake keeps up to date
automatically as the base table changes. It improves performance by avoiding frequent
recalculations.
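A minimal sketch, assuming the orders table has order_date and amount columns (hypothetical) and that the account's edition supports materialized views:

```sql
-- Precompute daily totals once instead of aggregating on every query.
CREATE MATERIALIZED VIEW daily_order_totals AS
SELECT order_date,
       COUNT(*)    AS order_count,
       SUM(amount) AS total_amount
FROM orders
GROUP BY order_date;

-- Queries read the stored result rather than rescanning orders.
SELECT *
FROM daily_order_totals
WHERE order_date = '2025-02-28';
```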
10. Query Caching
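Snowflake's caching is commonly described in three levels:
1. Result Cache – Stores query results for 24 hours; an identical query can reuse the result
without a running warehouse, provided the underlying data has not changed.
2. Local Disk Cache – Keeps table data recently read by queries on the Virtual Warehouse's local
storage; it is lost when the warehouse is suspended.
3. Remote Disk – The long-term storage layer itself, read when neither cache above can serve the
query. It is often listed as a cache, although it holds the persistent copy of the data.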
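The Result Cache can be switched off per session, which is useful when benchmarking so that timings reflect real execution. A small sketch, again assuming order_date and amount columns on the orders table:

```sql
-- Run the same query twice; the second run may be answered from the
-- Result Cache if the data and the query text are unchanged.
SELECT order_date, SUM(amount) FROM orders GROUP BY order_date;
SELECT order_date, SUM(amount) FROM orders GROUP BY order_date;

-- Disable result reuse for this session, for example while benchmarking.
ALTER SESSION SET USE_CACHED_RESULT = FALSE;
SELECT order_date, SUM(amount) FROM orders GROUP BY order_date;

-- Restore the default behaviour.
ALTER SESSION SET USE_CACHED_RESULT = TRUE;
```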
11. Role-Based Access Control (RBAC)
RBAC is a security model in which privileges are granted to roles, and roles are granted to users.
CREATE ROLE analyst;
GRANT SELECT ON TABLE sales TO ROLE analyst;
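For the analyst role to be usable in practice, it also needs access to the containing database and schema, a warehouse to run queries on, and at least one user assigned to it. A sketch with hypothetical names (sales_db, reporting_wh, jane_doe):

```sql
-- SELECT on a table is not enough without USAGE on its database and schema.
GRANT USAGE ON DATABASE sales_db TO ROLE analyst;
GRANT USAGE ON SCHEMA sales_db.public TO ROLE analyst;

-- Queries need compute, so the role also needs a warehouse.
GRANT USAGE ON WAREHOUSE reporting_wh TO ROLE analyst;

-- Assign the role to a user.
GRANT ROLE analyst TO USER jane_doe;

-- Review what the role can do.
SHOW GRANTS TO ROLE analyst;
```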
12. Snowflake Data Types
String: VARCHAR, CHAR, TEXT
Numeric: INTEGER, FLOAT, NUMBER
Date & Time: DATE, TIMESTAMP, TIME
Boolean: BOOLEAN
Semi-structured: VARIANT, OBJECT, ARRAY
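The types above can be seen together in one table definition. The sketch below uses a hypothetical customer_events table and adds a VARIANT column, Snowflake's type for semi-structured data such as JSON.

```sql
CREATE TABLE customer_events (
  event_id    INTEGER,
  customer    VARCHAR(100),
  amount      NUMBER(10, 2),
  event_date  DATE,
  created_at  TIMESTAMP_NTZ,
  is_test     BOOLEAN,
  payload     VARIANT        -- semi-structured data, e.g. JSON
);

-- PARSE_JSON cannot be used in a VALUES clause, so insert via SELECT.
INSERT INTO customer_events
SELECT 1, 'Acme Ltd', 129.99,
       '2025-02-28'::DATE,
       '2025-02-28 12:00:00'::TIMESTAMP_NTZ,
       FALSE,
       PARSE_JSON('{"channel": "web", "items": 3}');

-- Read a field out of the VARIANT column and cast it to a string.
SELECT customer, payload:channel::STRING AS channel
FROM customer_events
WHERE is_test = FALSE;
```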