Snowflake Data Engineering Concepts

The document provides a comprehensive overview of Snowflake, a cloud-based data warehousing platform, detailing its architecture, features, and functionalities. Key concepts include virtual warehouses, data management, query optimization, and security measures such as role-based access control and data masking. It also covers advanced features like Snowpipe for continuous data ingestion, data sharing capabilities, and the Snowflake Data Marketplace for accessing external data sets.

Uploaded by krishnapmishra

Data Engineering

SNOWFLAKE
ALL CONCEPTS TO GET STARTED
Data Engineering 101- Snowflake

Cloud Data Warehouse

A cloud-based platform for storing and analyzing data that offers scalability, flexibility, and cost-efficiency compared to traditional on-premises data warehouses.

Snowflake provides a fully managed service with separate compute, storage, and cloud services layers, making it easier to scale and manage data operations.

Snowflake Architecture

Snowflake's architecture separates storage and compute, allowing each to scale independently. This removes a key limitation of traditional data warehouses, where storage and compute are coupled and must be scaled together.

Snowflake uses a multi-cluster, shared-data architecture: storage is centralized, and compute resources can be scaled up or down independently based on workload.

Virtual Warehouse

A virtual warehouse is a cluster of compute resources in Snowflake. Each virtual warehouse can be scaled independently to match its workload, providing the compute power needed for query execution without affecting other warehouses.

If a company needs to run a heavy analytical query during peak business hours, it can scale the virtual warehouse up to a larger size, which typically speeds up the query. After the peak hours, the warehouse can be scaled back down to save costs.

Database

A logical grouping of schemas, tables, and other database objects. It provides a namespace for organizing and managing data.

Creating a new database in Snowflake:

CREATE DATABASE sales_data;

This command sets up a new database where all sales-related schemas and tables can be organized.

Schema

A logical grouping of database objects such as tables, views, and stored procedures. Schemas help organize objects within a database.

Creating a new schema in a database:

CREATE SCHEMA sales_data.january;

This schema can contain all tables related to January's sales data.

Table

A structured set of data elements (values) organized in rows and columns. Tables are the fundamental storage objects in a database.

Creating a new table:

CREATE TABLE customers (id INT, name STRING, email STRING);

This table stores customer information.

View

A virtual table based on the result set of a SQL query. Views do not store data themselves; they provide a way to present data stored in tables.

Creating a new view:

CREATE VIEW vip_customers AS
  SELECT * FROM customers
  WHERE status = 'VIP';

This view shows only VIP customers (it assumes the customers table has a status column).

Stage

A location where data files are stored before being loaded into Snowflake tables (or after being unloaded from them). Stages can be internal (managed by Snowflake) or external (e.g., Amazon S3, Azure Blob Storage).

Creating an internal stage:

CREATE STAGE my_stage;

This stage can be used to store data files before loading them into tables.

File Format

Defines the format of data files to be loaded into Snowflake (e.g., CSV, JSON, Avro). File formats specify how Snowflake should interpret the contents of the files.

Creating a file format for CSV files:

CREATE FILE FORMAT my_csv_format
  TYPE = 'CSV'
  FIELD_OPTIONALLY_ENCLOSED_BY = '"';

Warehouse Size

Snowflake offers different sizes for virtual warehouses (X-Small, Small, Medium, Large, X-Large, and larger) to accommodate various workloads. Each step up in size doubles the compute resources and the credits consumed per hour.

A Small warehouse might be sufficient for routine queries, while a Large warehouse can handle complex analytical queries. Adjust the size based on workload demands.
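As a rough sketch of the sizing arithmetic, the Python model below assumes the standard-warehouse rate of 1 credit per hour for X-Small, doubling with each size, and ignores the 60-second billing minimum and cloud services charges. CREDITS_PER_HOUR and credits_used are illustrative names, not a Snowflake API.

```python
# Assumed credits per hour for standard virtual warehouses; each size doubles the previous one.
CREDITS_PER_HOUR = {
    "X-Small": 1,
    "Small": 2,
    "Medium": 4,
    "Large": 8,
    "X-Large": 16,
    "2X-Large": 32,
    "3X-Large": 64,
    "4X-Large": 128,
}


def credits_used(size: str, running_seconds: float, clusters: int = 1) -> float:
    """Credits consumed by `clusters` running clusters of one size for `running_seconds`."""
    return CREDITS_PER_HOUR[size] * clusters * running_seconds / 3600


# One hour on a Large warehouse.
print(credits_used("Large", 3600))
# A query that finishes in half the time on a warehouse twice the size costs the same.
print(credits_used("Medium", 3600), credits_used("Large", 1800))
```

The last line shows why scaling up is not automatically more expensive: cost is rate multiplied by runtime. Real queries rarely scale perfectly with size, so measure before resizing.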

Scaling Up

Increasing the size of a virtual warehouse to provide more compute resources for a specific workload.

Scaling up a virtual warehouse:

ALTER WAREHOUSE my_warehouse SET WAREHOUSE_SIZE = 'LARGE';

This increases the compute power available for queries.

Scaling Out

Adding more compute clusters to a virtual warehouse to handle higher concurrency. Multi-cluster warehouses require Enterprise Edition or higher.

Enabling auto-scaling for a warehouse:

ALTER WAREHOUSE my_warehouse SET
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 5
  SCALING_POLICY = 'STANDARD';

Snowflake adds clusters (up to five) when concurrent queries start to queue and removes them when the load drops.

Auto-Suspend

Automatically suspends a virtual warehouse after it has been idle for a specified number of seconds, saving costs.

Setting auto-suspend for a warehouse:

ALTER WAREHOUSE my_warehouse SET AUTO_SUSPEND = 300;

The warehouse will suspend after 5 minutes (300 seconds) of inactivity.
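A minimal sketch of how AUTO_SUSPEND interacts with per-second billing, assuming queries arrive as (start, end) times in seconds and that each resume is billed for at least 60 seconds. Snowflake checks for idleness periodically, so real suspend times are approximate. The functions are hypothetical helpers, and a positive AUTO_SUSPEND is assumed (0 or NULL disables auto-suspend).

```python
def running_segments(queries, auto_suspend):
    """Return (resume, suspend) times, in seconds, for a warehouse with AUTO_RESUME on.

    queries: list of (start, end) times in seconds.
    auto_suspend: idle seconds before the warehouse suspends (assumed > 0).
    """
    segments = []
    for start, end in sorted(queries):
        if segments and start <= segments[-1][1]:
            # Warehouse is still running: keep it up until it has been idle long enough.
            resumed, suspended = segments[-1]
            segments[-1] = (resumed, max(suspended, end + auto_suspend))
        else:
            # Warehouse was suspended: this query triggers an auto-resume.
            segments.append((start, end + auto_suspend))
    return segments


def billed_seconds(segments):
    """Per-second billing with a 60-second minimum each time the warehouse resumes."""
    return sum(max(60, suspended - resumed) for resumed, suspended in segments)


queries = [(0, 30), (100, 130), (2000, 2010)]
for idle in (60, 300, 1800):
    segs = running_segments(queries, idle)
    print(idle, segs, billed_seconds(segs))
```

Comparing several AUTO_SUSPEND values against the same query pattern shows the trade-off: a short timeout bills fewer idle seconds but resumes more often, each time with the 60-second minimum and a cold local cache.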

Auto-Resume

Automatically resumes a suspended virtual warehouse when a query is submitted, ensuring availability without manual intervention.

Enabling auto-resume for a warehouse:

ALTER WAREHOUSE my_warehouse SET AUTO_RESUME = TRUE;

The warehouse will resume automatically when a query is submitted.

Query Caching

Snowflake caches query results to speed up repeated queries, avoiding re-computation and saving compute resources.

Running the same query twice reuses the cached result if the query text is identical and the underlying data has not changed, improving performance and efficiency.

Result Cache

Stores the results of queries executed within the past 24 hours. A cached result can be reused by any user in the account whose role has the privileges the query requires, reducing compute costs and speeding up query performance. Each reuse resets the 24-hour window, up to a maximum of 31 days from the first execution.

If a query is run and then re-run within 24 hours without changes to the underlying data, the result is fetched from the result cache without using warehouse compute.
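The retention rule can be modeled in a few lines of Python. This toy model, built around the hypothetical class CachedResult, covers only the timing: the 24-hour window, its reset on each reuse, and the 31-day cap. It assumes the other reuse conditions (identical query text, unchanged data, sufficient privileges, no non-deterministic functions such as CURRENT_TIMESTAMP) are already met.

```python
from datetime import datetime, timedelta

REUSE_WINDOW = timedelta(hours=24)
MAX_LIFETIME = timedelta(days=31)


class CachedResult:
    """Toy model of one persisted query result (not a Snowflake API)."""

    def __init__(self, first_run: datetime):
        self.first_run = first_run
        self.expires = first_run + REUSE_WINDOW

    def reuse(self, now: datetime, data_changed: bool = False) -> bool:
        """Return True if the cached result can serve a re-run at `now`."""
        if data_changed or now >= self.expires:
            return False
        # Each reuse restarts the 24-hour window, capped at 31 days after the first run.
        self.expires = min(now + REUSE_WINDOW, self.first_run + MAX_LIFETIME)
        return True


t0 = datetime(2024, 1, 1)
result = CachedResult(t0)
print(result.reuse(t0 + timedelta(hours=23)))   # reused; the window restarts
print(result.reuse(t0 + timedelta(hours=46)))   # still inside the restarted window
print(CachedResult(t0).reuse(t0 + timedelta(hours=25)))  # expired without a reuse
```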

Metadata Cache

Snowflake's cloud services layer stores metadata about database objects to speed up query parsing and planning.

Metadata about tables, columns, and micro-partitions (such as row counts and per-column minimum and maximum values) lets Snowflake prune data during planning and answer some queries, such as COUNT(*) on a table, directly from metadata.

Data Caching

Snowflake caches table data on the local storage (SSD) of each virtual warehouse to improve query performance. This cache is independent for each virtual warehouse and is cleared when the warehouse is suspended.

Frequently accessed data is kept in the local disk cache, reducing repeated reads from remote storage and thus improving performance.

Stages

Locations in Snowflake where data files can be stored before being loaded into tables. Stages can be internal (within Snowflake) or external (e.g., Amazon S3).

An internal stage can be created using:

CREATE STAGE my_stage;

Data files can be uploaded to this stage before being loaded into a table.

COPY INTO Command

Used to load data from a stage into a Snowflake table. The command specifies the target table and the source file(s), along with optional transformations.

Loading data from a stage into a table:

COPY INTO my_table
  FROM @my_stage/file.csv
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');

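Time Travel

Allows users to query, clone, or restore data as it existed at a previous point within a defined retention period (1 day by default, up to 90 days for permanent objects on Enterprise Edition). This feature aids in data recovery and auditing.

Querying a table as it was at a specific point in time (the timestamp must fall within the retention period):

SELECT * FROM my_table
  AT (TIMESTAMP => '2022-06-01 00:00:00'::TIMESTAMP_LTZ);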

Zero-Copy Cloning

Enables creating a clone of a database, schema, or table without copying the data. Changes to the clone do not affect the original, and vice versa.

Creating a clone of a table:

CREATE TABLE my_table_clone CLONE my_table;

The clone initially shares the original's storage, so additional storage is charged only for micro-partitions that are added or modified after the clone is created.
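To see why a clone costs nothing until it diverges, the toy Python model below treats a table as a list of references to immutable micro-partitions. Table, extra_partitions, and the partition names are hypothetical; real Snowflake storage accounting also includes Time Travel and Fail-safe data.

```python
class Table:
    """Toy model: a table is an ordered list of references to immutable micro-partitions."""

    def __init__(self, partitions):
        self.partitions = list(partitions)

    def clone(self):
        # Zero-copy: only the list of references (metadata) is copied.
        return Table(self.partitions)

    def update_partition(self, index, new_partition):
        # Micro-partitions are immutable, so a change writes a new partition
        # and repoints this table only; the other table keeps the old one.
        self.partitions[index] = new_partition


def extra_partitions(original, clone):
    """Partitions the clone references that the original does not (newly billed storage)."""
    return set(clone.partitions) - set(original.partitions)


orders = Table(["p1", "p2", "p3"])
orders_dev = orders.clone()
print(extra_partitions(orders, orders_dev))   # no new storage yet
orders_dev.update_partition(1, "p2_v2")
print(extra_partitions(orders, orders_dev))   # only the rewritten partition
print(orders.partitions)                      # the original is unaffected
```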

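Secure Data Sharing

Allows sharing data across Snowflake accounts without moving or copying it. Consumers query the provider's live data, so they always see its current state.

Sharing data with another Snowflake account:

CREATE SHARE my_share;
GRANT USAGE ON DATABASE my_db TO SHARE my_share;
GRANT USAGE ON SCHEMA my_db.public TO SHARE my_share;
GRANT SELECT ON TABLE my_db.public.my_table TO SHARE my_share;
ALTER SHARE my_share ADD ACCOUNTS = consumer_org.consumer_account;

Here my_db and consumer_org.consumer_account are placeholders. The recipient creates a database from the share and queries the shared data directly.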

Snowsight

Snowflake's web user interface, with features such as integrated dashboards, interactive visualizations, and an improved SQL editor.

Users can create and manage interactive dashboards within Snowsight to visualize data trends and share insights with their team. For example, a sales team can use Snowsight to build a dashboard that tracks monthly sales performance across different regions.

Snowflake Community

A vibrant network of users, experts, and partners who share knowledge and best practices and support each other in using Snowflake. It includes user groups, forums, and special interest groups.

Joining the Snowflake Community allows users to participate in discussions, attend meetups, and access valuable resources. For instance, a data analyst can join a virtual special interest group focused on data warehousing to learn from others' experiences and share their own insights.

Data Marketplace

The Snowflake Data Marketplace (now called Snowflake Marketplace) is a platform where users can discover, access, and share live data sets from various providers. It facilitates data collaboration and allows users to enrich their own data with external data sources.

A marketing team can access demographic data from a third-party provider through the Marketplace to enhance its customer analysis. It can combine this data with internal sales data to gain deeper insights into customer behavior and preferences.

Multi-Cluster Warehouses

Multi-cluster warehouses (Enterprise Edition or higher) let Snowflake automatically manage the number of compute clusters needed to handle varying workloads, balancing performance and resource utilization without manual intervention.

A retail company can set up a multi-cluster warehouse to handle the high concurrency of queries during Black Friday sales. Snowflake automatically adds clusters to manage the increased load and removes them when the load decreases, keeping resource use and costs under control.

Materialized Views

Materialized views store the result set of a query physically and are kept up to date automatically by a background service as the base table changes. They improve query performance by providing pre-computed results. In Snowflake, a materialized view can query only a single table (no joins), and the feature requires Enterprise Edition or higher.

Creating a materialized view:

CREATE MATERIALIZED VIEW mv_sales AS
  SELECT * FROM sales
  WHERE year = 2022;

Queries on this view are faster because the results are pre-computed.

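Task

Tasks automate the execution of SQL statements (including calls to stored procedures for procedural logic) on a schedule or after other tasks complete.

Creating a task to run a query every hour:

CREATE TASK hourly_task
  WAREHOUSE = my_warehouse
  SCHEDULE = '60 MINUTE'
AS
  INSERT INTO daily_sales
  SELECT * FROM sales
  WHERE sales_date = CURRENT_DATE;

ALTER TASK hourly_task RESUME;

New tasks are created suspended, so ALTER TASK ... RESUME is needed before the schedule takes effect. As written, each run re-inserts all of today's rows, so daily_sales accumulates duplicates; a stream (see Streams and Tasks) processes only new changes.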

Stream

Streams track changes to a table (inserts, updates, and deletes) and provide a change data capture (CDC) mechanism for efficient data processing.

Creating a stream:

CREATE STREAM sales_stream ON TABLE sales;

The stream captures changes to the sales table, which can be processed later.
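A standard stream reports the net difference between the table version at its offset and the current version. The sketch below models that with two Python dictionaries keyed by a row id (standing in for METADATA$ROW_ID); stream_changes is a hypothetical helper, and real streams return the metadata columns alongside the table's own columns. A row inserted and then deleted between the two versions produces no change at all.

```python
def stream_changes(at_offset, current):
    """Net row changes between two table versions, keyed by a row id.

    Returns (METADATA$ACTION, METADATA$ISUPDATE, row_id, values) tuples,
    mimicking what a standard stream reports.
    """
    changes = []
    for row_id, old in at_offset.items():
        new = current.get(row_id)
        if new is None:
            changes.append(("DELETE", False, row_id, old))
        elif new != old:
            # An update appears as a DELETE/INSERT pair flagged ISUPDATE = TRUE.
            changes.append(("DELETE", True, row_id, old))
            changes.append(("INSERT", True, row_id, new))
    for row_id, new in current.items():
        if row_id not in at_offset:
            changes.append(("INSERT", False, row_id, new))
    return changes


v0 = {1: ("widget", 10), 2: ("gadget", 20)}   # table when the stream offset was set
v1 = {1: ("widget", 15), 3: ("gizmo", 30)}    # table now
for change in stream_changes(v0, v1):
    print(change)

# Consuming the stream in a DML statement moves the offset to the current version.
print(stream_changes(v1, v1))
```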

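Pipe

Pipes automate data loading by continuously ingesting files from stages (e.g., external stages on Amazon S3 or Azure Blob Storage) into Snowflake tables.

Creating a pipe to load data from an S3 bucket:

CREATE PIPE my_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO my_table
  FROM @my_stage
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');

Here my_stage is an external stage pointing at the bucket. With AUTO_INGEST = TRUE, the pipe loads new files when the bucket's event notifications report them.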

Warehouse Monitoring

Snowflake provides tools to monitor the performance and usage of virtual warehouses, helping users optimize resource allocation and manage costs.

Using the WAREHOUSE_METERING_HISTORY view to monitor warehouse credit usage:

SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
  WHERE WAREHOUSE_NAME = 'MY_WAREHOUSE';

Unquoted identifiers are stored in uppercase, so the warehouse name is compared in uppercase.

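Role-Based Access Control (RBAC)

A security model that restricts access to data and resources based on the roles assigned to users. Snowflake allows fine-grained control over access permissions.

Creating a role and granting privileges:

CREATE ROLE analyst_role;
GRANT USAGE ON DATABASE sales_data TO ROLE analyst_role;
GRANT USAGE ON ALL SCHEMAS IN DATABASE sales_data TO ROLE analyst_role;
GRANT SELECT ON ALL TABLES IN DATABASE sales_data TO ROLE analyst_role;

SELECT is granted on tables, not on the database itself; the role also needs USAGE on the database and its schemas.

Assigning the role to a user:

GRANT ROLE analyst_role TO USER john_doe;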

Dynamic Data Masking

Dynamic Data Masking allows Snowflake to hide sensitive data in query results based on the role of the user accessing the data, without changing the stored values. This enhances data security and privacy (Enterprise Edition or higher).

Masking sensitive data:

CREATE MASKING POLICY ssn_mask AS (val STRING)
  RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('ANALYST_ROLE') THEN 'XXX-XX-XXXX'
    ELSE val
  END;

Applying the policy:

ALTER TABLE customers MODIFY COLUMN ssn
  SET MASKING POLICY ssn_mask;

CURRENT_ROLE() returns the names of roles created without quotes in uppercase, so the comparison uses 'ANALYST_ROLE'.
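Masking is applied when a query runs, not when data is stored. The Python sketch below mirrors the ssn_mask CASE expression to make that explicit; ssn_mask and query_customers are hypothetical helpers, and the sample row is invented. A lowercase role name such as 'analyst_role' would never match, which is why the comparison uses 'ANALYST_ROLE'.

```python
def ssn_mask(val, current_role):
    """Mirror of the ssn_mask CASE expression."""
    # Roles created without quotes are returned by CURRENT_ROLE() in uppercase.
    if current_role in ("ANALYST_ROLE",):
        return "XXX-XX-XXXX"
    return val


def query_customers(rows, current_role):
    """Apply the policy to query results; the stored rows are never modified."""
    return [{**row, "ssn": ssn_mask(row["ssn"], current_role)} for row in rows]


customers = [{"id": 1, "ssn": "123-45-6789"}]
print(query_customers(customers, "ANALYST_ROLE"))  # ssn masked
print(query_customers(customers, "SYSADMIN"))      # ssn visible
print(customers)                                   # stored data unchanged
```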

External Tables

External tables allow Snowflake to query data stored in external locations (e.g., Amazon S3, Azure Blob Storage) without loading it into Snowflake.

Creating an external table:

CREATE EXTERNAL TABLE my_ext_table
  WITH LOCATION = @my_external_stage
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');

The stage reference is not quoted. Each row is exposed through a VARIANT column named VALUE, so the table can be queried directly from the external stage.

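Data Replication

Snowflake's data replication feature replicates databases to other accounts in the same organization, which can be in different regions or on different cloud providers, to enhance data availability and disaster recovery.

Setting up data replication (run in the source account):

CREATE REPLICATION GROUP my_replication
  OBJECT_TYPES = DATABASES
  ALLOWED_DATABASES = my_database
  ALLOWED_ACCOUNTS = my_org.my_account_us_west_2
  REPLICATION_SCHEDULE = '10 MINUTE';

Here my_org.my_account_us_west_2 is a placeholder for a target account in the AWS us-west-2 region. In that account, CREATE REPLICATION GROUP my_replication AS REPLICA OF my_org.my_source_account.my_replication; creates the secondary copy, which is then refreshed on the schedule.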

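Failover and Failback

Snowflake provides failover and failback capabilities to ensure high availability and disaster recovery. Failover promotes a replica in another account to primary in case of a failure, and failback switches back once the original is restored. Failover groups require Business Critical Edition or higher.

Configuring failover for a database (run in the primary account):

CREATE FAILOVER GROUP my_failover_group
  OBJECT_TYPES = DATABASES
  ALLOWED_DATABASES = my_database
  ALLOWED_ACCOUNTS = my_org.my_dr_account
  REPLICATION_SCHEDULE = '10 MINUTE';

If the primary fails, running ALTER FAILOVER GROUP my_failover_group PRIMARY; in the secondary account (my_org.my_dr_account is a placeholder) switches the database to the replica; the same command run in the original account performs the failback.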

Search Optimization Service

A Snowflake feature (Enterprise Edition or higher) that improves the performance of selective lookup queries on large tables by creating and maintaining a search access path.

Enabling search optimization for a table:

ALTER TABLE my_table ADD SEARCH OPTIMIZATION;

This speeds up queries that return a small number of rows, such as equality lookups on a high-cardinality column.

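Snowflake Data Exchange

A private data hub that lets a selected group of Snowflake accounts share and access live data securely. It facilitates data collaboration between data providers and consumers.

Publishing data to a Data Exchange: there is no CREATE EXCHANGE command. The data is first placed in a share, and the share is then published as a listing in the exchange through Snowsight:

CREATE SHARE my_share;
GRANT USAGE ON DATABASE my_db TO SHARE my_share;
GRANT USAGE ON SCHEMA my_db.public TO SHARE my_share;
GRANT SELECT ON TABLE my_db.public.my_table TO SHARE my_share;

Members of the exchange can then get the listing and query the shared data.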

Data Masking

Data masking protects sensitive data by masking it in query results based on user roles, so that sensitive information is not exposed to unauthorized users.

Creating a data masking policy:

CREATE MASKING POLICY email_mask AS (val STRING)
  RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('ANALYST_ROLE') THEN '********@domain.com'
    ELSE val
  END;

Applying the policy to a column:

ALTER TABLE users MODIFY COLUMN email
  SET MASKING POLICY email_mask;

Snowpipe

Snowpipe is Snowflake's continuous data ingestion service, which automatically loads data from stages into Snowflake tables.

Creating a Snowpipe to load data:

CREATE PIPE my_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO my_table
  FROM @my_stage
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');

With AUTO_INGEST = TRUE and an external stage, Snowpipe loads new data files as cloud storage event notifications announce their arrival.

External Functions

External functions allow Snowflake to call external services (through a proxy service such as Amazon API Gateway) directly from SQL queries. This enables advanced data processing and integration capabilities.

Creating an external function:

CREATE EXTERNAL FUNCTION my_ext_function()
  RETURNS STRING
  API_INTEGRATION = my_api_integration
  AS 'https://<proxy-service-endpoint>/my-resource';

my_api_integration must be created beforehand with CREATE API INTEGRATION, and the URL is a placeholder for the proxy service endpoint. The function calls the external API and returns the result to Snowflake.

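Streams and Tasks

Streams track changes to tables, and tasks automate the execution of SQL on a schedule or after other tasks. Together, they enable efficient change data capture and automation.

Creating a stream and task:

CREATE STREAM my_stream ON TABLE my_table;

CREATE TASK my_task
  WAREHOUSE = my_warehouse
  SCHEDULE = '60 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('MY_STREAM')
AS
  INSERT INTO my_target_table
  SELECT * FROM my_stream;

ALTER TASK my_task RESUME;

The WHEN clause skips runs when there are no changes, and consuming the stream in the INSERT advances its offset so each change is processed once. SELECT * on a stream also returns the METADATA$ACTION, METADATA$ISUPDATE, and METADATA$ROW_ID columns, so my_target_table must include them (or the columns should be listed explicitly).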

Snowflake Organizations

Snowflake Organizations provide a way to manage multiple Snowflake accounts under one organization. This enables better resource allocation, cost management, and governance.

There is no CREATE ORGANIZATION command; Snowflake provisions the organization, and users with the ORGADMIN role manage its accounts:

USE ROLE ORGADMIN;
SHOW ORGANIZATION ACCOUNTS;

New accounts are added with CREATE ACCOUNT, allowing central management of multiple Snowflake accounts.

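Data Governance

Snowflake offers features for data governance, including access controls, data masking, row access policies, and audit logging (for example, the ACCESS_HISTORY view), to ensure data security, privacy, and compliance.

Implementing data governance:

CREATE ROW ACCESS POLICY my_policy AS (val STRING)
  RETURNS BOOLEAN -> CURRENT_ROLE() IN ('DATA_GOVERNANCE_ROLE');

and applying it to a table:

ALTER TABLE my_table ADD ROW ACCESS POLICY my_policy ON (region);

Here region is a placeholder for the column passed to the policy as val.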

Account Usage

Snowflake provides views in the SNOWFLAKE.ACCOUNT_USAGE schema to track and analyze resource usage, query performance, and costs. These views retain up to a year of history, with a delay of up to a few hours depending on the view, and help in monitoring and optimizing Snowflake usage.

Querying account usage:

SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
  WHERE QUERY_TYPE = 'SELECT';

This retrieves the history of SELECT queries executed in the account.

Resource Monitors

Resource monitors allow administrators to manage and control compute credit usage by setting quotas and triggering actions when thresholds are reached.

Creating a resource monitor and assigning it to a warehouse:

CREATE RESOURCE MONITOR my_monitor WITH
  CREDIT_QUOTA = 1000
  TRIGGERS ON 90 PERCENT DO NOTIFY
           ON 100 PERCENT DO SUSPEND;

ALTER WAREHOUSE my_warehouse SET RESOURCE_MONITOR = my_monitor;

This monitor tracks the credits used by the warehouse, sends a notification at 90% of the quota, and suspends the warehouse when the quota is reached.
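Threshold evaluation can be sketched as simple arithmetic. The Python function below is a hypothetical model: it returns every action whose percentage of CREDIT_QUOTA has been reached, whereas Snowflake fires each action once per monitoring period (monthly by default) and resets usage at the start of the next period. SUSPEND lets running queries finish; SUSPEND_IMMEDIATE cancels them.

```python
def triggered_actions(used_credits, credit_quota, triggers):
    """Actions whose threshold has been reached.

    triggers: (percent_of_quota, action) pairs, as in
    TRIGGERS ON 90 PERCENT DO NOTIFY ON 100 PERCENT DO SUSPEND.
    """
    return [
        action
        for percent, action in sorted(triggers)
        if used_credits >= credit_quota * percent / 100
    ]


triggers = [(90, "NOTIFY"), (100, "SUSPEND"), (110, "SUSPEND_IMMEDIATE")]
for used in (800, 950, 1000, 1150):
    print(used, triggered_actions(used, 1000, triggers))
```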

Query Optimization

Snowflake provides various tools and techniques to optimize query performance, including the Query Profile, well-designed table structures (such as clustering keys on very large tables), and caching.

Inspecting a query's profile from SQL:

SELECT * FROM TABLE(GET_QUERY_OPERATOR_STATS(LAST_QUERY_ID()));

This returns per-operator statistics for the most recent query (the data behind the Query Profile in Snowsight), which helps identify and optimize slow-running queries.

Data Sharing

Snowflake allows secure sharing of data between accounts without data movement. Shared data is accessed live, ensuring consistency and avoiding copy delays.

Creating a share:

CREATE SHARE my_share;

and adding tables to it. Other Snowflake accounts can then access the shared data directly.

Cloning

Cloning in Snowflake creates a copy of a database, schema, or table without duplicating the underlying data. This is useful for creating test environments and for backup purposes.

Cloning a table:

CREATE TABLE my_table_clone CLONE my_table;

This allows working with a snapshot of the data; additional storage is charged only for data changed after the clone is created.

Data Load and Unload

Snowflake provides various methods for loading and unloading data, including bulk loading with the COPY command, continuous loading with Snowpipe, and unloading data to stages.

Loading data:

COPY INTO my_table
  FROM @my_stage
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');

and unloading data:

COPY INTO @my_stage FROM my_table;

Data Encryption

Snowflake encrypts all data at rest (AES-256) and in transit (TLS). Encryption keys are managed automatically, and Business Critical accounts can add a customer-managed key through Tri-Secret Secure.

There is no per-table setting to enable encryption; it is always on. On Enterprise Edition and above, data can additionally be re-encrypted with new keys once its keys are more than a year old:

ALTER ACCOUNT SET PERIODIC_DATA_REKEYING = TRUE;

Data Retention

Snowflake provides data retention settings to manage how long historical data is kept in the system. This includes the Time Travel and Fail-safe periods for data recovery.

Setting data retention:

ALTER TABLE my_table SET DATA_RETENTION_TIME_IN_DAYS = 7;

This configures the table to retain historical data for Time Travel for 7 days.

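Fail-Safe

Fail-safe provides an additional, non-configurable 7-day period during which Snowflake can recover data from permanent tables after the Time Travel retention period has expired. It is intended as a last resort in case of failures; transient and temporary tables have no Fail-safe period.

Accessing Fail-safe data:

Fail-safe data cannot be queried with SQL; Time Travel clauses such as AT or BEFORE only reach data inside the retention period. Recovery from Fail-safe is performed by Snowflake Support on request.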
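The Time Travel and Fail-safe windows stack one after the other. The Python sketch below, built around the hypothetical helper recovery_phase, assumes a change or drop at a known time and returns where the old data can still be recovered from; it uses the 7-day Fail-safe for permanent tables and none for transient or temporary tables.

```python
from datetime import datetime, timedelta

# Fail-safe length by table type; transient and temporary tables have none.
FAIL_SAFE_DAYS = {"PERMANENT": 7, "TRANSIENT": 0, "TEMPORARY": 0}


def recovery_phase(changed_at, now, retention_days, table_type="PERMANENT"):
    """Where historical data changed at `changed_at` can be recovered from at `now`."""
    time_travel_ends = changed_at + timedelta(days=retention_days)
    fail_safe_ends = time_travel_ends + timedelta(days=FAIL_SAFE_DAYS[table_type])
    if now < time_travel_ends:
        return "TIME_TRAVEL"    # query with AT/BEFORE, or UNDROP a dropped object
    if now < fail_safe_ends:
        return "FAIL_SAFE"      # recoverable only through Snowflake Support
    return "UNRECOVERABLE"


dropped = datetime(2024, 6, 1)
for days_later in (0, 6, 10, 15):
    print(days_later, recovery_phase(dropped, dropped + timedelta(days=days_later), 7))
```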

User-Defined Functions (UDFs)

UDFs allow users to define their own functions in SQL, JavaScript, Python, Java, or Scala, extending Snowflake's built-in functionality with custom logic.

Creating a SQL UDF:

CREATE FUNCTION my_udf(x INT)
  RETURNS INT
  LANGUAGE SQL
  AS 'x * 2';

The body of a SQL UDF is an expression, with no RETURN keyword. This function multiplies the input by 2, so SELECT my_udf(21); returns 42.

Stored Procedures

Stored procedures in Snowflake encapsulate procedural logic and complex operations in SQL (Snowflake Scripting), JavaScript, Python, Java, or Scala, enabling automation and reusable code.

Creating a stored procedure:

CREATE PROCEDURE my_proc()
  RETURNS STRING
  LANGUAGE JAVASCRIPT
  AS $$ return 'Hello, World!'; $$;

and calling it:

CALL my_proc();

Privileges and Grants

Snowflake's security model uses privileges and grants to control access to database objects. Privileges are granted to roles, and roles are granted to users.

Granting privileges:

GRANT SELECT ON TABLE my_table TO ROLE analyst_role;

This allows users with analyst_role to query the table, provided the role also has USAGE on the table's database and schema.

Roles and Role Hierarchies

Roles in Snowflake define a set of privileges and can be assigned to users. Role hierarchies allow roles to inherit privileges from other roles, simplifying access management.

Creating a role hierarchy:

CREATE ROLE senior_analyst;
GRANT ROLE analyst_role TO ROLE senior_analyst;

Users with the senior_analyst role inherit the privileges of analyst_role.
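Privilege inheritance is a walk over the role graph. The sketch below assumes the grants are available as Python dictionaries (in practice they would come from SHOW GRANTS output); effective_privileges is a hypothetical helper, and the CREATE VIEW privilege on a reports schema is an invented example.

```python
def effective_privileges(role, direct_grants, inherited_roles):
    """All privileges a role holds, including those of roles granted to it.

    direct_grants: role -> set of privileges granted directly to that role.
    inherited_roles: role -> roles granted TO it (GRANT ROLE child TO ROLE parent).
    """
    privileges, seen, stack = set(), set(), [role]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        privileges |= direct_grants.get(current, set())
        stack.extend(inherited_roles.get(current, []))
    return privileges


direct_grants = {
    "ANALYST_ROLE": {"SELECT ON my_table"},
    "SENIOR_ANALYST": {"CREATE VIEW ON SCHEMA reports"},
}
# GRANT ROLE analyst_role TO ROLE senior_analyst;
inherited_roles = {"SENIOR_ANALYST": ["ANALYST_ROLE"]}

print(effective_privileges("SENIOR_ANALYST", direct_grants, inherited_roles))
print(effective_privileges("ANALYST_ROLE", direct_grants, inherited_roles))
```

Inheritance flows only upward: senior_analyst gains analyst_role's privileges, but analyst_role gains nothing from senior_analyst.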

Session Variables

Session variables in Snowflake store values that can be used within a session. They allow for dynamic SQL and reusable code.

Setting and using a session variable:

SET my_var = 'Hello, World!';
SELECT $my_var;

The SELECT returns the value of the variable.

Parameter Management

Snowflake allows configuration of various parameters at the account, session, and object levels to customize behavior and optimize performance.

Setting a session parameter:

ALTER SESSION SET QUERY_TAG = 'MyQuery';

This tags queries within the session for easier tracking.

Semi-Structured Data

Snowflake supports semi-structured data formats such as JSON, Avro, Parquet, and XML, typically stored in VARIANT columns. This allows for flexible data modeling and integration with modern data sources.

Querying JSON data:

SELECT json_data:id FROM my_table;

This retrieves the "id" field from the JSON stored in the json_data column; a cast such as json_data:id::INT returns it as a typed value.

Data Compression

Snowflake automatically compresses data to reduce storage costs and improve query performance. Different compression algorithms are used depending on the data.

Snowflake's automatic compression means users don't need to configure compression settings manually, as the platform optimizes storage efficiency.

Cost Management

Snowflake provides tools and practices to manage and optimize costs, including resource monitors, usage views, and best practices for query optimization.

Using resource monitors to control costs:

CREATE RESOURCE MONITOR my_monitor WITH
  CREDIT_QUOTA = 1000
  TRIGGERS ON 80 PERCENT DO NOTIFY;

The TRIGGERS clause sets up alerts for budget thresholds.

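Query History

Snowflake tracks query history, allowing users to review and analyze past queries for performance optimization and troubleshooting.

Accessing query history:

SELECT * FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
  WHERE QUERY_TYPE = 'SELECT';

This retrieves recent SELECT queries. The INFORMATION_SCHEMA table function covers the last 7 days and returns at most 100 rows by default; use SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY for longer history.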

Metadata Management

Snowflake manages metadata for all database objects, providing detailed information about tables, columns, and other objects. This metadata is used for query optimization and data governance.

Querying metadata:

SELECT * FROM INFORMATION_SCHEMA.TABLES
  WHERE TABLE_SCHEMA = 'PUBLIC';

This retrieves information about all tables in the PUBLIC schema.

Data Import/Export

Snowflake supports various methods for importing and exporting data, including bulk loading with the COPY command and unloading to external stages.

Importing data:

COPY INTO my_table
  FROM @my_stage
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');

and exporting data:

COPY INTO @my_stage FROM my_table;

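Data Quality

Snowflake provides features to support data quality, including constraints, data validation, and profiling.

Implementing data quality checks:

CREATE TABLE my_table (
  id INT PRIMARY KEY,
  name STRING NOT NULL
);

Snowflake enforces the NOT NULL constraint on "name", but PRIMARY KEY (like UNIQUE and FOREIGN KEY) is informational only on standard tables, so the uniqueness of "id" must be checked separately, for example with a GROUP BY id HAVING COUNT(*) > 1 query.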

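Data Lineage

Data lineage tracks the flow of data through Snowflake, from ingestion to transformation to analysis, providing visibility into data dependencies and transformations.

Using views and tasks to track data lineage:

CREATE VIEW my_view AS
  SELECT * FROM my_table;

and

CREATE TASK my_task
  WAREHOUSE = my_warehouse
  SCHEDULE = '60 MINUTE'
AS
  INSERT INTO my_target_table
  SELECT * FROM my_view;

The task writes to a separate target table; inserting back into my_table from a view over my_table would duplicate its rows on every run.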

Business Continuity

Snowflake's features for business continuity include data replication, failover, and Fail-safe, which help keep data available and recoverable in case of disasters.

Setting up a failover group:

CREATE FAILOVER GROUP my_group
  OBJECT_TYPES = DATABASES
  ALLOWED_DATABASES = my_database
  ALLOWED_ACCOUNTS = my_org.my_account_us_west_2;

This lets the database fail over to a replica in the target account (a placeholder here, located in AWS us-west-2) in case of a failure.

Governance and Compliance

Snowflake provides tools for data governance and compliance, including access controls, data masking, and audit logging, to ensure data security and regulatory compliance.

Implementing compliance policies:

CREATE ROW ACCESS POLICY compliance_policy AS (val STRING)
  RETURNS BOOLEAN -> CURRENT_ROLE() IN ('COMPLIANCE_ROLE');

and applying it to a table:

ALTER TABLE my_table ADD ROW ACCESS POLICY compliance_policy ON (region);

Here my_table and region are placeholders.

Advanced Analytics

Snowflake supports advanced analytics, including machine learning integration, geospatial data processing, and complex data transformations.

Integrating with machine learning models:

CREATE FUNCTION predict_sales(x FLOAT)
  RETURNS FLOAT
  LANGUAGE PYTHON
  RUNTIME_VERSION = '3.11'
  IMPORTS = ('@my_stage/my_model.py')
  HANDLER = 'my_model.predict';

This function calls the predict function in a Python module uploaded to a stage (my_stage and my_model.py are placeholders) to produce sales predictions.

Data Monetization

Snowflake Marketplace and secure data sharing enable organizations to monetize their data assets by sharing or selling data to other Snowflake users.

Publishing data for monetization: there is no CREATE EXCHANGE command. Providers add data to a share and publish it as a paid listing on Snowflake Marketplace through Snowsight, where other users can discover and purchase access.

Geospatial Data

Snowflake supports the GEOGRAPHY and GEOMETRY data types and related functions, allowing users to store, query, and analyze spatial data such as points, lines, and polygons.

Querying geospatial data:

SELECT ST_DISTANCE(point1, point2) FROM my_table;

For GEOGRAPHY columns, this calculates the distance in meters between the two points stored in each row.
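For intuition about what ST_DISTANCE returns, the Python sketch below computes a great-circle distance with the haversine formula on a sphere of radius 6,371 km. Snowflake's own Earth model may differ slightly, so treat this as an approximation; great_circle_distance_m is a hypothetical helper, and its arguments follow the longitude-then-latitude order used by ST_POINT.

```python
import math


def great_circle_distance_m(lon1, lat1, lon2, lat2, radius_m=6_371_000.0):
    """Haversine distance in meters between two (longitude, latitude) points on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


# One degree of longitude along the equator is roughly 111 km.
print(round(great_circle_distance_m(0, 0, 1, 0)))
```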

IoT Data Processing

Snowflake's scalable architecture and support for semi-structured data make it well suited for processing and analyzing IoT (Internet of Things) data.

Loading IoT data:

COPY INTO my_table
  FROM @iot_stage
  FILE_FORMAT = (FORMAT_NAME = 'json_format');

This ingests JSON data from IoT devices. Without a transformation or MATCH_BY_COLUMN_NAME, my_table should have a single VARIANT column to hold each JSON document.

Real-Time Analytics

Snowflake supports near real-time analytics by allowing continuous data ingestion and immediate querying of newly loaded data.

Using Snowpipe for continuous data ingestion:

CREATE PIPE my_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO my_table
  FROM @my_stage
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');

Data Federation

Snowflake's external tables and data sharing features enable data federation, allowing users to query and combine data from multiple sources without moving the data.

Creating an external table to federate data:

CREATE EXTERNAL TABLE my_ext_table
  WITH LOCATION = @my_external_stage
  FILE_FORMAT = (FORMAT_NAME = 'my_csv_format');

This table allows querying data directly from the external stage.

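Security Integrations

Snowflake integrates with security tools and frameworks, including single sign-on (SSO), multi-factor authentication (MFA), and encryption key management, to enhance data security.

Configuring SSO with a SAML 2.0 identity provider:

CREATE SECURITY INTEGRATION my_sso
  TYPE = SAML2
  ENABLED = TRUE
  SAML2_ISSUER = '<idp-entity-id>'
  SAML2_SSO_URL = 'https://mycompany.com/sso'
  SAML2_PROVIDER = 'CUSTOM'
  SAML2_X509_CERT = '<idp-certificate>';

This enables single sign-on for Snowflake users; the values in angle brackets are placeholders supplied by the identity provider.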

Continuous Data Protection

Snowflake's continuous data protection features include Time Travel, Fail-safe, and data replication, which protect data integrity and availability against user errors and failures.

Setting up data replication:

CREATE REPLICATION GROUP my_replication
  OBJECT_TYPES = DATABASES
  ALLOWED_DATABASES = my_database
  ALLOWED_ACCOUNTS = my_org.my_account_us_west_2;

This allows the database to be replicated to another account (a placeholder here) in a different AWS region.

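Custom Data Types

Snowflake does not support user-defined domain types (there is no CREATE DOMAIN command) or CHECK constraints. Data integrity rules are instead enforced with NOT NULL constraints and explicit validation queries.

Validating an email format:

SELECT * FROM users
  WHERE email NOT LIKE '%@%.%';

This returns rows whose email values do not match the expected pattern, so they can be fixed or rejected.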

Hybrid Tables

Hybrid tables in Snowflake combine transactional and analytical processing, supporting fast single-row operations alongside analytical queries.

Creating a hybrid table:

CREATE HYBRID TABLE my_table (
  id INT PRIMARY KEY,
  data STRING
);

Hybrid tables require a primary key, which (unlike on standard tables) is enforced. This table supports both transactional and analytical workloads.

Data Archiving

Snowflake's data retention features help keep historical data available for compliance and analysis.

Setting data retention:

ALTER TABLE my_table SET DATA_RETENTION_TIME_IN_DAYS = 90;

This keeps Time Travel history for 90 days, the maximum (Enterprise Edition or higher). Longer retention, such as one year, requires keeping explicit copies, for example clones or files unloaded to a stage.

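Data Classification

Data classification in Snowflake helps categorize data based on sensitivity and importance, enabling better data governance and security.

Classifying data with an object tag:
CREATE TAG classification
ALLOWED_VALUES 'public', 'internal', 'sensitive';
ALTER TABLE my_table SET TAG classification = 'sensitive';
This tags the table as containing sensitive data. The tag must be created before it can be set on an object.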

Data Masking Policies
Data masking policies in Snowflake provide dynamic masking of sensitive data based on user roles, ensuring that only authorized users can see the actual data.

Creating a data masking policy:
CREATE MASKING POLICY ssn_mask
AS (val STRING)
RETURNS STRING ->
CASE
WHEN CURRENT_ROLE() IN ('HR_ROLE') THEN val
ELSE 'XXX-XX-XXXX'
END;
Applying the policy:
ALTER TABLE customers MODIFY COLUMN ssn
SET MASKING POLICY ssn_mask;
Only the authorized role (here, an illustrative HR_ROLE) sees real values; every other role sees the masked string. Role names are written in uppercase because CURRENT_ROLE() returns unquoted identifiers that way.
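The role check inside a masking policy is a plain, case-sensitive string comparison, so it is easy to get backwards. The Python model below mirrors a policy body evaluated for one value. It assumes roles were created with unquoted names, so CURRENT_ROLE() reports them in uppercase, and it treats a hypothetical HR_ROLE as the only authorized role. It also shows why a deny-list written with a lowercase literal never masks anything.

```python
AUTHORIZED_ROLES = ("HR_ROLE",)  # hypothetical role allowed to see real SSNs
MASK = "XXX-XX-XXXX"


def current_role(created_as):
    # Unquoted identifiers are stored in uppercase, so a role created as
    # hr_role is reported by CURRENT_ROLE() as HR_ROLE.
    return created_as.upper()


def ssn_mask(val, role_created_as):
    # Allow-list form: CASE WHEN CURRENT_ROLE() IN ('HR_ROLE') THEN val ELSE mask END
    if current_role(role_created_as) in AUTHORIZED_ROLES:
        return val
    return MASK


def ssn_mask_lowercase_literal(val, role_created_as):
    # Deny-list form with a lowercase literal: 'analyst_role' never equals
    # 'ANALYST_ROLE', so the masking branch is never taken.
    if current_role(role_created_as) in ("analyst_role",):
        return MASK
    return val


for role in ("hr_role", "analyst_role"):
    print(role, ssn_mask("123-45-6789", role), ssn_mask_lowercase_literal("123-45-6789", role))
```

The allow-list form fails closed: any role not listed, including roles added later, sees only the mask.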

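Row Access Policies

Row access policies allow Snowflake to restrict access to specific rows in a table based on user roles and other criteria, enhancing data security and compliance.

Creating a row access policy:
CREATE ROW ACCESS POLICY row_policy
AS (region STRING)
RETURNS BOOLEAN ->
CURRENT_ROLE() = 'ADMIN_ROLE'
OR (CURRENT_ROLE() = 'ANALYST_ROLE' AND region = 'EMEA');
Applying the policy to the table's region column:
ALTER TABLE my_table ADD ROW ACCESS POLICY row_policy ON (region);
Admins see every row, analysts see only EMEA rows, and all other roles see none. The region column and role names are illustrative.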
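A row access policy behaves like a filter that Snowflake adds to every query on the table. The Python sketch below models that behaviour. It assumes a hypothetical policy that lets ADMIN_ROLE see all rows and ANALYST_ROLE see only rows whose region is 'EMEA'. The table contents are made up.

```python
def row_policy(region, role):
    # Mirrors the hypothetical policy body:
    #   CURRENT_ROLE() = 'ADMIN_ROLE'
    #   OR (CURRENT_ROLE() = 'ANALYST_ROLE' AND region = 'EMEA')
    return role == "ADMIN_ROLE" or (role == "ANALYST_ROLE" and region == "EMEA")


def select_star(rows, role):
    # The policy acts like an implicit WHERE clause on every query against
    # the table; rows for which it returns FALSE are invisible to the caller.
    return [row for row in rows if row_policy(row["region"], role)]


my_table = [
    {"id": 1, "region": "EMEA"},
    {"id": 2, "region": "APAC"},
]
print(select_star(my_table, "ANALYST_ROLE"))  # [{'id': 1, 'region': 'EMEA'}]
print(select_star(my_table, "ADMIN_ROLE"))    # both rows
print(select_star(my_table, "PUBLIC"))        # []
```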

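Cross-Cloud Replication
Snowflake supports cross-cloud replication, allowing data to be replicated across different cloud providers (e.g., AWS, Azure, Google Cloud) for high availability and disaster recovery.

Setting up cross-cloud replication:
CREATE REPLICATION GROUP my_replication
OBJECT_TYPES = DATABASES
ALLOWED_DATABASES = my_database
ALLOWED_ACCOUNTS = myorg.my_azure_eastus_account;
This replicates the database to a secondary account hosted on Azure (East US); the account name is illustrative. Both accounts must belong to the same Snowflake organization.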

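Event-Driven Data Processing
Snowflake's tasks and streams enable event-driven data processing, allowing actions to be triggered based on changes in data or scheduled intervals.

Creating an event-driven task (a stream records changes to the table; the task runs only when the stream has data):
CREATE STREAM my_table_stream ON TABLE my_table;
CREATE TASK my_task
WAREHOUSE = my_warehouse
SCHEDULE = '1 MINUTE'
WHEN SYSTEM$STREAM_HAS_DATA('my_table_stream')
AS
INSERT INTO audit_table (id, data)
SELECT id, data FROM my_table_stream WHERE METADATA$ACTION = 'INSERT';
ALTER TASK my_task RESUME;
Tasks are created suspended and must be resumed before they run. Reading the stream inside the INSERT advances its offset, so each change is processed once. The id and data columns are illustrative.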
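The interplay between a stream's offset and a task's WHEN condition can be hard to picture. The Python sketch below is a deliberately simplified conceptual model, not how Snowflake implements streams. The hypothetical Table class keeps a change log. The hypothetical Stream class remembers how much of that log has been consumed. A scheduled run is skipped when there is nothing new.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    rows: list = field(default_factory=list)
    change_log: list = field(default_factory=list)  # (action, row) pairs

    def insert(self, row):
        self.rows.append(row)
        self.change_log.append(("INSERT", row))


@dataclass
class Stream:
    table: Table
    offset: int = 0  # number of change-log entries already consumed

    def has_data(self):
        # Plays the role of SYSTEM$STREAM_HAS_DATA in the task's WHEN clause.
        return self.offset < len(self.table.change_log)

    def consume(self):
        # Reading the stream in a committed DML statement advances its offset.
        changes = self.table.change_log[self.offset:]
        self.offset = len(self.table.change_log)
        return changes


def run_task(stream, audit_table):
    # A scheduled run is skipped when the WHEN condition is false.
    if not stream.has_data():
        return False
    audit_table.extend(row for action, row in stream.consume() if action == "INSERT")
    return True


my_table = Table()
my_table_stream = Stream(my_table)
audit_table = []

print(run_task(my_table_stream, audit_table))  # False: nothing changed yet
my_table.insert({"id": 1, "data": "a"})
print(run_task(my_table_stream, audit_table))  # True: one new row
print(run_task(my_table_stream, audit_table))  # False: change already consumed
print(audit_table)                             # [{'id': 1, 'data': 'a'}]
```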

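Data Encryption Key Management
Snowflake allows users to manage their own encryption keys for added security, providing control over data encryption and compliance with regulatory requirements.

This is done through Tri-Secret Secure (Business Critical Edition), not with a database-level setting. A customer-managed key, held in the cloud provider's key management service (such as AWS KMS), is combined with a Snowflake-maintained key to form a composite master key for the account. There is no ENCRYPTION parameter on ALTER DATABASE. Because the composite key depends on the customer key, revoking that key makes the account's data unreadable.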

Geospatial Functions
Snowflake provides geospatial functions to perform spatial analysis and operations on geographic data, such as distance calculations and spatial joins.

Using a geospatial function:
SELECT ST_DISTANCE(point1, point2)
FROM my_table;
This calculates the distance between two geographic points stored in a table. When point1 and point2 are GEOGRAPHY values, the result is in meters; points can be built with ST_MAKEPOINT(longitude, latitude).
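As a rough cross-check for distances on GEOGRAPHY values, the great-circle (haversine) distance can be computed by hand. The Python sketch below assumes a spherical Earth with a mean radius of 6,371 km. Snowflake's internal Earth model may give slightly different numbers, so treat the result as an approximation. Arguments are in degrees, longitude first, matching the argument order of ST_MAKEPOINT.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    # Great-circle distance between two (longitude, latitude) points in degrees.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# One degree of latitude along a meridian: about 111,195 meters.
print(round(haversine_m(0, 0, 0, 1)))
```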

Graph Analytics

Snowflake has no dedicated graph engine, but relationships between data points can be modeled as an edge table and analyzed with SQL.

Performing graph analytics:
CREATE TABLE graph_edges
(src INT, dst INT);
Relationships are then traversed with recursive common table expressions (WITH RECURSIVE) or CONNECT BY, for example to find every node reachable from a starting node.
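A reachability query over an edge table shows the recursive-CTE pattern in practice. Snowflake accepts WITH RECURSIVE with a UNION ALL between the anchor and recursive parts. To keep the sketch runnable without a Snowflake account, it runs the same style of query against an in-memory SQLite database from Python's standard library. SQLite is only a stand-in here, and the sample edges are made up. The hops < 10 guard is an assumed depth limit that stops runaway recursion if the graph contains cycles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE graph_edges (src INT, dst INT)")
conn.executemany(
    "INSERT INTO graph_edges VALUES (?, ?)",
    [(1, 2), (2, 3), (3, 4), (5, 6)],
)

# Every node reachable from a start node, with the fewest hops needed.
REACHABLE = """
WITH RECURSIVE reach(node, hops) AS (
    SELECT dst, 1 FROM graph_edges WHERE src = ?
    UNION ALL
    SELECT e.dst, r.hops + 1
    FROM graph_edges e JOIN reach r ON e.src = r.node
    WHERE r.hops < 10
)
SELECT node, MIN(hops) FROM reach GROUP BY node ORDER BY node
"""

result = conn.execute(REACHABLE, (1,)).fetchall()
print(result)  # [(2, 1), (3, 2), (4, 3)]
```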

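Data Versioning

Snowflake's Time Travel and Zero-Copy Cloning features enable data versioning, allowing users to create, manage, and query different versions of data for analysis and auditing.

Creating a version of a table:
CREATE TABLE my_table_clone CLONE my_table;
Combined with Time Travel, a clone can capture the table as it was at an earlier point, here one hour (3,600 seconds) ago:
CREATE TABLE my_table_v1 CLONE my_table
AT (OFFSET => -3600);
Each clone is a separate table that can be queried and analyzed independently. It shares storage with the original until either table is modified.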

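API Integration

Snowflake supports integration with external APIs, allowing users to call external services and incorporate real-time data into Snowflake queries and workflows.

Creating an external function to call an API:
CREATE EXTERNAL FUNCTION my_ext_function(input STRING)
RETURNS STRING
API_INTEGRATION = my_api_integration
AS 'https://<api-gateway-endpoint>/prod/my_ext_function';
This function sends rows to the remote service behind the proxy URL (a placeholder here) and returns the result to Snowflake. The API integration must already exist (CREATE API INTEGRATION), and its allowed prefixes must include that URL.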
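On the service side, Snowflake batches rows into a JSON body of the form {"data": [[row_number, arg1, ...], ...]}. It expects {"data": [[row_number, result], ...]} back, with every row number echoed. The Python sketch below is a hypothetical AWS Lambda handler behind an API Gateway proxy. It assumes the request arrives as a JSON string in event["body"] and that the function takes exactly one argument. The "processed:" transformation is only a placeholder for real work.

```python
import json


def handler(event, context=None):
    """Hypothetical AWS Lambda handler behind the external function's proxy URL.

    Snowflake posts {"data": [[row_number, arg1, ...], ...]} and expects a
    response body of {"data": [[row_number, result], ...]} with every
    row number echoed back.
    """
    rows = json.loads(event["body"])["data"]
    # Assumes one argument per row: [row_number, value].
    results = [[row_number, f"processed:{value}"] for row_number, value in rows]
    return {"statusCode": 200, "body": json.dumps({"data": results})}


request = {"body": json.dumps({"data": [[0, "alpha"], [1, "beta"]]})}
print(handler(request)["body"])  # {"data": [[0, "processed:alpha"], [1, "processed:beta"]]}
```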

THANK YOU
