
SnowPro Cheatsheet

INTRODUCTION:
1. Decoupled compute and storage
2. No hardware or no maintenance
3. Snowflake is fully ACID compliant, and this applies to all kinds of tables (except External
Tables)
4. SnowSQL is a CLI tool
5. SQL functionality can be extended via SQL UDFs, JavaScript UDFs and session variables.
6. Set USE_CACHED_RESULT = FALSE to disable reuse of cached query results.

Region, Edition & Prices:


1. Hosted in AWS, Azure & GCP (2020). (North America, Europe, Asia Pacific)
2. Snowflake URL: yz90770.ap-south-east-1.aws.snowflakecomputing.com =>
a. Account identifier - yz90770.ap-south-east-1
b. Cloud - AWS
c. SaaS service - snowflakecomputing
d. When it's just the account name, the cloud is AWS and the region is US-WEST-2
3. Types of Edition:
a. Standard (1 Day Time Travel, Secure Views*, Secure UDFs*, Failsafe*, Data
Sharing*)
b. Enterprise (90 Day TT*, Multi-Cluster WH (also called Elastic Data
Warehousing)*, Materialized Views*, SOS*, Column-level Security*)
c. Business Critical (Query Encryption, Tri-Secret Encryption, Compliance*, Private
Link Support, Failover/Disaster Recovery*, Supports Private Communication*)
d. VPS (customer-dedicated metadata store, i.e. its own CSL)
4. Prices:
a. Credit price depends on the Region (and Edition).
b. Storage price depends on plan: On-Demand ($40/TB/month) or Pre-Purchased ($23/TB/month)
5. Replication can be done across accounts (only in the same Region)
6. Data Sharing is possible between 2 accounts (only in the same Region; not across
Regions directly, but it can be done if you replicate the data to the target region first).
Cross-region data sharing is thus achieved via replication on any of the cloud
providers (AWS, GCP, Azure).
7. Data sharing from a VPS account to other accounts is not possible.
8. Failsafe is applicable across all Editions.
9. Column-level security (Dynamic Data Masking, External Tokenization)

Architecture:
1. Multi-cluster shared data: a hybrid of shared-disk and shared-nothing architectures
2. 3 Layers:
a. Data Storage (compressed columnar format; micro-partitions are the physical
units behind the logical structure of a table).
i. Two important features: Time Travel and Snapshot (zero-copy) Cloning
b. Virtual Warehouses (independent clusters - don't share resources) (Query
Processing)
c. CSL (the brain: query planning, optimization and compilation)

i. Authentication
ii. Infrastructure Mngmnt
iii. Metadata Mngmnt
iv. Query Parsing and Optimization
v. Access Control
3. Caches: reduce cost and improve performance.
a. Metadata Cache, also called the Metadata Layer or Service Layer cache (owned by the CSL)
i. Holds object information
ii. Queries like:
1. Desc table tablename
2. Select Current_user(), current_version()
3. Select count(*) from table_name
4. Select min(col) from table_name
b. Result Cache (owned by the CSL)
i. Holds results for 24 hrs (each time the query is re-run, the 24-hr window
resets, up to a maximum of 31 days)
ii. Requires the exact same query
iii. Users cannot see each other's results, but a result cached by one user can be
reused by another user
iv. Not used for queries with non-deterministic values such as CURRENT_TIMESTAMP
v. When the underlying data is changed or the table is dropped, this cache is
invalidated or may be unavailable.
c. Local Disk Cache, also called WH Cache / SSD Cache / Data Cache / Raw Data
Cache (i.e. no aggregated data):
i. Data held locally by the WH
ii. Deleted when the WH is suspended.
iii. If the WH is resized, the cache is purged, i.e. removed
iv. Even if the query is changed, it can use this cache if the needed data is available
d. In the query profile, Bytes Scanned shown in green => remote storage; Bytes
Scanned shown in blue => local (cache) storage
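As an illustration of the caches above, a minimal sketch (my_table is a hypothetical name):

    -- Disable the result cache for benchmarking (session level):
    ALTER SESSION SET USE_CACHED_RESULT = FALSE;
    -- Answered from the metadata cache (no running WH needed):
    SELECT COUNT(*) FROM my_table;
    SELECT CURRENT_USER(), CURRENT_VERSION();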

Releases:
1. Releases every week
2. The release process is transparent, with no downtime
3. Release Types:
a. New Release (features, updates, enhancements, bug fixes).
b. Patch Release (fixes)
4. New Releases follow a 3-stage process.
5. Patch Releases are available to everyone at the same time.
6. New Releases are given first to early-access users, then Standard, Enterprise and so on.
7. Early access is provided only to Enterprise and higher editions, 24 hrs ahead.
8. You cannot go back to previous versions.

Web UI:
1. Databases, Shares, Data Marketplace, Warehouses, Worksheets, History

2. MFA can be enabled via the Web UI
3. Using the History area of the Web UI you can check query history for the past 14 days
4. When a user logs out of the account, running queries are stopped and the user needs to
resume them again.
5. Connectors:
a. Web based UI
b. SnowSQL
c. ODBC/JDBC
d. Go
e. Native Connectors (Python, Spark, Kafka)
f. Third-party connectors (e.g. Matillion)
g. Others (Node.js, .NET, etc.)

Catalog & Objects:


1. Object Hierarchy (Container Hierarchy): Account => Databases => Schemas => schema
objects. Integrations also sit at the account level (LHS); schema-level objects (RHS) also
include External Tables, Streams and Tasks

2. Once your SF account is created:


a. 5 default roles are created: ACCOUNTADMIN (L1), SECURITYADMIN (L2), SYSADMIN
(L2), USERADMIN (recently added, L3 below SECURITYADMIN), PUBLIC; custom
roles can be added under these. Default for the 1st user is: SYSADMIN
b. A WH called COMPUTE_WH is created with X-Small size, min: 1, max: 1, auto
suspend: 10 minutes.
c. 2 default DBs are created, i.e. DEMO_DB and UTIL_DB, along with 2 other DBs:
i. SNOWFLAKE (only for ACCOUNTADMIN & SECURITYADMIN, i.e. a
read-only, shared database)
1. ACCOUNT_USAGE
a. Contains dropped objects.
b. Retention of 1 year
c. Latency: 45 mins to 3 hrs
2. DATA_SHARING_USAGE

3. INFORMATION_SCHEMA
a. Does not contain dropped objects.
b. Retention from 7 days to 6 months
c. No latency
4. ORGANIZATION_USAGE
5. READER_ACCOUNT_USAGE
ii. SNOWFLAKE_SAMPLE_DATA
1. INFORMATION_SCHEMA
2. TPC…. Schemas
3. WEATHER Schema
d. When a new DB is created, it has an INFORMATION_SCHEMA and a PUBLIC
schema.
e. TABLES:
i. All SF tables are divided into micro-partitions.
ii. Each micro-partition holds compressed columnar data:
1. Max size of 16 MB compressed.
2. Stored on logical HDDs
3. They are immutable and can't be changed.
4. Metadata of each micro-partition is stored in the CSL
5. Improves query performance on large tables by skipping (pruning) data.
iii. SF recommends use of a clustering key (if the table is multi-TB in size)
iv. SF only enforces the NOT NULL constraint.
v. Table Types:
1. PERMANENT:
a. Default Table type
b. Persist until dropped
c. Time Travel: 0 to 90 days
d. Failsafe: Yes
2. TEMPORARY:
a. Persist till session
b. Time Travel: 0 or 1 days
c. Failsafe: No
d. Cannot be converted into any other table type
3. TRANSIENT:
a. Persist until dropped
b. Time Travel: 0 or 1 days
c. Failsafe: No
4. EXTERNAL TABLE:
a. Persist until dropped
b. Read only
c. Time Travel: No
d. Failsafe: No
e. Cloning: No

f. Querying an external table is slower than querying a normal table
g. A materialized view can be created on an external table
h. Join conditions can be applied on these tables (see the sketch below).
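A minimal sketch of creating the three writable table types (table names are hypothetical):

    CREATE TABLE orders_perm (id INT);           -- PERMANENT (default): Time Travel + Failsafe
    CREATE TEMPORARY TABLE orders_tmp (id INT);  -- TEMPORARY: lives only for the session
    CREATE TRANSIENT TABLE orders_trn (id INT);  -- TRANSIENT: persists, but no Failsafe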
f. VIEWS:
i. STANDARD / REGULAR:
1. Named definition of a query
2. Improve code readability
3. Default View type
4. Does not store the results.
5. Executes with the owning role's privileges.
6. Underlying DDL available to any role with access to view
ii. MATERIALIZED:
1. Behaves like a TABLE.
2. Stores the results (incurs additional cost)
3. Auto-refreshing (maintained by Snowflake's own compute, not your WH)
4. Secure Materialized Views are also supported.
5. Use case: queries containing heavy computation.
6. To improve performance
7. Cannot query multiple tables, so joins are not possible
8. Only limited aggregations are allowed.
9. CURRENT_TIMESTAMP is not supported.
10. Time Travel is not supported.
iii. SECURED:
1. Executes with the owning role's privileges.
2. Underlying DDL statements are visible only to authorized users.
3. Different optimizations.
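A minimal sketch of the three view types (view/table/column names are hypothetical):

    CREATE VIEW v_std AS SELECT c1 FROM t;           -- standard (default)
    CREATE SECURE VIEW v_sec AS SELECT c1 FROM t;    -- hides the DDL from non-authorized roles
    CREATE MATERIALIZED VIEW v_mat AS                -- Enterprise+; single table only
        SELECT c1, COUNT(*) AS cnt FROM t GROUP BY c1;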
g. UDF’s:
i. SQL and JavaScript are supported.
ii. No DDL/DML support.
iii. Can be secure or unsecure
iv. Returns a single scalar value or a set of rows (table).
v. A UDTF returns only a set of rows.
h. STORED PROCEDURES:
i. JAVASCRIPT API supported.
ii. May return a value.
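A minimal sketch of a SQL UDF and a JavaScript stored procedure (names are hypothetical):

    CREATE FUNCTION area_of_circle(r FLOAT)
      RETURNS FLOAT
      AS $$ PI() * r * r $$;                     -- SQL UDF returning a scalar

    CREATE PROCEDURE log_run()
      RETURNS STRING
      LANGUAGE JAVASCRIPT
      AS $$
        // JavaScript API; the procedure may return a value
        snowflake.execute({sqlText: "SELECT CURRENT_TIMESTAMP()"});
        return "done";
      $$;
    CALL log_run();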
i. STAGES:
i. To transfer files from local to a stage: PUT; from a stage to local: GET
ii. To transfer files from a stage into tables and vice versa: COPY INTO
iii. Irrespective of the SF platform, users can have an external named stage in any of
the three cloud providers
iv. Data is stored in the UTF-8 character set.
v. STAGE TYPES:
1. User:

©2021 Jainam Soni


SnowPro Cheatsheet

a. @~[Login]
b. Automatically defined
c. File format should be given in copy into
2. Table:
a. @%[TABLE_NAME]
b. Automatically defined
c. No transformation while loading
d. File format should be given in copy into
3. Named:
a. Internal Named: @[STAGE_NAME]
i. Temporary: When dropped, data files are purged.
ii. Can not be cloned
b. External Named (Azure, Gcp, Aws): @[STAGE_NAME]
i. Temporary: When dropped, data files are not
removed.
ii. Can be cloned
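A minimal sketch of moving files through a stage (stage/table names and paths are hypothetical):

    PUT file:///tmp/data.csv @my_int_stage;          -- local -> internal stage
    COPY INTO my_table FROM @my_int_stage;           -- stage -> table
    COPY INTO @my_int_stage/out/ FROM my_table;      -- table -> stage
    GET @my_int_stage/out/ file:///tmp/downloads/;   -- stage -> local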
j. DATA TYPES:
i. Numeric - NUMBER(38,0)
ii. String/Binary:
1. String/Text/Varchar/Character/Char (16 MB uncompressed;
default: max length)
2. Char(1) = Varchar(1)
3. Binary = Varbinary (8 MB uncompressed; default: always max
length)
iii. Boolean (can have an unknown value, i.e. NULL)
1. Conversion from string: true/t/yes/y/on/1 => True
2. Conversion from numeric: 0 => False, any non-zero => True
iv. Date/Time (default precision: 9, i.e. 0 to 9; uses the Gregorian calendar):
1. TIME (hh:mm:ss)
2. DATE
3. DATETIME (alias for TIMESTAMP_NTZ)
4. TIMESTAMP
5. TIMESTAMP_LTZ
6. TIMESTAMP_NTZ
7. TIMESTAMP_TZ
v. INTERVAL constant to add/subtract from date/time (Not a datatype)
vi. Float4, Float8, etc. are all stored as FLOAT:
1. Supports special values: 'NaN', 'inf', '-inf'
vii. String constants / literals:
1. Must be enclosed in single quotes (') or dollar signs ($$).
viii. The number of digits after the decimal point (scale) has an impact on storage.
ix. Unsupported data types: LOB, CLOB, ENUM or user-defined data types
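A few one-line illustrations of the conversion and literal rules above:

    SELECT 'yes'::BOOLEAN;            -- TRUE (string conversion)
    SELECT 0::BOOLEAN, 42::BOOLEAN;   -- FALSE, TRUE (numeric conversion)
    SELECT $$It's easier without escaping quotes$$;   -- dollar-quoted string constant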

Data Sharing:

1. The sharing feature is achieved using the SF Service Layer and Metadata Layer (metadata
only; no data is copied).
2. Storage is charged to the Producer's account and compute to the Consumer's account (for
a reader account, compute is also charged to the producer).
3. Tables, External Tables, secure views, secure materialized views and secure UDFs
can be shared.
4. VPS does not support secure data sharing.
5. The Web UI does not support adding/removing secure UDFs from shares.
6. Shares are named Snowflake objects and can contain objects from only a single database.
7. Sharing data from multiple tables can be done via secure views.
8. Only ACCOUNTADMIN can provision a share object or a reader account.
9. Changing the ownership of existing shares is not possible.
10. Shared databases are read-only
11. No limit on the number of shares and consumers.
12. For any new object added within the shared DB, grants have to be given explicitly
13. Cannot create 2 or more DBs from one shared object
14. Consumers can query shared objects in the same way they query their own objects.
15. If you add an object to a share, it is immediately available to consumers. Similarly, if you
remove privileges, the objects become inaccessible.
16. There is no Time Travel for a consumer's shared database.
17. Data sharing is only possible within the same cloud and same region.
18. Attempting to re-share a shared object results in an error.
19. Cloning of shared objects and Time Travel on them are not allowed.
20. Cannot edit the comment of a shared DB.
21. SHOW SHARES; has a KIND column which shows whether a share is Inbound or Outbound.
22. Product offerings for Secure Data Sharing:
a. Direct Share
b. Data Marketplace:
i. It has two types of data listings: Standard Data Listing, Personalized Data
Listing.
c. Data Exchange.
23. Reader Accounts (Managed Accounts):
a. An alternative if the consumer does not have a SF account
b. Owned and controlled by the producer (provider) account.
c. SHOW MANAGED ACCOUNTS;
d. CREATE MANAGED ACCOUNT jainam ADMIN_NAME = '', ADMIN_PASSWORD = '', TYPE = READER;
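A minimal sketch of building a direct share (database, schema, table and account names are hypothetical):

    CREATE SHARE sales_share;
    GRANT USAGE ON DATABASE sales_db TO SHARE sales_share;
    GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share;
    GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share;
    ALTER SHARE sales_share ADD ACCOUNTS = consumer_acct;   -- same cloud + region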

VIRTUAL WAREHOUSES:
1. A VWH is a cluster of servers with CPU, memory and disk
2. Executes SELECT as well as DML operations (DELETE, INSERT, UPDATE, COPY INTO)
3. X-Small has 1 server per cluster; similarly, Small has 2 servers per cluster.
4. VWH sizes follow T-shirt sizes (8), i.e. X-Small to 4XL (i.e. 1, 2, 4, 8, 16 ... 128 servers)
5. In SQL the default is "X-SMALL"; in the Web UI the default is "X-Large".
6. Can be stopped and resized at any time (only new queries are affected)
7. When creating a VWH you can specify:

a. Auto-suspend: suspends the WH if it is inactive for a certain number of minutes
(Default: 10 mins in the UI, lowest is 5 mins in the UI; in SQL it can be as low as
1 min). If NULL, it never suspends
b. Auto-resume: resumes the WH whenever the WH is required by a SQL statement
8. VWH types:
a. Standard: has only a single compute cluster. (Set max cluster = 1 in the Web UI)
b. Multi-Cluster WH:
i. Up to 10 clusters. (Default: min: 1, max: 10)
ii. Auto-resume and auto-suspend apply to the entire WH, not to individual
servers or clusters.
iii. Can be resized anytime
iv. Multi-Cluster Modes:
1. Maximized: set the same min and max cluster count.
2. Auto-Scale: set different min and max cluster counts.
9. Scaling Policy:
a. Standard: if queries are queuing, start adding additional clusters (20 seconds
between each successive cluster start).
b. Economy: adds a cluster only if queries are queued and the system estimates the
cluster would be kept busy for at least 6 minutes.
10. Resizing a Warehouse:
a. If it's suspended, it will start with the new size next time
b. If it's running, running queries complete at the current size; queued queries run at
the new size.
11. Scale Up/Down: e.g. Medium -> Large (solves complex queries taking too long, i.e. for
performance). Manual process
12. Scale Out/In: enable MCW, i.e. increase the number of clusters (solves concurrency
problems). Scale out during peak times and scale back in during slow times. Autonomous
process
13. VWH is charged per sec, with 1 min minimum billing.
14. Snowflake waits until all servers are provisioned for a new warehouse. (If any fail to start,
it begins executing once 50% of them are available)
15. Queries are queued if there aren't enough resources.
STATEMENT_TIMEOUT_IN_SECONDS (Default: 48 hrs) and
STATEMENT_QUEUED_TIMEOUT_IN_SECONDS can be used to control query
processing and concurrency.
16. Cloud Services are charged only for the credits that exceed 10% of the total compute billing
17. Resource Monitor:
a. Each warehouse can be assigned to only single Resource Monitor.
b. Notification can be sent through Web UI and Email only.
c. Actions that can be performed are:
i. Suspend
ii. Notify
iii. Suspend Immediately.
d. Monitor Level defines whether the RM is set at the ACCOUNT level or on individual
WHs (see the sketch at the end of this section)
e. Access control privileges required are:
i. Monitor

ii. Modify
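A minimal sketch tying the WH and Resource Monitor options above together (names and limits are hypothetical):

    CREATE WAREHOUSE etl_wh
      WAREHOUSE_SIZE = 'MEDIUM'
      MIN_CLUSTER_COUNT = 1
      MAX_CLUSTER_COUNT = 3            -- multi-cluster, Auto-Scale mode (Enterprise+)
      SCALING_POLICY = 'STANDARD'
      AUTO_SUSPEND = 60                -- note: specified in seconds in SQL
      AUTO_RESUME = TRUE
      INITIALLY_SUSPENDED = TRUE;
    ALTER WAREHOUSE etl_wh SET WAREHOUSE_SIZE = 'LARGE';   -- scale up for performance

    CREATE RESOURCE MONITOR rm_monthly
      WITH CREDIT_QUOTA = 100
      FREQUENCY = MONTHLY
      START_TIMESTAMP = IMMEDIATELY
      TRIGGERS ON 75 PERCENT DO NOTIFY
               ON 100 PERCENT DO SUSPEND
               ON 110 PERCENT DO SUSPEND_IMMEDIATE;
    ALTER WAREHOUSE etl_wh SET RESOURCE_MONITOR = rm_monthly;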

SNOWFLAKE STORAGE AND PROTECTION:


1. Storage Layer:
a. Hybrid Columnar
b. Maps the logical structure (tables) to the physical structure (micro-partitions)
c. Automatic micro-partitioning
d. Natural data clustering
2. Micro-Partitions:
a. Holds data
b. Max 16 MB compressed data (50-500 MB of uncompressed data)
c. They are IMMUTABLE!!
d. Many micro-partitions per Table
e. The CSL stores metadata about every micro-partition, like MIN/MAX/distinct
values/range of values.
f. The smaller the average depth, the better clustered the table is w.r.t. those columns.
g. A table with no micro-partitions has a clustering depth of 0.
h. If a column has high cardinality, then maintaining clustering on it is very expensive.
3. Metadata:
a. SF maintains Metadata of :
i. Table Level:
1. Row count
2. Table size (in bytes)
3. File references and Table versions
ii. Micro-Partition Level:
1. Range of values
2. No of distinct values
3. MIN/MAX
4. NULL count
b. This metadata supports zero-copy cloning, data sharing and Time Travel.
4. Data Protection or Continuous Data Protection (CDP):
a. Encrypted at rest and in motion (using AES-128 or AES-256; 256 by default)
b. Hierarchical keys (Root key => Account keys => Table keys => File keys)
c. Periodic rekeying (account and table master keys) is possible for keys older than
30 days (Enterprise+)
d. All communication is via HTTPS, i.e. internet traffic is secured and encrypted via
TLS 1.2 or higher
e. Tri-Secret Secure (BYOK): customer key + Snowflake key
f. Time Travel:
i. Access historical data using a time or statement ID, within the specified retention time
ii. Disable by setting retention to 0, but this cannot be done at the account level.
iii. Default: 1 day
iv. SELECT * FROM table BEFORE (STATEMENT => '<query_id>'), SELECT * FROM
table AT (TIMESTAMP => <timestamp>), or UNDROP (see the sketch below)
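A minimal sketch of the Time Travel syntax above (table name and query id are hypothetical):

    SELECT * FROM t AT (OFFSET => -3600);                    -- state 1 hour ago
    SELECT * FROM t AT (TIMESTAMP => '2021-06-01 12:00:00'::TIMESTAMP_LTZ);
    SELECT * FROM t BEFORE (STATEMENT => '<query_id>');      -- state before a statement
    UNDROP TABLE t;                                          -- restore within retention
    ALTER TABLE t SET DATA_RETENTION_TIME_IN_DAYS = 0;       -- disable Time Travel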

g. Failsafe:
i. Not configurable (7 day extra period after Time Travel)
ii. Available only for Permanent Tables
iii. Only accessible by SF
h. Cloning:
i. Referred to as zero-copy cloning.
ii. Only metadata is copied, so there are no storage costs until a change is made. If
a change is made to the cloned table, new micro-partitions are created (storage
costs apply).
iii. Clones do not inherit the source object's grant privileges; but if the source is a
database or schema, the grants on its child objects are inherited.
iv. To clone a table, your current role must have the SELECT privilege on the
source table.
v. To clone a database or schema, your current role must have the USAGE
privilege on the source database/schema.
vi. External Tables cannot be cloned
vii. All stages except Internal Named Stages can be cloned.
viii. When a Stream is Cloned, unconsumed records in streams are
inaccessible.
ix. When a Task is Cloned, it needs to be resumed individually.
x. If cloning has started, data is changed in the table, and the retention time is 0,
then it gives an error. To avoid this, either do not perform DML operations
during cloning or increase the retention time.
xi. Files that have already been processed into the source table can be
loaded again into a cloned table, i.e. the load history of files is not carried over.
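A minimal sketch of zero-copy cloning (object names are hypothetical):

    CREATE TABLE orders_dev CLONE orders;                        -- metadata only, no storage cost yet
    CREATE DATABASE db_dev CLONE db_prod AT (OFFSET => -3600);   -- clone combined with Time Travel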
DATA MOVEMENT:
1. File Location:
a. On Local
b. On Cloud (S3, Blob storage, GCS)
2. File Type:
a. Structured: CSV,TSV,etc
b. Semi-structured: JSON,ORC,PARQUET,AVRO and XML
c. Users can specify the compression on loading; default compression: GZIP
3. Encryption on Load:
a. Already-encrypted files can be loaded into SF by providing the key to SF on load
b. Unencrypted files are encrypted by SF using AES-128-bit keys (or AES-256-bit
keys, set via CLIENT_ENCRYPTION_KEY_SIZE)
4. Best Practices while Loading:
a. Split larger files into smaller files
b. 10-100 MB compressed files are ideal for data loading
c. Parquet files > 3 GB compressed should be split into 1 GB chunks
d. The VARIANT datatype has a 16 MB compressed size limit per row

e. For JSON or AVRO, the outer array structure can be removed using
STRIP_OUTER_ARRAY
f. SF recommends removing the data from the stage once the load is completed - use
the REMOVE command (or specify PURGE in the COPY params)
g. SF data loading rate is affected by:
i. Physical location of the stage.
ii. Number and types of columns.
iii. Gzip Compression Efficiency.
h. Data transformations during COPY:
i. The same number of columns, or the same column order, in the files is not required.
ii. Internal or external named stages and internal user stages are supported for
copy transformations.
iii. Supported: SEQUENCE, SUBSTRING, TO_BINARY, TO_DECIMAL; column
reordering, casting, omitting columns, CONCAT, TRUNCATE.
iv. Not supported: WHERE, FLATTEN, JOIN, GROUP BY, DISTINCT,
VALIDATION_MODE (if the COPY transforms data).
i. SF loads semi-structured data (JSON, AVRO, XML) into a VARIANT-type column. If
dates and timestamps are stored, they are stored as strings in the VARIANT column
j. Use FLATTEN to explode compound values into multiple rows.
5. LOADING DATA:
a. Snowflake maintains detailed metadata for each file loaded into a table: name, file
size, ETag, number of rows, timestamp of last load (if the file is older than the
metadata history, i.e. 14 or 64 days, it is skipped), and errors if any.
b. When files are staged, their metadata exposes columns which:
i. Can only be accessed by name.
ii. METADATA$FILENAME
iii. METADATA$FILE_ROW_NUMBER
c. BULK LOADING (BATCH MODE):
i. It has 64 days of Metadata history
ii. WH is required
iii. Load from Local and Cloud:
iv. Create File Format (XML,JSON,PARQUET,ORC)
v. Create Internal stage / External stage with credentials
vi. Validation can be done in 2 ways:
1. Use VALIDATION_MODE: validates errors on load and does not
load data into the table (can be done pre-load as well as post-load).
It does not support COPY statements that transform data during
the load.
2. ON_ERROR actions to perform: CONTINUE, SKIP_FILE (default for
Snowpipe), SKIP_FILE_10, SKIP_FILE_10%, ABORT_STATEMENT
(default for bulk loading using COPY)
vii. Upload file from local using “PUT” (Not for Cloud i.e External Stage)
viii. COPY INTO command:
1. SIZE_LIMIT: maximum size (in bytes) of data to load

2. PURGE: FALSE (default). If an error occurs during the purge, it
is not shown to the user.
3. FORCE: FALSE (default)
4. PATTERN = ['regex pattern']
5. To load files whose metadata has expired, set
LOAD_UNCERTAIN_FILES = TRUE.
6. To ignore load metadata entirely, set FORCE = TRUE.
7. Loads up to 1000 files in parallel.
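A minimal sketch of a bulk load using the options above (stage, table and format names are hypothetical):

    CREATE FILE FORMAT my_csv_fmt TYPE = 'CSV' SKIP_HEADER = 1;
    -- Pre-validate without loading any rows:
    COPY INTO my_table FROM @my_stage/data/
      FILE_FORMAT = (FORMAT_NAME = 'my_csv_fmt')
      VALIDATION_MODE = RETURN_ERRORS;
    -- Actual load:
    COPY INTO my_table FROM @my_stage/data/
      FILE_FORMAT = (FORMAT_NAME = 'my_csv_fmt')
      PATTERN = '.*orders.*[.]csv'
      ON_ERROR = SKIP_FILE
      PURGE = TRUE;                  -- remove staged files after a successful load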
d. CONTINUOUS LOADING (SNOWPIPE):
i. 14 days of Metadata history
ii. Snowflake compute resource is used
iii. Snowpipe is used to load small volumes of frequently arriving data
iv. Snowpipe allows loading data from files as soon as they are available in an
external stage
v. Done using a COPY INTO statement (all data types are supported)
vi. File-arrival detection mechanisms:
1. Using Cloud Notification i.e AUTO_INGEST (External Stages
only)
2. Calling REST API Endpoint (Internal + External stages)
a. insertFiles: Informs snowflake about files to be ingested.
b. insertReport: 10000 events are retained, for max of 10
mins.
c. loadHistoryScan: Fetches a report about ingested files
whose contents have been added to the table.
vii. Snowpipe can be paused or resumed using
PIPE_EXECUTION_PAUSED = TRUE. It can be set at the Account, Schema
or Pipe level.
viii. Stopped is not an execution state.
ix. Snowpipe copies files into an ingestion queue, from where the files are loaded
into Snowflake.
x. Snowpipe cannot load a file with the same name, even if it has been modified.
xi. Snowflake features for enabling continuous data pipelines:
1. Continuous data loading:
a. Snowpipe.
b. Snowflake connector for Kafka:
i. A Snowflake table loaded by the Kafka connector has a
schema consisting of 2 VARIANT columns:
RECORD_CONTENT, RECORD_METADATA
ii. RECORD_METADATA contains: topic, partition, key,
CreateTime/LogAppendTime
iii. The Kafka connector guarantees exactly-once delivery.
iv. When neither key.converter nor value.converter is
set, most SMTs are supported, with the exception
of regex.router.

c. Third-party data integration tools.
2. CDC using Table Streams.
3. Recurring Tasks.
xii. Snowpipe charges 0.06 credits per 1000 files notified (see the sketch below)
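A minimal sketch of a Snowpipe definition (pipe, stage and table names are hypothetical):

    CREATE PIPE my_pipe
      AUTO_INGEST = TRUE      -- file-arrival via cloud notifications; external stages only
      AS COPY INTO my_table FROM @my_ext_stage FILE_FORMAT = (TYPE = 'JSON');
    ALTER PIPE my_pipe SET PIPE_EXECUTION_PAUSED = TRUE;   -- pause
    SELECT SYSTEM$PIPE_STATUS('my_pipe');                  -- check the execution state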
e. CONTINUOUS DATA PROCESSING (STREAMS & TASKS):
i. STREAMS:
1. Streams (or table streams) provide CDC, i.e. Change Data Capture, for
SF: they identify and act on changed table records.
2. A stream does not hold any table data. It stores an OFFSET for the source
table and returns CDC records by leveraging the versioning of the
source table.
3. Streams support repeatable read isolation.
4. Streams cannot track changes in materialized views.
5. Types:
a. Standard: tracks all DML changes - inserts, updates and deletes.
b. Append-only: tracks row inserts only
c. Insert-only: supported on EXTERNAL TABLES only.
Currently in private preview.
6. In a stream, if a row is added and then updated within the current offset,
the delta is captured as a new row
7. A stream becomes stale when its offset falls outside the data
retention period. If the retention period for a source is less than
14 days, Snowflake extends it to 14 days to prevent staleness. The
maximum period Snowflake can extend the data retention to is
determined by MAX_DATA_EXTENSION_TIME_IN_DAYS.
8. CREATE STREAM stream_name ON TABLE table_name;
9. SHOW STREAMS; or DESCRIBE STREAM stream_name;
10. ALTER TABLE ... SET CHANGE_TRACKING = TRUE;
11. SHOW STREAMS HISTORY includes a DROPPED_ON column
12. SELECT * FROM stream_name also shows:
a. METADATA$ACTION: INSERT or DELETE
b. METADATA$ISUPDATE: TRUE or FALSE
c. METADATA$ROW_ID: a unique row id
13. SYSTEM$STREAM_HAS_DATA('stream_name'): tells whether any
CDC records exist, basically.
14. The current offset for a stream can be determined by
SYSTEM$STREAM_GET_TABLE_TIMESTAMP function.
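A minimal sketch of CDC with a stream (names are hypothetical; orders is assumed to have an id column):

    CREATE STREAM orders_stream ON TABLE orders;
    SELECT SYSTEM$STREAM_HAS_DATA('ORDERS_STREAM');   -- TRUE if CDC records exist
    -- SELECT * returns the table columns plus METADATA$ACTION / METADATA$ISUPDATE / METADATA$ROW_ID:
    SELECT * FROM orders_stream;
    -- Consuming the stream in a DML statement advances its offset:
    INSERT INTO orders_audit SELECT id, METADATA$ACTION FROM orders_stream;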
ii. TASKS:
1. Tasks are scheduled SQL executions
2. Can create a tree of tasks where one task follows another
3. AFTER is used to create the tree of tasks
4. The schedule interval is set in minutes or using a cron expression
5. Cannot be triggered manually.
6. Tasks are initially in a suspended state when created

7. A tree of Tasks can have a maximum of 1000 tasks in total.
8. A Task can have a maximum of 100 child tasks.
9. If a Task is still running when its next scheduled run time occurs, that
run is skipped.
10. A Task can execute only a single statement or a stored procedure call,
not multiple SQL statements or a function.
11. When you remove the predecessor of a Child Task, it becomes a
Stand-alone Task or Root Task.
12. A Task is cancelled if it runs longer than 60 minutes by default. You can
alter this using USER_TASK_TIMEOUT_MS = <num>
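A minimal sketch of a task tree (task, warehouse and table names are hypothetical):

    CREATE TASK root_task
      WAREHOUSE = etl_wh
      SCHEDULE = 'USING CRON 0 2 * * * UTC'     -- or e.g. SCHEDULE = '60 MINUTE'
    AS
      INSERT INTO daily_counts SELECT CURRENT_DATE, COUNT(*) FROM orders;

    CREATE TASK child_task
      WAREHOUSE = etl_wh
      AFTER root_task                           -- builds the tree
    AS
      DELETE FROM staging_orders;

    ALTER TASK child_task RESUME;               -- tasks are created suspended;
    ALTER TASK root_task RESUME;                -- resume children before the root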

6. UNLOADING DATA
a. COPY INTO <location> command for unloading to the Cloud (can be done without a
stage, using a URL and proper credentials) / to Local (GET command)
b. The alternative of unloading a SELECT statement's result is preferred, as all query
operations can be applied.
c. Unloading can be done to Internal Stages (any), External Stages, and External
Locations
d. GET is not supported for: downloading files from External Stages, the Go
Snowflake, .NET and Node.js drivers, and limited ODBC drivers.
e. GET parallelism is controlled by PARALLEL=<integer>, which
can be from 1 to 99 (Default: 10)
f. GET has an option: PATTERN=['regex']
g. File Formats: Flat (CSV, TSV, etc.), JSON, PARQUET
h. Use OBJECT_CONSTRUCT to create semi-structured format files.
i. During unloading, Snappy compression is used.
j. Output can be a single file or multiple files of max 16 MB each (Default: multiple)
k. Set SINGLE = TRUE for a single file; the MAX_FILE_SIZE option can be set to
change the limit.
l. Enclose strings in double or single quotes; use EMPTY_FIELD_AS_NULL
(Default: TRUE); convert null values using NULL_IF
m. Any type of data transformation can be done in the SELECT
n. Default file name is "data_<num>_<num>_<num>"
o. An S3 bucket requires s3:DeleteObject and s3:PutObject permissions for unloading.
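A minimal sketch of unloading (stage and table names are hypothetical):

    COPY INTO @my_stage/out/
      FROM (SELECT id, name FROM customers)     -- SELECT allows transformations
      FILE_FORMAT = (TYPE = 'CSV')
      MAX_FILE_SIZE = 16777216;                 -- bytes; output is multiple files by default
    GET @my_stage/out/ file:///tmp/exports/ PARALLEL = 10;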

ACCOUNT AND SECURITY OVERVIEW:


1. ACCESS:
a. Network Policies:
i. Allow access based on an IP allowed list, or restrict access via an IP blocked list
(which has higher priority). Do not add '0.0.0.0/0' to the blocked list - it will
lock out your own account.
ii. Only ACCOUNTADMIN or SECURITYADMIN can create, modify or drop these
(see the sketch after this section)
b. Access control models:
i. Discretionary Access Control (DAC): every object has an owner, and the owner
has full access to the object.

ii. Role-Based Access Control (RBAC): all privileges related to objects are
assigned to roles, and roles are assigned to users.
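A minimal sketch of a network policy, as referenced above (policy name and IPs are hypothetical):

    CREATE NETWORK POLICY corp_only
      ALLOWED_IP_LIST = ('192.168.1.0/24')
      BLOCKED_IP_LIST = ('192.168.1.99');       -- blocked list has priority
    ALTER ACCOUNT SET NETWORK_POLICY = corp_only;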
2. AUTHENTICATION:
a. MFA:
i. Provided by the Duo service
ii. Each user must enable it themselves
iii. SF recommends enabling MFA at minimum for ACCOUNTADMIN
iv. MFA can be disabled by ACCOUNTADMIN or SECURITYADMIN with:
DISABLE_MFA = TRUE or MINS_TO_BYPASS_MFA = 5
v. Use with UI, SnowSQL, ODBC, JDBC, Python Connector
b. SSO (SAML 2.0): an IdP allows users to access SF via federated services, i.e. login
using tokens directly.
c. SSO is available on Enterprise+
d. MFA, OAuth and SSO are available to all editions
e. Compliance: SOC 1 & SOC 2 Type 2, HITRUST/HIPAA (BC or higher), PCI DSS (BC
or higher), FedRAMP, GxP, ISO 27001

SNOWFLAKE PERFORMANCE AND TUNING:


1. Performance and Tuning Overview:
a. Order of execution: ROWS => GROUPS => RESULTS
b. Always join on unique keys
c. Aim for effective pruning: use filters that match the table's clustering order
d. GROUP BY on columns with few distinct values
e. Never put ORDER BY in subqueries; always have it in the outermost query if
possible
2. Data Clustering:
a. Candidates are tables in the multi-TB range whose query performance degrades over time
b. If the clustering depth (how many micro-partitions have to be read to satisfy a
query) for the table is large, then recluster.
c. Automatic clustering keeps the table in order and bills per second
d. Frequently changing tables are expensive to keep clustered, so always try it on less
frequently changing tables.
e. Reclustering does not require manual configuration.
f. Automatic clustering can be suspended and resumed.
g. Automatic clustering is triggered if and only if the table would benefit from the
operation.
h. The SYSTEM$CLUSTERING_INFORMATION function gives the depth, which tells
the number of overlaps, thus indicating whether pruning will be effective (large
overlaps mean poor pruning).
i. To check whether performance improved, query before and after.
j. SHOW TABLES has an AUTO_CLUSTERING_ON column to see if it's enabled or
not.
k. The CLUSTER_BY (or CLUSTERING_KEY) column shows the columns used
for clustering.

l. Choosing a clustering key: columns used in join predicates and selective filters or
predicates. Recommended max columns per clustering key: 3-4 (see the sketch below)
m. Prefer cardinality between the min and max extremes: enough distinct values for
effective pruning, but not so many that clustering maintenance becomes expensive.
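A minimal sketch of defining and inspecting a clustering key (table and column names are hypothetical):

    ALTER TABLE big_table CLUSTER BY (event_date, region);
    SELECT SYSTEM$CLUSTERING_INFORMATION('big_table', '(event_date, region)');  -- depth/overlaps
    ALTER TABLE big_table SUSPEND RECLUSTER;    -- pause automatic clustering
    ALTER TABLE big_table RESUME RECLUSTER;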

SNOWFLAKE SEMI-STRUCTURED:
1. Can be operated on: JSON, AVRO, ORC, PARQUET, XML
2. Stored in a compressed columnar binary representation
3. It is stored in the VARIANT type, also called a universal type, which can hold any type of
data (ARRAY or OBJECT)
4. Max size 16 MB compressed
5. In a VARIANT, a JSON null is stored as the string "null"
6. When semi-structured data is inserted into a VARIANT, Snowflake tries to extract it into
columnar format based on certain rules.
7. Querying semi-structured Data (Primarily JSON):
a. Can access semi-structured data this way, but not XML
b. <column1>:<level_1_element>
c. Query output is enclosed in double quotes, because the query output is VARIANT and
not VARCHAR
d. 2 ways to access an element in a JSON object:
i. Dot notation: select src:sales.name from table
ii. Bracket notation: select src['sales']['name'] from table
iii. Here sales and name are case-sensitive, while src, i.e. the column, is case-insensitive
e. Casting is done using ::
f. FLATTEN / PARSE_JSON / GET FUNCTION:
i. FLATTEN is used to produce a lateral view of a VARIANT, OBJECT or ARRAY
ii. The FLATTEN command has LATERAL and TABLE options.
iii. The output of a FLATTEN query has the below columns:
1. SEQ
2. KEY
3. PATH
4. INDEX
5. VALUE
6. THIS
iv. Used to parse nested arrays.
v. GET takes a value as its first argument and extracts the VARIANT value of the
element at the path provided as the second argument
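A minimal sketch of the notations and FLATTEN above (table and column names are hypothetical):

    SELECT src:sales.name::STRING,              -- dot notation + cast
           src['sales']['name']                 -- bracket notation
    FROM my_json_table;

    SELECT f.key, f.value
    FROM my_json_table t,
         LATERAL FLATTEN(INPUT => t.src:employees) f;   -- one row per array element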

ACID OR TRANSACTIONS:
1. SHOW LOCKS or SHOW TRANSACTIONS.
2. Snowflake does not support nested transactions, i.e. when a transaction is called from
another transaction, it is not nested; instead it runs in its own scope. So they are
called SCOPED TRANSACTIONS.
3. Commit operations lock resources.
4. UPDATE, DELETE, and MERGE statements hold locks.

5. If you run a transaction in a session and the session disconnects while the transaction is
left open, it is closed after 4 hours, or it can be aborted using SYSTEM$ABORT_TRANSACTION.
6. For multi-threaded programs:
a. Use a separate connection for each thread.
b. Execute the threads synchronously.
c. Fact: multiple sessions cannot share the same transaction, but multiple threads
using a single connection share the same transaction.
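A minimal sketch of an explicit transaction (table names and the transaction id are hypothetical):

    BEGIN;
    UPDATE accounts SET balance = balance - 100 WHERE id = 1;   -- UPDATE holds a lock
    UPDATE accounts SET balance = balance + 100 WHERE id = 2;
    COMMIT;
    SHOW LOCKS;
    SELECT SYSTEM$ABORT_TRANSACTION(1234567890);   -- abort an orphaned transaction by id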

IMPORTANT POINTS:
1. If you create a table, drop it, create a new table with same name and run UNDROP cmd
it will fail.
2. Replication is supported for DATABASES only.
3. INFORMATION_SCHEMA contains table functions that give account-level usage and
historical data for storage, warehouses, etc.
4. Variables can be set in Snowflake. Max size of string/binary variables => 256 bytes
5. DATE_TRUNC is used to truncate a date/timestamp to a given precision (e.g. year, month, day).
6. Snowflake uses Lacework for monitoring network traffic and user activity. It uses Sumo
Logic and Threat Stack to monitor failed logins, file integrity monitoring and unauthorized
system modifications.

©2021 Jainam Soni
