Roadmap: From SQL to Data Analyst & Data Engineer
1. Advanced SQL (Beyond Aggregates)
- Window Functions: ROW_NUMBER(), RANK(), DENSE_RANK(), LAG(), LEAD()
- CTEs (WITH clauses): making nested queries readable
- CASE Statements: Conditional logic inside SELECT
- Set Operations: UNION, INTERSECT, MINUS (also written EXCEPT)
- Analytical Functions: SUM() OVER(), AVG() OVER(), etc.
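The topics above can be combined in a single query. The following is a minimal sketch, assuming hypothetical tables `orders(order_id, customer_id, order_date, amount)` and `refunds(customer_id)`; neither table comes from this roadmap.

```sql
-- Assumed (hypothetical) tables: orders, refunds.
-- The CTE keeps the window-function logic separate from the final SELECT.
WITH customer_orders AS (
    SELECT
        customer_id,
        order_date,
        amount,
        -- Sequence number of each order per customer
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date) AS order_seq,
        -- Amount of the customer's previous order (NULL for the first one)
        LAG(amount) OVER (PARTITION BY customer_id ORDER BY order_date) AS prev_amount,
        -- Running total per customer, with an explicit window frame
        SUM(amount) OVER (
            PARTITION BY customer_id
            ORDER BY order_date
            ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
        ) AS running_total
    FROM orders
)
SELECT
    customer_id,
    order_date,
    amount,
    order_seq,
    running_total,
    -- CASE turns the comparison with the previous order into a label
    CASE
        WHEN prev_amount IS NULL THEN 'first order'
        WHEN amount > prev_amount THEN 'increase'
        WHEN amount < prev_amount THEN 'decrease'
        ELSE 'no change'
    END AS trend
FROM customer_orders
ORDER BY customer_id, order_date;

-- Set operation: customers who ordered but never appear in refunds
SELECT customer_id FROM orders
MINUS
SELECT customer_id FROM refunds;
```

RANK() and DENSE_RANK() take the same OVER clause as ROW_NUMBER(); they differ only in how ties are numbered.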
2. Snowflake Environment Basics
- Structure of databases, schemas, and warehouses
- Table types: Permanent, Temporary, Transient
- Virtual Warehouses and Scaling behavior
- Storage vs Compute separation
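A short sketch of how these objects fit together, assuming hypothetical names (`demo_db`, `raw`, `demo_wh`, and the `customers_*` tables) and a role that is allowed to create them:

```sql
-- Storage side: database -> schema -> tables (all names are hypothetical)
CREATE DATABASE IF NOT EXISTS demo_db;
CREATE SCHEMA IF NOT EXISTS demo_db.raw;

-- Compute side: a virtual warehouse, created independently of any database
CREATE WAREHOUSE IF NOT EXISTS demo_wh
    WAREHOUSE_SIZE = 'XSMALL'
    AUTO_SUSPEND = 60           -- suspend after 60 seconds of inactivity
    AUTO_RESUME = TRUE          -- wake up when a query arrives
    INITIALLY_SUSPENDED = TRUE;

USE WAREHOUSE demo_wh;
USE SCHEMA demo_db.raw;

-- The three table types named above
CREATE TABLE IF NOT EXISTS customers_permanent (id INT, name STRING);
CREATE TRANSIENT TABLE IF NOT EXISTS customers_transient (id INT, name STRING);
CREATE TEMPORARY TABLE customers_temp (id INT, name STRING);  -- dropped when the session ends

-- Resizing compute does not move or copy any stored data
ALTER WAREHOUSE demo_wh SET WAREHOUSE_SIZE = 'SMALL';
```

The final statement illustrates the storage/compute split: the warehouse changes size while the tables are untouched.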
3. External Stage Handling
- Stages: Internal vs External (S3, Azure, etc.)
- CREATE STAGE syntax
- LIST @stage_name to view files
- Importance of understanding source structure
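As an illustration of the stage commands above, here is a minimal sketch. The bucket URL, the storage integration `my_s3_integration`, and both stage names are hypothetical; the integration is assumed to have been set up already by an administrator.

```sql
-- Internal named stage: files are stored inside Snowflake
CREATE STAGE IF NOT EXISTS my_internal_stage;

-- External stage pointing at S3 (hypothetical bucket and integration)
CREATE STAGE IF NOT EXISTS my_s3_stage
    URL = 's3://my-example-bucket/landing/'
    STORAGE_INTEGRATION = my_s3_integration;

-- List every file visible through the stage
LIST @my_s3_stage;

-- List only the CSV files, using a regular-expression pattern
LIST @my_s3_stage PATTERN = '.*[.]csv';
```

The LIST output (file names, sizes, last-modified times) is a quick first look at how the source organizes its files.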
4. File Formats & Metadata
- CSV, JSON, Parquet support
- File Format Creation (field_delimiter, skip_header, etc.)
- Using FILE_FORMAT => 'name' in queries
- Metadata Columns: METADATA$FILENAME, METADATA$FILE_ROW_NUMBER
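A sketch of a named file format and a staged-file query that uses it. The stage `my_s3_stage` and the format name `my_csv_format` are hypothetical, and the files are assumed to be comma-separated with one header row.

```sql
-- Named file format (assumes comma-separated files with one header line)
CREATE FILE FORMAT IF NOT EXISTS my_csv_format
    TYPE = 'CSV'
    FIELD_DELIMITER = ','
    SKIP_HEADER = 1
    FIELD_OPTIONALLY_ENCLOSED_BY = '"';

-- Query the staged files directly; $1, $2 are positional columns
SELECT
    METADATA$FILENAME        AS source_file,  -- path of the file the row came from
    METADATA$FILE_ROW_NUMBER AS source_row,   -- row number within that file
    t.$1                     AS col1,
    t.$2                     AS col2
FROM @my_s3_stage (FILE_FORMAT => 'my_csv_format') t
LIMIT 10;
```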
5. File Investigation
- SELECT queries against a stage, with a file format, to preview contents
- Using VARIANT datatype for flexible structure
- Identifying headers and data structure in raw files
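Two preview queries in the spirit of this step, offered as a sketch. The stage `my_s3_stage`, both file format names, and the JSON key `id` are hypothetical.

```sql
-- 1) See the header line of CSV files by NOT skipping it
CREATE FILE FORMAT IF NOT EXISTS my_csv_preview_format
    TYPE = 'CSV'
    FIELD_DELIMITER = ','
    SKIP_HEADER = 0;

SELECT METADATA$FILENAME AS source_file, t.$1, t.$2, t.$3
FROM @my_s3_stage (FILE_FORMAT => 'my_csv_preview_format', PATTERN => '.*[.]csv') t
LIMIT 5;

-- 2) Inspect JSON files: each record arrives as one VARIANT value in $1
CREATE FILE FORMAT IF NOT EXISTS my_json_format TYPE = 'JSON';

SELECT
    METADATA$FILENAME AS source_file,
    t.$1              AS raw_record,      -- the whole record as VARIANT
    TYPEOF(t.$1)      AS top_level_type,  -- e.g. OBJECT or ARRAY
    t.$1:id::STRING   AS id               -- hypothetical key, cast to STRING
FROM @my_s3_stage (FILE_FORMAT => 'my_json_format', PATTERN => '.*[.]json') t
LIMIT 5;
```

Because VARIANT accepts any JSON shape, the second query runs even before the exact structure of the files is known.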
6. Data Ingestion (COPY INTO)
- COPY INTO syntax from stage to table
- File format tuning for ingestion (record_delimiter, skip_header)
- Inserting into custom table (RAW_DATA) with metadata columns
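A minimal sketch of loading staged CSV files into RAW_DATA together with the metadata columns. The column names, the stage `my_s3_stage`, and the file format `my_csv_format` are hypothetical and assumed to exist.

```sql
-- Landing table: data columns kept as STRING, plus two lineage columns
CREATE TABLE IF NOT EXISTS RAW_DATA (
    col1        STRING,
    col2        STRING,
    source_file STRING,
    source_row  NUMBER
);

-- COPY with a SELECT so the metadata columns can be captured per row
COPY INTO RAW_DATA (col1, col2, source_file, source_row)
FROM (
    SELECT t.$1, t.$2, METADATA$FILENAME, METADATA$FILE_ROW_NUMBER
    FROM @my_s3_stage t
)
FILE_FORMAT = (FORMAT_NAME = 'my_csv_format')
PATTERN = '.*[.]csv'
ON_ERROR = 'CONTINUE';   -- skip rows that fail to load instead of aborting
```

Options such as SKIP_HEADER or RECORD_DELIMITER belong to the file format, so ingestion is tuned by altering `my_csv_format` rather than the COPY statement.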
7. Data Transformation & Cleaning
- Creating derived tables using SELECT
- Filtering out bad rows (NULL, garbage, etc.)
- Using CAST(), SPLIT(), TRIM(), etc. for cleaning
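One way the cleaning step might look, sketched against a hypothetical RAW_DATA table whose `col1` holds a full name and `col2` holds an amount as text; the target name CLEAN_DATA is also an assumption.

```sql
-- Derived table built with CREATE TABLE ... AS SELECT
CREATE OR REPLACE TABLE CLEAN_DATA AS
SELECT
    TRIM(col1)                           AS customer_name,
    SPLIT(TRIM(col1), ' ')[0]::STRING    AS first_name,   -- first token of the name
    TRY_CAST(TRIM(col2) AS NUMBER(10,2)) AS amount,       -- NULL if not numeric
    CAST(source_row AS INTEGER)          AS source_row,
    source_file
FROM RAW_DATA
WHERE col1 IS NOT NULL
  AND TRIM(col1) <> ''                                    -- drop blank names
  AND TRY_CAST(TRIM(col2) AS NUMBER(10,2)) IS NOT NULL;   -- drop garbage amounts
```

TRY_CAST is used alongside CAST here because it returns NULL on a failed conversion instead of raising an error, which makes it convenient for filtering bad rows.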
8. Automation in Snowflake
- Streams: Change data capture (CDC)
- Tasks: Scheduling SQL scripts
- MERGE INTO for upsert operations
- Using Tasks + Streams for incremental pipelines
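A sketch of an incremental pipeline built from these pieces. It assumes hypothetical tables `orders_raw(order_id, amount, updated_at)` and `orders_clean(order_id, amount, updated_at)`, a warehouse named `demo_wh`, and a role with the privileges needed to create and run tasks.

```sql
-- Stream: records the rows that changed in orders_raw since it was last consumed
CREATE STREAM IF NOT EXISTS orders_raw_stream ON TABLE orders_raw;

-- Task: every 5 minutes, but only when the stream actually has new data
CREATE TASK IF NOT EXISTS merge_orders_task
    WAREHOUSE = demo_wh
    SCHEDULE = '5 MINUTE'
    WHEN SYSTEM$STREAM_HAS_DATA('ORDERS_RAW_STREAM')
AS
MERGE INTO orders_clean tgt
USING (
    -- Keep only the latest inserted version of each order in this batch
    SELECT order_id, amount, updated_at
    FROM orders_raw_stream
    WHERE METADATA$ACTION = 'INSERT'
    QUALIFY ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY updated_at DESC) = 1
) src
ON tgt.order_id = src.order_id
WHEN MATCHED THEN
    UPDATE SET tgt.amount = src.amount, tgt.updated_at = src.updated_at
WHEN NOT MATCHED THEN
    INSERT (order_id, amount, updated_at)
    VALUES (src.order_id, src.amount, src.updated_at);

-- Tasks are created suspended; resume to start the schedule
ALTER TASK merge_orders_task RESUME;
```

Reading the stream inside the MERGE advances its offset when the statement commits, so each run processes only rows that arrived after the previous run.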
9. Bonus: Optimization & Cost Control
- Using the result cache (USE_CACHED_RESULT) and warehouse sizing
- Clustering keys for large datasets
- Monitoring Query History & Warehouse Usage
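The items above map to a handful of statements, sketched here with hypothetical object names (`demo_wh`, `orders_clean`, `order_date`). The last query assumes access to the shared SNOWFLAKE database.

```sql
-- Result cache: on by default; this session parameter toggles it
ALTER SESSION SET USE_CACHED_RESULT = TRUE;

-- Warehouse sizing: keep it small and let it suspend quickly when idle
ALTER WAREHOUSE demo_wh SET WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60;

-- Clustering key on a large table that is usually filtered by date
ALTER TABLE orders_clean CLUSTER BY (order_date);

-- Recent queries, slowest first
SELECT query_id, query_text, warehouse_name, total_elapsed_time, bytes_scanned
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY(RESULT_LIMIT => 20))
ORDER BY total_elapsed_time DESC;

-- Credits consumed per warehouse over the last 7 days
SELECT warehouse_name, SUM(credits_used) AS credits
FROM SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
WHERE start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY credits DESC;
```

Clustering keys carry their own maintenance cost, so they tend to pay off only on large tables with a stable filter pattern.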