NOTE: ANSWERS MIGHT BE WRONG HERE.
DOCUMENT is SHARED by Mentee.
Today's interview questions
Write a query to time travel on a table for yesterday or any other day (see the sketch after this list)
Write a query to mask a column in Snowflake
Write a GRANT command for a user on a table
Difference between permanent and transient tables
What is zero-copy cloning
How to load an AVRO file when a row is larger than 16 MB
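A minimal sketch for the time-travel and grant questions above; the ORDERS table, ANALYST_ROLE, and SOME_USER names are hypothetical:

-- Time Travel: query the table as of a point in time, or by an offset in seconds
SELECT * FROM orders AT (OFFSET => -60*60*24);                              -- 24 hours ago
SELECT * FROM orders AT (TIMESTAMP => '2024-01-01 00:00:00'::TIMESTAMP_LTZ); -- a specific point in time

-- grants in Snowflake go to roles, and roles are granted to users
GRANT SELECT ON TABLE orders TO ROLE analyst_role;
GRANT ROLE analyst_role TO USER some_user;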
HCL Interview Questions:
1. How do we create snowpipe in snowflake
2. What type of tests do we have in snowflake
3. Versioning in git
4. Project
5. Dynamic tables
6. 3rd highest salary using a CTE (see the sketch after this list)
7. No. of rows for all joins
8. What type of files do we configure in project.yml
9. Difference between union and union all
10. Time travel concept in snowflake
11. Scd implementation
12. Different types of dbt commands which you have used
13. Query optimization in snowflake
14. Which data type do we configure for json
15. What are macros in dbt
16. How do we implement incremental models in dbt
17. How do we configure landing tables in dbt
18. What is the ref function used for
19. What are dynamic tables in snowflake
20. What is a DAG
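A minimal sketch for question 6 (3rd highest salary using a CTE); the EMPLOYEES table and its columns are hypothetical:

WITH ranked AS (
    SELECT emp_id,
           salary,
           DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk   -- DENSE_RANK handles duplicate salaries
    FROM employees
)
SELECT emp_id, salary
FROM ranked
WHERE rnk = 3;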
Optum:
In this round, the candidate faced SQL, DWH concepts, and Snowflake questions.
1- What is the difference between JOIN vs UNION?
2- Universal JOIN question: two tables are given with values 1, 1, NULL and 2, 3; find the row count for each type of join.
3- Customer table with custid, name, and product, where each product is placed in a separate row.
Write a query to get the output in the format below (see the sketch after this list):
CustID | Total Product
1      | Samsung,LG,Apple
4- What are junk dimensions?
5- If I have to load 100 GB of files from AWS S3 to Snowflake, will it load or fail?
6- How will you implement SCD type2 in Snowflake?
7- Create a Stored Proc for error handling in Snowflake?
8- What is the main difference between a normal view and a materialized view?
9- What is a surrogate key and what are its benefits?
10- How will you ensure data quality in your data migration process?
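A minimal sketch for question 3 above, assuming a hypothetical CUSTOMER_PRODUCTS table with columns custid, name, product. LISTAGG collapses the product rows per customer into one comma-separated string:

SELECT custid,
       LISTAGG(product, ',') WITHIN GROUP (ORDER BY product) AS total_product
FROM customer_products
GROUP BY custid;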
Tiger Analytics Interview Questions:
If you create a temporary table with the same name as a permanent one, which one will be used if
queried in the same session?
How many types of views are there?
If your warehouse is suspended and you have already run the query, will the result come from the
cache or from compute?
What is the setup of snowpipe? (see the sketch below)
What happens when the same file is placed again in the S3 bucket while snowpipe is running, will it
load the same file?
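A minimal snowpipe setup sketch for the two questions above; the stage, pipe, and table names are hypothetical. Snowpipe keeps per-pipe load-history metadata for 14 days, so re-placing the same unchanged file is normally skipped rather than loaded again:

CREATE OR REPLACE PIPE raw_orders_pipe
  AUTO_INGEST = TRUE           -- S3 event notifications trigger the load
AS
  COPY INTO raw_orders
  FROM @my_s3_stage
  FILE_FORMAT = (TYPE = 'CSV' FIELD_DELIMITER = ',' FIELD_OPTIONALLY_ENCLOSED_BY = '"');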
If you have a comma in your data and the delimiter is also ',', how will you load the data? (Quote the
fields and set FIELD_OPTIONALLY_ENCLOSED_BY, as in the file format above.)
How are data masking and data governance done? On what kind of data do we mask?
Tag - a schema-level object used for governance; example create statement below --
COPY_HISTORY (view / table function for load history)
CREATE OR REPLACE MASKING POLICY email_mask AS (val STRING) RETURNS STRING ->
  CASE WHEN CURRENT_ROLE() IN ('ANALYST') THEN val ELSE '*********' END;
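To attach the policy above to a column (a sketch, assuming a hypothetical CUSTOMERS table with an EMAIL column):

ALTER TABLE customers MODIFY COLUMN email SET MASKING POLICY email_mask;
-- to detach it later:
ALTER TABLE customers MODIFY COLUMN email UNSET MASKING POLICY;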
CREATE TAG cost_center COMMENT = 'cost_center tag';
If you have a cloned table and you add two more columns, what will the cost be - compute or storage?
What to do when concurrency rises for parallel execution of multiple queries - multi-cluster warehouse
or auto-scale?
Will the COPY command work if it tries to load the same file from the same stage into the same table
again?
Which types of data can data masking be applied to?
If you change anything on a clone, does it impact compute or storage cost?
SQL -
EmpID: 100, 200
Swap the gender for the two ids (see the sketch below)
Running monthly sales total (see the sketch below)
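Minimal sketches for the two SQL prompts above; the EMPLOYEES (empid, gender) and SALES (sale_date, amount) tables are hypothetical:

-- swap the gender values for the two ids
UPDATE employees
SET gender = CASE WHEN gender = 'M' THEN 'F' ELSE 'M' END
WHERE empid IN (100, 200);

-- monthly running total of sales
SELECT DATE_TRUNC('month', sale_date)                                  AS sales_month,
       SUM(amount)                                                     AS monthly_sales,
       SUM(SUM(amount)) OVER (ORDER BY DATE_TRUNC('month', sale_date)) AS running_total
FROM sales
GROUP BY DATE_TRUNC('month', sale_date)
ORDER BY sales_month;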
EY:
1. I created the 'employee' table in Snowflake and dropped it, then created a new table with the
same name 'employee'. How do I undrop the old table?
2. What is the error we get if we haven't configured it?
3. Types of tasks
4. Limitations of tasks
5. After creating a task, which state is it in? - Suspended
6. What is the advantage of columnar data storage?
7. Can we disable the result cache in Snowflake? - Yes (ALTER SESSION SET USE_CACHED_RESULT = FALSE)
8. When does a stream get emptied in Snowflake? (see the sketch below)
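A minimal sketch for question 8: a stream is emptied (its offset advances) only when it is consumed by a DML statement inside a committed transaction. The SRC and TGT tables (both with columns id, val) are hypothetical:

CREATE OR REPLACE STREAM src_stream ON TABLE src;

INSERT INTO src VALUES (1, 'a');                  -- the change is captured by the stream

INSERT INTO tgt SELECT id, val FROM src_stream;   -- consuming the stream advances its offset
-- SELECT COUNT(*) FROM src_stream;               -- returns 0 once the transaction commits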
LTI Mindtree
1. How to load data from S3 to Snowflake
2. Write code for try/catch in Snowflake (see the exception-handling sketch after this list)
3. Can I write a join condition in a snowpipe COPY INTO statement? - No
4. Can I write a function in a stored procedure?
5. Tasks code
6. Layers in snowflake
7. Cache in snowflake
8. What is a data warehouse
https://www.ibm.com/topics/data-warehouse
9. What is a data lake
Difference between Data Lake, Data Warehouse and Data Mart, with a simple example:
➡️My mother went to the grocery shop and bought ingredients such as vegetables, spices,
breads, flour, fruits, etc.
She put all these items in one basket.
📍The basket having all these items can be represented as Data Lake.
➡️She brought all these things home and washed the items that require cleaning, such as
vegetables and fruits.
After that she separated the items and placed each category on a different shelf.
📍This can be represented as a data warehouse.
➡️Later on she asked what we would like to have for lunch.
Then she picked out only those items which are required for that particular dish.
📍 Separation of ingredients for a specific dish can be represented as Data Mart.
Hope this example helped you understand these data warehousing concepts.
10. What is the JSON format - unstructured or semi-structured, and why is it semi-structured?
11. What are micro-partitions?
12. Result cache
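A minimal try/catch sketch in Snowflake Scripting for item 2 (and the earlier stored-procedure-for-error-handling questions); the procedure name and logic are hypothetical:

CREATE OR REPLACE PROCEDURE safe_divide(a FLOAT, b FLOAT)
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
  result FLOAT;
BEGIN
  result := a / b;                 -- raises an error when b = 0
  RETURN 'Result: ' || result;
EXCEPTION
  WHEN OTHER THEN                  -- catch-all handler; SQLERRM/SQLCODE/SQLSTATE describe the error
    RETURN 'Error: ' || SQLERRM;
END;
$$;

CALL safe_divide(10, 0);   -- returns the error message instead of failing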
Splunk
1. outer join & inner join
2. OLTP & OLAP
3. CTE
CTE stands for Common Table Expression in SQL. It is a temporary result set that you
can reference within the context of a single SQL statement, particularly in complex
queries that involve multiple subqueries or self-joins. CTEs enhance the readability and
maintainability of SQL queries by allowing you to break down complex logic into
smaller, more manageable parts.
4. Window functions (a combined CTE + window-function sketch appears at the end of this section)
Window functions, also known as analytical functions or windowed aggregates, are a
powerful feature in SQL that allow you to perform calculations across a set of table
rows that are related to the current row. Unlike traditional aggregate functions (e.g., SUM,
AVG, COUNT), window functions do not collapse rows into a single value; instead, they
provide a way to compute values for each row while considering a "window" of rows
related to that row.
5. Over()
6. Partition by
7. Diff b/w SQL & NOSQL
SQL (Structured Query Language) and NoSQL (Not Only SQL) are two distinct categories of database
management systems, each with its own characteristics, use cases, and advantages. Here's a
comparison of SQL and NoSQL databases:
1. Data Model:
- SQL: Relational databases use a structured data model based on tables with rows and columns.
Data is organized into structured schemas with predefined relationships between tables.
- NoSQL: NoSQL databases use various data models such as key-value, document, column-family,
and graph. They provide more flexibility in data storage and retrieval, allowing for schema-less or
dynamic schemas.
2. Schema:
- SQL: Relational databases have a fixed schema that defines the structure of the data in advance.
Changes to the schema can be complex and might require data migration.
- NoSQL: NoSQL databases often have a dynamic or flexible schema, allowing you to change the
structure of data without major disruptions. This is particularly useful in rapidly changing
environments.
3. Query Language:
- SQL: SQL databases use a structured query language (SQL) for querying and manipulating data.
SQL provides powerful querying capabilities for structured data.
- NoSQL: NoSQL databases often use custom query languages or APIs specific to the data model.
Some NoSQL databases might not support complex querying to the same extent as SQL databases.
4. Scalability:
- SQL: Traditional SQL databases are typically designed for vertical scalability (scaling up) by adding
more resources to a single server.
- NoSQL: NoSQL databases are often designed for horizontal scalability (scaling out) across multiple
servers or nodes, making them more suitable for handling large amounts of data and high traffic.
5. Consistency and ACID:
- SQL: Relational databases emphasize strong consistency and adhere to ACID properties
(Atomicity, Consistency, Isolation, Durability) to ensure data integrity.
- NoSQL: NoSQL databases offer various levels of consistency and might relax ACID constraints in
favor of other properties like availability and partition tolerance. This depends on the specific NoSQL
database type.
6. Use Cases:
- SQL: SQL databases are suitable for applications that require structured data with complex
relationships, such as transactional systems, financial applications, and reporting.
- NoSQL: NoSQL databases excel in handling unstructured or semi-structured data, high-velocity
data, and scenarios where data schemas change frequently. They are often used for content
management systems, real-time analytics, IoT applications, and more.
In summary, SQL and NoSQL databases have different strengths and weaknesses, and the choice
between them depends on the specific requirements of your application. SQL databases are well-
suited for structured data and complex querying, while NoSQL databases offer flexibility, scalability,
and the ability to handle diverse data types and use cases.
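A small sketch tying items 3-6 together (CTE, window function, OVER(), PARTITION BY); the EMPLOYEES table and its columns are hypothetical:

WITH dept_salaries AS (                      -- CTE: a named, temporary result set
    SELECT dept_id, emp_id, salary
    FROM employees
)
SELECT dept_id,
       emp_id,
       salary,
       AVG(salary) OVER (PARTITION BY dept_id)                      AS avg_dept_salary,  -- no row collapse
       RANK()      OVER (PARTITION BY dept_id ORDER BY salary DESC) AS salary_rank
FROM dept_salaries;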
Jade Global
1. Validation mode
2. Validation
3. Architecture of snowpipe
4. Can we build SCD2 using tasks and streams? - Yes (see the MERGE sketch after this list)
5. Architecture of snowflake
6. scale in and scale out
7. How do you share data across different regions?
8. Which tables are not cloned in Snowflake?
Temporary and transient tables cannot be cloned to a permanent table.
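A compact sketch for question 4 (SCD2 with tasks and streams); the staging table, dimension table, warehouse, and column names are all hypothetical:

CREATE OR REPLACE STREAM customers_stg_stream ON TABLE customers_stg;

CREATE OR REPLACE TASK scd2_customer_task
  WAREHOUSE = etl_wh
  SCHEDULE  = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('CUSTOMERS_STG_STREAM')
AS
  MERGE INTO dim_customer d
  USING customers_stg_stream s
    ON d.customer_id = s.customer_id AND d.is_current = TRUE
  WHEN MATCHED AND d.customer_name <> s.customer_name THEN
    UPDATE SET d.is_current = FALSE, d.end_date = CURRENT_TIMESTAMP()   -- expire the old version
  WHEN NOT MATCHED THEN
    INSERT (customer_id, customer_name, start_date, end_date, is_current)
    VALUES (s.customer_id, s.customer_name, CURRENT_TIMESTAMP(), NULL, TRUE);

ALTER TASK scd2_customer_task RESUME;
-- a second statement (for example, in a child task) would then insert the new row versions for the changed keys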
9. How did you do data modelling in Snowflake?
Here is a general overview of how data modeling is typically done in Snowflake, along with some
common challenges. Best practices and tools evolve, so it is always a good idea to consult the latest
documentation and resources.
Data Modeling in Snowflake:
Data modeling in Snowflake follows similar principles to other data warehousing systems. Here are
the general steps and considerations:
1. Requirements Gathering: Understand the business requirements for your data warehouse.
Identify the data sources, data types, relationships, and the types of analysis that will be performed
on the data.
2. Logical Data Model: Create a logical data model that defines the entities, attributes, and
relationships within your data. This model helps to conceptualize the structure of the data and its
relationships before considering the technical implementation.
3. Physical Data Model: Translate the logical data model into a physical data model that fits
Snowflake's architecture. Identify which entities will become tables, the columns they'll contain,
primary and foreign keys (which Snowflake records but does not enforce), and clustering needs
(Snowflake has no traditional indexes).
4. Normalization and Denormalization: Decide on the level of normalization or denormalization that
best suits your use case. Snowflake's columnar storage can handle both normalized and
denormalized structures efficiently, allowing you to optimize for storage and query performance.
5. Micro-partitions and Clustering Keys: Snowflake automatically divides table data into
micro-partitions, so there are no user-defined distribution keys. For large tables, clustering keys can
be specified to improve partition pruning and query performance.
6. Data Loading and Transformation: Design ETL processes to load and transform data into
Snowflake tables. Utilize Snowflake's capabilities for bulk loading, copying, and transforming data
during the load process.
7. Security and Access Control: Define security roles, privileges, and access controls for different
users and roles within Snowflake. Consider implementing Snowflake's role-based access control
(RBAC) to ensure data security.
Challenges in Data Modeling:
1. Complexity: Depending on the size and complexity of your data, creating an efficient and effective
data model can be challenging. Balancing the normalization and denormalization decisions requires
careful consideration.
2. Performance Optimization: While Snowflake offers excellent query performance, designing a
model that optimizes query performance for your specific use case can be challenging. Choosing the
right clustering keys is crucial.
3. Changing Business Requirements: As business requirements evolve, your data model might need
to adapt. Ensuring that your data model remains flexible and scalable can be a challenge.
4. Data Governance: Ensuring data consistency, accuracy, and compliance with regulations can be
challenging, especially in a rapidly changing data environment.
5. ETL Complexity: Designing effective ETL processes to transform and load data into Snowflake
might involve complex transformations and orchestrations.
6. Collaboration: In larger organizations, collaboration between different teams, including business
stakeholders, data engineers, and analysts, can be challenging to manage.
7. Performance Tuning: As the volume of data grows, performance tuning becomes essential to
maintain query performance. Monitoring and optimizing query execution plans can be intricate.
Remember that each organization's challenges can vary based on their unique data needs, existing
infrastructure, and skillset. Keeping up with best practices, consulting with experts, and leveraging
Snowflake's documentation and community resources can help navigate these challenges
effectively.
10. How to increase the data consistency of a star schema model.
Increasing data consistency in a star schema model involves designing the schema and implementing
practices that ensure the accuracy, integrity, and uniformity of data across the various dimensions
and the fact table. Here are some strategies to enhance data consistency in a star schema model:
1. Data Quality Assessment:
Regularly assess the quality of the source data before it enters the star schema. Implement data
profiling and cleansing processes to identify and correct data anomalies, duplicates, missing values,
and inaccuracies.
2. Standardize Data Values:
Apply data standardization techniques to ensure consistent formats and values across dimensions.
For example, standardize country codes, date formats, and product names to avoid discrepancies.
3. Master Data Management (MDM):
Implement Master Data Management practices to define and maintain consistent reference data
across the dimensions. This helps prevent variations in dimension attributes like customer names,
product codes, and location names.
4. Validating Foreign Keys:
Enforce referential integrity by validating foreign keys that link dimensions to the fact table. This
prevents orphaned or mismatched records and ensures that only valid data is loaded.
5. Data Transformation Rules:
Apply consistent data transformation rules during the ETL process. Ensure that calculations,
aggregations, and transformations are consistent across all data loads.
6. Data Type Consistency:
Maintain consistent data types for attributes across dimensions and the fact table. Inconsistent
data types can lead to errors during query execution.
7. Data Governance:
Implement data governance practices to establish data ownership, accountability, and guidelines
for data usage and updates. This ensures that data changes are controlled and properly
documented.
8. Version Control:
Use version control for your data schema and ETL processes. This ensures that changes to the
schema are tracked, documented, and applied consistently.
9. Automated Testing:
Implement automated data validation and testing as part of your ETL processes. Create validation
scripts that check data integrity, consistency, and accuracy after each data load.
10. Audit Trails:
Maintain audit trails for data changes. Track who made the changes, when they were made, and
the reasons for the changes. This transparency helps maintain data accountability.
11. Documentation:
Maintain detailed documentation for your star schema model, including data definitions,
transformation rules, and business rules. This helps users understand the data's semantics and
reduces the risk of misinterpretation.
12. Regular Monitoring:
Continuously monitor the data for anomalies and inconsistencies. Implement alerting mechanisms
to notify you when data discrepancies are detected.
13. Training and Awareness:
Train users and stakeholders on the importance of data consistency and how to use the star
schema model correctly. Increasing awareness can lead to better practices and data handling.
By implementing these strategies, you can significantly improve data consistency in your star schema
model, ensuring that the data is accurate, reliable, and aligned with your business requirements.
What is data modeling and how is it done in ETL projects?
Data modeling is a crucial process in the field of data management and analysis. It involves creating
a structured representation of the data that an organization collects, processes, and stores. Data
modeling helps in understanding the relationships between different data elements, defining data
attributes, and designing data structures that support the organization's business needs and
objectives. In the context of ETL (Extract, Transform, Load) projects, data modeling plays a
significant role in ensuring that data is efficiently and accurately extracted, transformed, and loaded
from source systems to target systems.
Here's how to approach data modeling in ETL projects:
1. Understand the Business Requirements: Begin by thoroughly understanding the business
requirements and objectives of the ETL project. This involves collaborating with business
stakeholders to determine what data needs to be extracted, how it should be transformed,
and where it should be loaded.
2. Identify Data Sources and Targets: Identify the data sources (e.g., databases, APIs, flat files)
from which data needs to be extracted and the target systems (e.g., data warehouses, data
lakes, reporting tools) where the transformed data will be loaded.
3. Data Profiling and Analysis: Profile and analyze the data from the source systems to gain
insights into its quality, structure, and relationships. This step helps in understanding data
anomalies, missing values, and data distribution, which can influence the data modeling
process.
4. Choose a Data Modeling Approach: There are different data modeling approaches you can
choose from, including:
● Conceptual Data Modeling: Create a high-level, abstract representation of the data
and its relationships, focusing on business concepts.
● Logical Data Modeling: Define the data structure and relationships in a technology-
independent manner, using entities, attributes, and relationships.
● Physical Data Modeling: Design the actual database schema and storage structures,
considering the specific database technology being used.
5. Create Data Models: Depending on the chosen approach, create conceptual, logical, and/or
physical data models that capture the data requirements, relationships, and constraints.
Tools like ERD (Entity-Relationship Diagrams) or data modeling software can be used for
this purpose.
6. Normalization and Denormalization: In the logical data modeling phase, consider whether
to normalize or denormalize the data structure. Normalization reduces data redundancy but
may complicate ETL processes. Denormalization simplifies ETL but can lead to data
redundancy.
7. Define Transformation Rules: As part of the data modeling process, define transformation
rules that specify how data should be transformed during the ETL process. This includes
data cleansing, validation, aggregation, and enrichment rules.
8. Document Data Models: Properly document the data models, including data dictionaries,
metadata, and any business rules associated with the data. This documentation is essential
for the ETL development team and future data users.
9. Iterate and Review: Data modeling is an iterative process. Review the data models with
business stakeholders and ETL developers to gather feedback and make necessary
adjustments.
10. Implement ETL Processes: Once the data models are finalized, use them as a foundation for
developing the ETL processes. Extract data from source systems, apply the defined
transformations, and load the transformed data into the target systems.
11. Testing and Validation: Thoroughly test the ETL processes to ensure that data is extracted,
transformed, and loaded accurately. Validate the results against the business requirements
and data models.
12. Monitoring and Maintenance: Continuously monitor the ETL processes and data models to
adapt to changing business needs, data quality issues, and technology advancements.
Effective data modeling is crucial for the success of ETL projects, as it provides a clear blueprint for
data integration and ensures that data is consistent, accurate, and aligned with business goals.