SQL Interview Questions for Data Analysts, Part I
linkedin.com/in/ileonjose
1. Average Review Ratings per
Product per Month
Problem Statement: Given a table of
product reviews, calculate the
average review rating for each
product for every month. The data is
in a reviews table, which includes
review_id, user_id, submit_date,
product_id, and stars. The output
should list the month (as a numerical
value), product_id, and the average
star rating rounded to two decimal
places. Sort the result by month and
then by product_id.
How to Solve:
Step 1: Extract the month from submit_date using the EXTRACT function, e.g. EXTRACT(MONTH FROM submit_date).
Step 2: Group the rows by the extracted month and product_id.
Step 3: Compute the average star rating for each group with AVG and round it to two decimal places with ROUND.
Step 4: Order the output by month, then by product_id.
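The steps above can be sketched as a runnable example. This is a minimal sketch, not a reference solution: it uses Python's built-in sqlite3 module only as a convenient way to run the query, and the sample rows are made up. SQLite has no EXTRACT function, so strftime('%m', ...) stands in for it here; in PostgreSQL or MySQL the same column would be written EXTRACT(MONTH FROM submit_date).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reviews (
    review_id   INTEGER PRIMARY KEY,
    user_id     INTEGER,
    submit_date TEXT,      -- ISO-8601 date string
    product_id  INTEGER,
    stars       INTEGER
);
-- Made-up sample rows, for illustration only.
INSERT INTO reviews VALUES
    (1, 101, '2022-06-08', 50001, 4),
    (2, 102, '2022-06-10', 50001, 3),
    (3, 103, '2022-06-18', 69852, 5),
    (4, 104, '2022-07-26', 69852, 3),
    (5, 105, '2022-07-05', 69852, 2);
""")

# strftime('%m', ...) is the SQLite substitute for
# EXTRACT(MONTH FROM submit_date); CAST turns '06' into the number 6.
query = """
SELECT CAST(strftime('%m', submit_date) AS INTEGER) AS mth,
       product_id,
       ROUND(AVG(stars), 2) AS avg_stars
FROM reviews
GROUP BY mth, product_id
ORDER BY mth, product_id;
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each printed tuple is (month, product_id, average stars), already sorted by month and then product_id.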
2. Optimizing a Slow SQL Query
Problem Statement: Amazon handles
massive datasets, and optimizing SQL
queries is crucial for performance.
Discuss various methods to optimize a
slow SQL query.
How to Solve:
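Select Specific Fields: Use SELECT field1, field2 instead of SELECT * to retrieve only the columns you need.
Avoid SELECT DISTINCT: Use DISTINCT only when it is genuinely needed, since removing duplicates usually requires an extra sort or hash step.
Use INNER JOIN: Prefer explicit INNER JOIN ... ON syntax over listing tables in FROM and joining them through WHERE conditions. Many optimizers produce the same plan for both, but the explicit form is easier to read and makes an accidental cross join less likely.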
Minimize Joins: Where it makes sense, denormalize the data to reduce the need for complex joins, accepting some redundancy in exchange.
Add Indexes: Create indexes on columns that are frequently used in WHERE clauses and join conditions to speed up reads. Each index adds some cost to writes.
Examine Execution Plans: Use the query execution plan (for example via EXPLAIN) to identify bottlenecks such as full table scans, then optimize accordingly.
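To make the last two points concrete, here is a small sketch that inspects a query plan before and after adding an index. It assumes SQLite run through Python's sqlite3 module; the orders table and the index name idx_orders_customer are hypothetical. Other engines expose plans through their own EXPLAIN variants, and the output format differs between them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
# Made-up data: 1000 orders spread across 100 customers.
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

# Select only the needed columns rather than SELECT *.
query = "SELECT order_id, amount FROM orders WHERE customer_id = 42"


def plan(sql):
    """Return SQLite's plan for a query as one readable string."""
    # The last column of each EXPLAIN QUERY PLAN row is the description.
    return " | ".join(
        row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)
    )


before = plan(query)  # no index on customer_id yet: full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(query)   # the filter can now be answered through the index

print("before:", before)
print("after: ", after)
```

The first plan reports a scan of the whole table, while the second reports a search using idx_orders_customer, which is the kind of change to look for when tuning a slow query.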
3. SQL Constraints
Problem Statement: Explain SQL
constraints and provide examples of
different types of constraints used to
enforce data integrity in databases.
How to Solve:
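NOT NULL: Ensures that a column cannot contain NULL values.
UNIQUE: Ensures that all values in a column (or combination of columns) are distinct.
INDEX: Strictly speaking not a constraint. An index speeds up queries on frequently searched columns and is often listed alongside constraints; it does not enforce integrity unless it is a unique index.
PRIMARY KEY: Uniquely identifies each record in a table, in effect combining NOT NULL and UNIQUE.
FOREIGN KEY: Ensures referential integrity by requiring that a value match an existing key in the referenced table.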
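A short sketch can show these constraints rejecting bad rows. It assumes SQLite via Python's sqlite3 module, and the customers and orders tables are hypothetical. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON; most other engines enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific opt-in
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    email       TEXT NOT NULL UNIQUE
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers (customer_id)
);
INSERT INTO customers VALUES (1, 'a@example.com');
""")


def rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


results = {
    # email is missing
    "NOT NULL": rejected("INSERT INTO customers VALUES (2, NULL)"),
    # email already exists
    "UNIQUE": rejected("INSERT INTO customers VALUES (3, 'a@example.com')"),
    # customer_id 1 already exists
    "PRIMARY KEY": rejected("INSERT INTO customers VALUES (1, 'b@example.com')"),
    # customer 999 does not exist
    "FOREIGN KEY": rejected("INSERT INTO orders VALUES (10, 999)"),
}
# A row that satisfies every constraint is accepted.
valid_rejected = rejected("INSERT INTO orders VALUES (11, 1)")

print(results)
print("valid row rejected:", valid_rejected)
```

Each of the four invalid inserts is refused by the database itself, which is the point of declaring constraints in the schema rather than relying on application code.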
4. Highest-Grossing Items
Problem Statement: Find the top two
highest-grossing products in each
category for the year 2022 from a
table product_spend. The table
contains category, product, user_id,
spend, and transaction_date. The
output should include the category,
product, and total spend.
How to Solve:
Step 1: Aggregate the total spend by category and product for transactions dated in 2022.
Step 2: In a Common Table Expression (CTE), rank the products within each category by total spend, highest first, using a window function such as RANK() with PARTITION BY category.
Step 3: Filter the results to keep only the products ranked 1 or 2 in each category.
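One possible way to write the three steps is sketched below, with each step marked in the SQL. It assumes SQLite 3.25 or later (for window functions) run through Python's sqlite3 module, dates stored as ISO-8601 text, and made-up sample rows; the CTE names totals and ranked are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_spend (
    category TEXT, product TEXT, user_id INTEGER,
    spend REAL, transaction_date TEXT
);
-- Made-up sample rows, for illustration only.
INSERT INTO product_spend VALUES
    ('appliance',   'fridge',     1, 300.0, '2022-03-01'),
    ('appliance',   'fridge',     2, 200.0, '2022-07-15'),
    ('appliance',   'kettle',     3,  40.0, '2022-05-02'),
    ('appliance',   'toaster',    4,  60.0, '2022-09-09'),
    ('appliance',   'toaster',    5, 900.0, '2021-12-31'),  -- not in 2022
    ('electronics', 'phone',      6, 700.0, '2022-01-20'),
    ('electronics', 'headphones', 7, 150.0, '2022-02-11'),
    ('electronics', 'cable',      8,  10.0, '2022-06-30');
""")

query = """
WITH totals AS (              -- Step 1: total spend per product in 2022
    SELECT category, product, SUM(spend) AS total_spend
    FROM product_spend
    WHERE transaction_date >= '2022-01-01'
      AND transaction_date <  '2023-01-01'
    GROUP BY category, product
),
ranked AS (                   -- Step 2: rank within each category
    SELECT category, product, total_spend,
           RANK() OVER (PARTITION BY category
                        ORDER BY total_spend DESC) AS rnk
    FROM totals
)
SELECT category, product, total_spend   -- Step 3: keep the top two
FROM ranked
WHERE rnk <= 2
ORDER BY category, rnk;
"""

top_two = conn.execute(query).fetchall()
for row in top_two:
    print(row)
```

With RANK(), tied products share a rank, so a category can return more than two rows when totals tie. If exactly two rows per category are required, ROW_NUMBER() is a common alternative.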
5. Difference Between RANK() and
DENSE_RANK()
Problem Statement: Explain the
difference between the RANK() and
DENSE_RANK() functions in SQL.
RANK(): Assigns a rank to each row within a partition of a result set. Tied rows receive the same rank, and the ranks that follow skip ahead, leaving gaps (e.g., if two items are ranked 2, the next rank will be 4).
DENSE_RANK(): Similar to RANK(), but does not leave gaps between ranks. If two items are ranked 2, the next rank will be 3.
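The difference is easiest to see side by side. The sketch below assumes SQLite 3.25 or later run through Python's sqlite3 module; the scores table and its four rows are made up so that two rows tie for second place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scores (name TEXT, score INTEGER);
-- 'b' and 'c' tie on purpose.
INSERT INTO scores VALUES ('a', 90), ('b', 80), ('c', 80), ('d', 70);
""")

rows = conn.execute("""
SELECT name,
       score,
       RANK()       OVER (ORDER BY score DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
FROM scores
ORDER BY score DESC, name;
""").fetchall()

for row in rows:
    print(row)
```

The two tied rows get rank 2 under both functions. The row after them gets 4 from RANK() and 3 from DENSE_RANK().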