PWC DATA ANALYST EXPERIENCE (1-3 yoe)
Guesstimate Questions:
1. Estimate the number of smartphones sold in India annually.
To guesstimate the annual number of smartphones sold in India, we can break the problem
into logical steps using assumptions and available population data. Here's a structured
approach:
Step 1: Population of India
India's population is approximately 1.4 billion people.
Step 2: Target Population (Smartphone Users)
Not everyone in India uses or purchases a smartphone. Let's segment the population:
• Assume 70% of the population is in the age group 15-60, which is the primary
smartphone user base.
1.4 billion × 70% ≈ 980 million
• Assume 70% of this group can afford a smartphone or actively use one.
980 million × 70% ≈ 686 million
Step 3: Replacement Cycle and New Users
• On average, a smartphone is replaced every 2-3 years. This implies that
approximately one-third of smartphone users replace their phones annually.
686 million ÷ 3 ≈ 229 million
• Assume an additional 5-10% of the target population (new users) buys a
smartphone each year. Using 10%: 686 million × 10% ≈ 68.6 million
Step 4: Total Annual Sales
Adding the replacements and new users:
229 million + 68.6 million ≈ 300 million
Final Estimate:
Approximately 300 million smartphones are sold annually in India.
Assumptions Recap:
1. 70% of the population is in the primary age group for smartphone users.
2. 70% of this segment can afford smartphones.
3. Replacement cycle is 2-3 years.
4. Annual new users account for ~10% of the total target population.
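As a quick sanity check, the same arithmetic can be reproduced in a few lines of Python (the figures are the assumptions above, not real market data):
population = 1_400_000_000                  # Step 1: India's population
target_users = population * 0.70 * 0.70     # Step 2: age filter, then affordability filter (~686 million)
replacements = target_users / 3             # Step 3: ~one-third replace their phone each year
new_users = target_users * 0.10             # Step 3: ~10% first-time buyers
annual_sales = replacements + new_users     # Step 4: total annual sales
print(round(annual_sales / 1e6))            # ~297, i.e. roughly 300 million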
Refinement:
This estimate can be refined further with real data on:
• Smartphone penetration rates.
• Market growth rates.
• Urban vs. rural demographics.
2. How would you estimate the daily revenue generated by roadside tea stalls across
India?
To estimate the daily revenue generated by roadside tea stalls across India, let's approach
this systematically using assumptions and logical reasoning:
Step 1: Population of India
India's population is approximately 1.4 billion.
Step 2: Target Population for Tea Consumption
Not everyone consumes tea from roadside stalls. Let’s segment the population:
• Assume 70% of the population (adults and older teens) regularly drink tea.
1.4 billion × 70% ≈ 980 million
• Out of these, assume 50% of tea drinkers consume tea from roadside stalls (the rest
may make tea at home, go to cafes, or other sources).
980 million × 50% ≈ 490 million
Step 3: Daily Tea Consumption
• On average, tea drinkers consume 2 cups of tea daily.
• Not all cups are purchased from roadside stalls; assume 1 cup per person per day
is bought from such stalls.
490 million cups per day
Step 4: Price of Tea
• The average price of tea at roadside stalls is approximately ₹10 per cup.
Step 5: Daily Revenue
• Multiply the daily consumption by the price per cup:
490 million cups × ₹10 = ₹4.9 billion
Final Estimate:
The daily revenue generated by roadside tea stalls across India is approximately ₹4.9
billion.
Assumptions Recap:
1. 70% of the population drinks tea.
2. 50% of tea drinkers buy from roadside stalls.
3. One cup per person is consumed daily at roadside stalls.
4. Average price per cup is ₹10.
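Again, a short Python sketch just to verify the arithmetic under these assumptions:
population = 1_400_000_000                  # Step 1: India's population
tea_drinkers = population * 0.70            # Step 2: 70% drink tea (~980 million)
stall_customers = tea_drinkers * 0.50       # Step 2: 50% buy from roadside stalls (~490 million)
cups_per_day = stall_customers * 1          # Step 3: 1 cup per customer per day
daily_revenue = cups_per_day * 10           # Steps 4-5: ₹10 per cup
print(daily_revenue / 1e9)                  # 4.9, i.e. ₹4.9 billion per day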
Refinement:
To improve this estimate:
• Factor in rural vs. urban consumption patterns (higher urban roadside tea stall
density).
• Adjust for regional variations in tea prices and consumption habits.
• Account for occasional tea drinkers or seasonal demand changes.
Python Questions:
1. Write a Python function to find all unique pairs of integers in a list that sum up to a given
target value.
Find All Unique Pairs That Sum to a Target
def find_pairs(nums, target):
    seen = set()
    pairs = set()
    for num in nums:
        complement = target - num
        if complement in seen:
            pairs.add((min(num, complement), max(num, complement)))
        seen.add(num)
    return list(pairs)

# Example usage:
nums = [2, 4, 3, 7, 5, 8, -1]
target = 7
print(find_pairs(nums, target))  # Output (order may vary): [(3, 4), (2, 5), (-1, 8)]
2. Given a string, write a function to check if it’s a palindrome, ignoring spaces,
punctuation, and case sensitivity.
Check if a String Is a Palindrome
def is_palindrome(s):
    # Remove spaces and punctuation, and convert to lowercase
    filtered = ''.join(c for c in s if c.isalnum()).lower()
    return filtered == filtered[::-1]

# Example usage:
s = "A man, a plan, a canal, Panama!"
print(is_palindrome(s))  # Output: True
3. Explain the difference between deep copy and shallow copy in Python. When would you
use each?
Deep Copy vs. Shallow Copy
• Shallow Copy:
o Creates a new object but does not create copies of nested objects.
o Changes to nested mutable objects in the original are reflected in the copy,
because both share references to the same inner objects.
o Example: Using copy.copy() or the copy() method of a list.
• Deep Copy:
o Creates a new object along with copies of all objects it contains, recursively.
o Changes to the original object do not affect the copied object.
o Example: Using copy.deepcopy().
Example:
import copy
original = [[1, 2], [3, 4]]
shallow = copy.copy(original)
deep = copy.deepcopy(original)
original[0][0] = 99
print(shallow) # Output: [[99, 2], [3, 4]]
print(deep) # Output: [[1, 2], [3, 4]]
Use Cases:
• Use shallow copy when you want to duplicate a structure but allow shared mutable
data.
• Use deep copy when creating a fully independent copy is necessary.
4. What are decorators in Python, and how do they work? Provide an example of a scenario
where a decorator would be useful.
Decorators in Python
Decorators are functions that modify the behavior of other functions or methods. They take
a function as input, add functionality to it, and return it.
Example of a Decorator:
def logger(func):
    def wrapper(*args, **kwargs):
        print(f"Calling {func.__name__} with {args} and {kwargs}")
        result = func(*args, **kwargs)
        print(f"{func.__name__} returned {result}")
        return result
    return wrapper

@logger
def add(a, b):
    return a + b

# Example usage:
print(add(3, 5))
Output:
Calling add with (3, 5) and {}
add returned 8
8
When to Use:
Decorators are useful for:
1. Logging: Automatically log function calls.
2. Authentication: Check user permissions before executing a function.
3. Caching: Store results of expensive computations for reuse (see the sketch below).
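As an illustration of the caching use case, here is a minimal memoization decorator, a sketch that stores results in a plain dictionary keyed on positional arguments (in practice, functools.lru_cache provides the same behaviour):
def memoize(func):
    cache = {}
    def wrapper(*args):
        if args not in cache:          # compute only on a cache miss
            cache[args] = func(*args)
        return cache[args]
    return wrapper

@memoize
def slow_square(n):
    print(f"Computing {n} squared ...")
    return n * n

# Example usage:
print(slow_square(4))  # Computes and caches: prints the message, then 16
print(slow_square(4))  # Served from the cache: prints only 16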
SQL Questions:
1. Write a query to find the cumulative revenue by month for each product category in
a sales table.
Step 1: Create the Sales Table
CREATE TABLE sales (
id INT AUTO_INCREMENT PRIMARY KEY,
product_category VARCHAR(50),
revenue DECIMAL(10, 2),
sale_date DATE
);
Step 2: Insert Sample Records
INSERT INTO sales (product_category, revenue, sale_date) VALUES
('Electronics', 5000.00, '2024-01-15'),
('Electronics', 7000.00, '2024-02-10'),
('Electronics', 4000.00, '2024-03-05'),
('Clothing', 2000.00, '2024-01-20'),
('Clothing', 3000.00, '2024-02-15'),
('Clothing', 1500.00, '2024-03-01'),
('Groceries', 1000.00, '2024-01-10'),
('Groceries', 1200.00, '2024-02-12'),
('Groceries', 1300.00, '2024-03-08');
Step 3: Write the Query for Cumulative Revenue
SELECT
product_category,
DATE_FORMAT(sale_date, '%Y-%m') AS month,
SUM(revenue) AS monthly_revenue,
SUM(SUM(revenue)) OVER (PARTITION BY product_category ORDER BY
DATE_FORMAT(sale_date, '%Y-%m')) AS cumulative_revenue
FROM
sales
GROUP BY
product_category, DATE_FORMAT(sale_date, '%Y-%m')
ORDER BY
product_category, month;
Explanation:
1. DATE_FORMAT(sale_date, '%Y-%m'): Extracts the year and month from the
sale_date for grouping.
2. SUM(SUM(revenue)) OVER (PARTITION BY product_category ORDER BY
DATE_FORMAT(sale_date, '%Y-%m')): Calculates the cumulative revenue for each
product category by summing the monthly revenues in the specified order.
3. GROUP BY product_category, DATE_FORMAT(sale_date, '%Y-%m'): Groups the
data by product category and month.
Sample Output:
Product_Category Month Monthly_Revenue Cumulative_Revenue
Electronics 2024-01 5000.00 5000.00
Electronics 2024-02 7000.00 12000.00
Electronics 2024-03 4000.00 16000.00
Clothing 2024-01 2000.00 2000.00
Clothing 2024-02 3000.00 5000.00
Clothing 2024-03 1500.00 6500.00
Groceries 2024-01 1000.00 1000.00
Groceries 2024-02 1200.00 2200.00
Groceries 2024-03 1300.00 3500.00
2. How would you retrieve the top 5 products by sales volume, excluding any products that
had zero sales in the past 3 months?
Step 1: Create the Products Table
CREATE TABLE product_sales (
product_id INT AUTO_INCREMENT PRIMARY KEY,
product_name VARCHAR(50),
sales_volume INT,
sale_date DATE
);
Step 2: Insert Sample Records
INSERT INTO product_sales (product_name, sales_volume, sale_date) VALUES
('Product A', 150, '2024-10-01'),
('Product A', 200, '2024-11-01'),
('Product A', 180, '2024-12-01'),
('Product B', 100, '2024-10-01'),
('Product B', 0, '2024-11-01'),
('Product B', 50, '2024-12-01'),
('Product C', 250, '2024-10-15'),
('Product C', 300, '2024-11-15'),
('Product C', 400, '2024-12-15'),
('Product D', 0, '2024-10-10'),
('Product D', 0, '2024-11-10'),
('Product D', 0, '2024-12-10'),
('Product E', 500, '2024-10-05'),
('Product E', 600, '2024-11-05'),
('Product E', 700, '2024-12-05');
Step 3: Write the Query
WITH recent_sales AS (
SELECT
product_name,
SUM(sales_volume) AS total_sales,
MAX(CASE WHEN sale_date >= CURDATE() - INTERVAL 3 MONTH THEN sales_volume
ELSE 0 END) AS recent_sales_flag
FROM
product_sales
WHERE
sale_date >= CURDATE() - INTERVAL 3 MONTH
GROUP BY
product_name
),
valid_products AS (
SELECT
product_name,
total_sales
FROM
recent_sales
WHERE
recent_sales_flag > 0
)
SELECT
product_name,
total_sales
FROM
valid_products
ORDER BY
total_sales DESC
LIMIT 5;
Explanation:
1. recent_sales CTE:
o Calculates the total sales for each product.
o Uses CASE to flag whether a product had non-zero sales in the past 3
months.
2. valid_products CTE:
o Filters out products with zero sales in all the past 3 months using
recent_sales_flag > 0.
3. Final Query:
o Retrieves the top 5 products by total sales from valid_products.
o Orders the results in descending order of total sales and limits the output to
the top 5 products.
Expected Output:
Product_Name Total_Sales
Product E 1800
Product C 950
Product A 530
Product B 150
(Only four products qualify: Product D is excluded because all of its sales in the past 3
months were zero. The figures assume the query runs within 3 months of the sample sale
dates.)
3. Given a table of customer transactions, identify all customers who made purchases in
two or more consecutive months.
To solve this, we'll assume the following table structure for customer transactions:
Step 1: Create the Transactions Table
CREATE TABLE customer_transactions (
transaction_id INT AUTO_INCREMENT PRIMARY KEY,
customer_id INT,
transaction_date DATE,
amount DECIMAL(10, 2)
);
Step 2: Insert Sample Records
INSERT INTO customer_transactions (customer_id, transaction_date, amount) VALUES
(1, '2024-01-15', 100.00),
(1, '2024-02-10', 200.00),
(1, '2024-04-05', 150.00),
(2, '2024-01-20', 300.00),
(2, '2024-02-15', 400.00),
(2, '2024-03-01', 500.00),
(3, '2024-01-25', 250.00),
(3, '2024-03-10', 300.00),
(4, '2024-02-05', 150.00),
(4, '2024-03-07', 200.00),
(4, '2024-04-15', 250.00);
Step 3: Write the Query
WITH monthly_transactions AS (
SELECT
customer_id,
DATE(DATE_FORMAT(transaction_date, '%Y-%m-01')) AS transaction_month
FROM
customer_transactions
GROUP BY
customer_id, DATE(DATE_FORMAT(transaction_date, '%Y-%m-01'))
),
consecutive_months AS (
SELECT
t1.customer_id,
t1.transaction_month AS month1,
t2.transaction_month AS month2
FROM
monthly_transactions t1
JOIN
monthly_transactions t2
ON
t1.customer_id = t2.customer_id
AND t2.transaction_month = t1.transaction_month + INTERVAL 1 MONTH
)
SELECT DISTINCT
customer_id
FROM
consecutive_months;
Explanation:
1. monthly_transactions CTE:
o Groups transactions by customer and month, using
DATE(DATE_FORMAT(transaction_date, '%Y-%m-01')) so each month is represented by its first day.
o Ensures we have a unique list of months in which a customer made
purchases.
2. consecutive_months CTE:
o Joins monthly_transactions with itself to find customers with consecutive
months.
o Checks whether t1.transaction_month + INTERVAL 1 MONTH equals
t2.transaction_month, i.e. whether the customer was also active in the
immediately following month.
3. Final Query:
o Selects unique customer IDs from the consecutive_months CTE.
Sample Output:
Customer_ID
1
2
4
Notes:
• Customer 1: Purchased in January and February.
• Customer 2: Purchased in January, February, and March.
• Customer 4: Purchased in February, March, and April.
• Customer 3: Skipped February, so they are not included in the output.
4. Write a query to calculate the retention rate of users on a monthly basis.
Retention Rate Definition
The retention rate is the percentage of users who return in a subsequent month after their
initial activity.
Assumptions
• We have a table called user_activity with the following structure:
o user_id: Unique identifier for each user.
o activity_date: Date of the user's activity.
Step 1: Create the Table
CREATE TABLE user_activity (
user_id INT,
activity_date DATE
);
Step 2: Insert Sample Records
INSERT INTO user_activity (user_id, activity_date) VALUES
(1, '2024-01-15'),
(1, '2024-02-10'),
(1, '2024-03-20'),
(2, '2024-01-20'),
(2, '2024-02-15'),
(3, '2024-02-05'),
(3, '2024-03-10'),
(4, '2024-01-25'),
(5, '2024-02-18'),
(5, '2024-03-15'),
(6, '2024-03-01');
Step 3: Query to Calculate Retention Rate
WITH first_month_activity AS (
SELECT
user_id,
DATE_FORMAT(MIN(activity_date), '%Y-%m') AS first_active_month
FROM
user_activity
GROUP BY
user_id
),
monthly_retention AS (
SELECT
fma.first_active_month,
DATE_FORMAT(ua.activity_date, '%Y-%m') AS active_month,
COUNT(DISTINCT ua.user_id) AS retained_users
FROM
user_activity ua
JOIN
first_month_activity fma
ON
ua.user_id = fma.user_id
GROUP BY
fma.first_active_month, DATE_FORMAT(ua.activity_date, '%Y-%m')
),
monthly_cohort AS (
SELECT
first_active_month,
COUNT(DISTINCT user_id) AS cohort_size
FROM
first_month_activity
GROUP BY
first_active_month
)
SELECT
mr.first_active_month,
mr.active_month,
mr.retained_users,
mc.cohort_size,
ROUND((mr.retained_users / mc.cohort_size) * 100, 2) AS retention_rate
FROM
monthly_retention mr
JOIN
monthly_cohort mc
ON
mr.first_active_month = mc.first_active_month
ORDER BY
mr.first_active_month, mr.active_month;
Explanation
1. first_month_activity CTE:
o Determines the first active month for each user.
2. monthly_retention CTE:
o Counts the number of users retained for each combination of their first
active month and subsequent activity months.
3. monthly_cohort CTE:
o Calculates the size of the cohort for each first active month (the total number
of users who first became active in that month).
4. Final Query:
o Joins monthly_retention and monthly_cohort to calculate the retention rate
as: Retention Rate = (Retained Users / Cohort Size) × 100
o Orders the results by the first active month and the active month.
Sample Output
First_Active_Month Active_Month Retained_Users Cohort_Size Retention_Rate
2024-01 2024-01 3 3 100.00
2024-01 2024-02 2 3 66.67
2024-01 2024-03 1 3 33.33
2024-02 2024-02 2 2 100.00
2024-02 2024-03 2 2 100.00
2024-03 2024-03 1 1 100.00
Interpretation
• First Active Month: The cohort of users who became active in that month.
• Active Month: The months in which users returned.
• Retention Rate: The percentage of the cohort that returned in subsequent months.
5. Find the nth highest salary from an employee table, where n is a parameter passed
dynamically to the query.
To find the nth highest salary dynamically, we can order the distinct salaries in descending
order and use the LIMIT clause: skip the first n − 1 salaries and retrieve the next one.
Because MySQL's LIMIT clause does not accept user variables in an ordinary statement, the
offset is bound through a prepared statement. Here's how:
Table Creation and Sample Data
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
employee_name VARCHAR(50),
salary DECIMAL(10, 2)
);
INSERT INTO employees (employee_id, employee_name, salary) VALUES
(1, 'Alice', 60000.00),
(2, 'Bob', 75000.00),
(3, 'Charlie', 85000.00),
(4, 'David', 50000.00),
(5, 'Eve', 85000.00);
Query for nth Highest Salary
SET @n := 2; -- Set the value of n dynamically
SET @offset := @n - 1; -- Skip the top n - 1 distinct salaries
PREPARE stmt FROM 'SELECT DISTINCT salary FROM employees ORDER BY salary DESC LIMIT ?, 1';
EXECUTE stmt USING @offset;
DEALLOCATE PREPARE stmt;
Explanation
1. @n Variable:
o Dynamically sets the rank n for the desired salary; @offset holds n - 1.
2. DISTINCT salary:
o Ensures unique salaries are considered in case of duplicates.
3. ORDER BY salary DESC:
o Orders salaries in descending order, ranking the highest salary first.
4. LIMIT ?, 1 (bound to @offset via the prepared statement):
o Skips the top n - 1 salaries and retrieves the next one.
Alternative Query Using Window Functions (MySQL 8.0+)
If the database supports window functions, you can use the DENSE_RANK() function:
SET @n := 2; -- Set the value of n dynamically
WITH ranked_salaries AS (
SELECT
salary,
DENSE_RANK() OVER (ORDER BY salary DESC) AS salary_rank
FROM
employees
)
SELECT salary
FROM ranked_salaries
WHERE salary_rank = @n;
Explanation (Window Functions)
1. DENSE_RANK():
o Assigns a unique rank to each salary in descending order. Duplicate salaries
get the same rank.
2. WITH ranked_salaries:
o Creates a temporary table with salaries and their respective ranks.
3. WHERE salary_rank = @n:
o Filters the result to return only the nth rank (the alias salary_rank avoids
MySQL's reserved word RANK).
Sample Output
For n = 2:
Salary
75000.00
Key Notes
• Use the DISTINCT keyword to handle duplicate salaries for the LIMIT method.
• Use DENSE_RANK() if you want to consider duplicate salaries as a single rank.
6. Explain how indexing works in SQL and how to decide which columns should be indexed
for optimal performance.
How Indexing Works in SQL
An index is a database structure that improves the speed of data retrieval operations on a
table. It works like an optimized lookup table for the database, allowing it to quickly locate
rows without scanning the entire table.
• Structure: Most indexes are implemented as balanced tree structures (e.g., B-trees)
or hash tables. These structures allow efficient searching, insertion, and deletion
operations.
• Function: When a query is executed, the database engine checks if an index is
available for the columns involved in the query’s filters or joins. If so, the engine
uses the index to locate the rows, reducing the need for a full table scan (a short
runnable demonstration follows below).
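As a quick, self-contained demonstration of this behaviour, the sketch below uses SQLite through Python's standard library (rather than MySQL, purely so it runs without a database server; the table, column, and index names are made up for the example). The query plan switches from a full table scan to an index search once an index exists:
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
cur.executemany("INSERT INTO users (email) VALUES (?)",
                [(f"user{i}@example.com",) for i in range(10000)])

# Without an index, the planner falls back to a full table scan.
print(cur.execute("EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?",
                  ("[email protected]",)).fetchall())   # e.g. 'SCAN users'

# After adding an index, the same lookup becomes an index search.
cur.execute("CREATE INDEX idx_email ON users(email)")
print(cur.execute("EXPLAIN QUERY PLAN SELECT * FROM users WHERE email = ?",
                  ("[email protected]",)).fetchall())   # e.g. 'SEARCH users USING INDEX idx_email (email=?)'
conn.close()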
Types of Indexes
1. Primary Index:
o Automatically created for the primary key column.
o Ensures unique values and quick lookups for primary key operations.
2. Unique Index:
o Ensures that all values in the indexed column are unique.
3. Clustered Index:
o Reorders the physical storage of table data to match the index order.
o A table can have only one clustered index.
4. Non-clustered Index:
o Creates a separate structure to store the index and points to the table rows.
o A table can have multiple non-clustered indexes.
5. Composite Index:
o Indexes multiple columns together.
6. Full-Text Index:
o Optimized for searching text data, such as finding words or phrases in large
text fields.
Benefits of Indexing
• Faster Query Execution: Speeds up SELECT, JOIN, and WHERE clause operations.
• Reduced I/O Operations: Fewer rows are read from the disk.
• Sorted Data Retrieval: Helps with ORDER BY and GROUP BY clauses.
Drawbacks of Indexing
• Slower Write Operations: INSERT, UPDATE, and DELETE operations become slower
because the index must also be updated.
• Storage Overhead: Indexes consume additional disk space.
• Maintenance Overhead: Indexes need to be maintained, especially in tables with
frequent data modifications.
How to Decide Which Columns to Index
1. Frequently Queried Columns:
o Index columns that appear frequently in WHERE, JOIN, ON, ORDER BY, or
GROUP BY clauses.
2. Primary Keys and Unique Constraints:
o Always index primary key columns as they uniquely identify rows.
3. Foreign Keys:
o Index foreign key columns to improve JOIN performance.
4. High-Selectivity Columns:
o Choose columns with a wide range of unique values (e.g., a user_id column)
because indexes work best with high selectivity.
5. Composite Indexes:
o Use composite indexes when multiple columns are often queried together.
For example, for queries like:
SELECT * FROM sales WHERE year = 2023 AND region = 'North';
A composite index on (year, region) will perform better than individual indexes.
6. Avoid Low-Selectivity Columns:
o Avoid indexing columns with few distinct values (e.g., gender or status with
values like 'Active' or 'Inactive').
7. Read-Heavy Tables:
o Index columns in tables where SELECT operations are more frequent than
INSERT/UPDATE/DELETE.
Examples
Scenario 1: Searching by email in a user table
CREATE INDEX idx_email ON users(email);
• Improves performance for queries like:
SELECT * FROM users WHERE email = '[email protected]';
Scenario 2: Composite index for a sales table
CREATE INDEX idx_year_region ON sales(year, region);
• Optimizes queries with:
SELECT * FROM sales WHERE year = 2023 AND region = 'North';
Scenario 3: Indexing a foreign key
CREATE INDEX idx_customer_id ON orders(customer_id);
• Speeds up JOINs like:
SELECT * FROM orders o JOIN customers c ON o.customer_id = c.customer_id;
Monitoring and Tuning
1. EXPLAIN Plan:
o Use EXPLAIN to analyze how the database executes a query and whether it
uses an index.
2. Query Performance Metrics:
o Monitor slow queries and identify columns for potential indexing.
3. Index Maintenance:
o Periodically rebuild or reorganize indexes to ensure they remain efficient.
Summary
• Use indexes on frequently queried, high-selectivity columns.
• Avoid excessive indexing on write-heavy tables.
• Analyze query patterns and use tools like EXPLAIN to make data-driven decisions
about indexing.
7. Describe the differences between LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN and
when to use each one in a complex query.
Differences Between LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN
In SQL, JOIN operations combine rows from two or more tables based on a related column.
The differences among LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN lie in how
unmatched rows are handled.
1. LEFT JOIN
• Definition: Returns all rows from the left table and the matched rows from the right
table. If no match is found, the result contains NULL for columns from the right
table.
• Use Case: Use when you want all records from the left table regardless of whether
there is a match in the right table.
Syntax
SELECT columns
FROM table1
LEFT JOIN table2
ON table1.common_column = table2.common_column;
Example
• Tables:
o Customers:
CustomerID Name
1 Alice
2 Bob
3 Charlie
o Orders:
OrderID CustomerID
101 1
102 2
• Query:
SELECT c.Name, o.OrderID
FROM Customers c
LEFT JOIN Orders o
ON c.CustomerID = o.CustomerID;
• Result:
Name OrderID
Alice 101
Bob 102
Charlie NULL
2. RIGHT JOIN
• Definition: Returns all rows from the right table and the matched rows from the left
table. If no match is found, the result contains NULL for columns from the left table.
• Use Case: Use when you want all records from the right table regardless of whether
there is a match in the left table.
Syntax
SELECT columns
FROM table1
RIGHT JOIN table2
ON table1.common_column = table2.common_column;
Example
• Query:
SELECT c.Name, o.OrderID
FROM Customers c
RIGHT JOIN Orders o
ON c.CustomerID = o.CustomerID;
• Result:
Name OrderID
Alice 101
Bob 102
3. FULL OUTER JOIN
• Definition: Combines the results of LEFT JOIN and RIGHT JOIN. Returns all rows
from both tables, with NULL in columns where no match exists.
• Use Case: Use when you want to include all records from both tables, showing
unmatched rows with NULL values.
Syntax
SELECT columns
FROM table1
FULL OUTER JOIN table2
ON table1.common_column = table2.common_column;
Example
• Query:
SELECT c.Name, o.OrderID
FROM Customers c
FULL OUTER JOIN Orders o
ON c.CustomerID = o.CustomerID;
• Result:
Name OrderID
Alice 101
Bob 102
Charlie NULL
When to Use Each Join in Complex Queries
1. LEFT JOIN:
o When the left table contains a primary set of data and you want to include all
rows, even if they have no matching data in the right table.
o Example: Listing all customers, including those who haven't made any
orders.
2. RIGHT JOIN:
o When the right table contains a primary set of data and you want to include
all rows, even if they have no matching data in the left table.
o Example: Listing all orders, including those made by unregistered
customers.
3. FULL OUTER JOIN:
o When both tables are equally important, and you want to analyze all data
points, even unmatched rows.
o Example: Creating a comprehensive report that includes all customers and
all orders, showing unmatched customers or orders.
Key Differences in a Nutshell
Feature LEFT JOIN RIGHT JOIN FULL OUTER JOIN
Rows from Left Table Always Included Only if Matched Always Included
Rows from Right Table Only if Matched Always Included Always Included
Unmatched Rows NULL in Right Columns NULL in Left Columns NULL in Both Columns
Visual Representation
If A represents rows from the left table and B represents rows from the right table:
• LEFT JOIN: A ∪ (A ∩ B)
• RIGHT JOIN: B ∪ (A ∩ B)
• FULL OUTER JOIN: A ∪ B
Performance Tips
• Use LEFT JOIN or RIGHT JOIN instead of FULL OUTER JOIN if you only need one
side's unmatched rows, as it reduces computation.
• Always use indexes on the columns used in the ON clause to improve performance
in joins.
8. What is the difference between HAVING and WHERE clauses in SQL, and when would
you use each?
Difference Between HAVING and WHERE Clauses in SQL
1. WHERE Clause
• Purpose: Filters rows before any aggregation.
• Scope: Applied before any GROUP BY operation.
• Use Case: Used to filter rows based on conditions applied to individual columns.
Syntax:
SELECT column1, column2, ...
FROM table_name
WHERE condition;
Example:
SELECT customer_name, total_orders
FROM orders
WHERE total_orders > 50;
• Explanation: Filters out rows before aggregation (i.e., filters orders with total_orders
> 50).
2. HAVING Clause
• Purpose: Filters the aggregated results (after applying GROUP BY).
• Scope: Applied after the GROUP BY operation.
• Use Case: Used to filter aggregated data based on conditions applied to aggregate
functions like SUM, AVG, COUNT, etc.
Syntax:
SELECT column1, aggregate_function(column2)
FROM table_name
GROUP BY column1
HAVING condition;
Example:
SELECT category, COUNT(order_id) AS total_orders
FROM orders
GROUP BY category
HAVING total_orders > 10;
• Explanation: Filters the aggregated results (only categories with total_orders > 10
are included).
Key Differences
Feature WHERE Clause HAVING Clause
Purpose Filters rows before aggregation Filters aggregated results
Scope Applied to table rows individually Applied to the grouped results
Usage Used to filter individual rows Used to filter aggregated results
Conditions Applies to non-aggregated columns (before grouping) Applies to aggregated columns (after grouping)
Example WHERE total_orders > 50 HAVING COUNT(order_id) > 10
When to Use Each
1. Use WHERE when:
o You need to filter rows based on conditions before performing any
aggregation.
o Example: Filtering customer records where the order count is more than 50.
2. Use HAVING when:
o You need to filter the results of an aggregation.
o Example: Counting orders by category and filtering categories with more than
10 orders.
Practical Scenario
-- Example using both WHERE and HAVING
SELECT category, COUNT(order_id) AS total_orders
FROM orders
WHERE order_date >= '2024-01-01' -- Filtering based on date before aggregation
GROUP BY category
HAVING total_orders > 10; -- Filtering aggregated results
This returns categories with more than 10 orders placed on or after January 1, 2024.