Interview Preparation Notes
Regression and Model Assumptions
1. What are the key assumptions of a linear regression model?
2. What is homoscedasticity, and how is it different from heteroscedasticity?
3. What is multicollinearity, and how can it affect model performance?
4. What is the difference between bias and variance?
5. What are overfitting and underfitting? How can they be avoided?
Model Evaluation & Metrics
1. What is the R² score, and what does it indicate?
2. Why is adjusted R² used, and how is it different from R²?
3. What are precision, recall, F1-score, and how do we control the tradeoff between them?
4. What metrics are used in regression problems (e.g., RMSE, MAE, MAPE)?
5. Why did you use MAPE for time series forecasting instead of R²?
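The metrics named in this section can be sketched in plain Python as a quick revision aid. This is a minimal illustration, not a library implementation: the function name `regression_metrics`, its `n_features` parameter, and the sample numbers are assumptions made up for the example, and it assumes no actual value is zero (MAPE is undefined there).

```python
import math

def regression_metrics(y_true, y_pred, n_features):
    """Return RMSE, MAE, MAPE (%), R^2 and adjusted R^2 for paired lists."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    # MAPE divides by the actual values, so this sketch assumes none are zero.
    mape = 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    # Adjusted R^2 penalises extra predictors: 1 - (1 - R^2)(n - 1)/(n - p - 1).
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)
    return {"rmse": rmse, "mae": mae, "mape": mape, "r2": r2, "adj_r2": adj_r2}

# Hypothetical actuals and predictions, for illustration only.
metrics = regression_metrics(
    [100.0, 200.0, 300.0, 400.0],
    [110.0, 190.0, 310.0, 390.0],
    n_features=1,
)
print(metrics)
```

RMSE and MAE are in the units of the target, while MAPE is a percentage, which is one reason it is often easier to compare across series of different scale.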
Feature Engineering & Selection
1. What is feature engineering?
2. How do you select relevant features for a model?
3. What is VIF (Variance Inflation Factor), and how is it used for detecting multicollinearity?
4. Can VIF be applied to categorical variables?
5. What are SHAP and LIME? How do they help in model interpretation?
Time Series Forecasting
1. What are the components of a time series (trend, seasonality, residual, etc.)?
2. What is the difference between seasonality and cyclicity?
3. What is ARIMA? Explain its components (p, d, q).
4. How do we decide the AR and MA order using ACF and PACF plots?
5. How are ACF and PACF calculated and interpreted?
6. How is stationarity tested in time series data?
7. How do you handle missing values in time series data?
8. How are outliers treated in time series forecasting?
9. What is forecast moderation and disaggregation?
10. What is Triple Exponential Smoothing (TES), and how do you tune its parameters?
11. What is Auto_TES and why didn't you use newer models?
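For the ACF questions above, it can help to see the sample autocorrelation written out by hand. The sketch below is one common estimator (centred products divided by the lag-0 sum of squares); the function name `sample_acf` and the toy series are assumptions for illustration, and in practice a library routine would normally be used instead.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (lag 0 is always 1)."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denom = sum(c * c for c in centered)  # lag-0 sum of squares
    acf = []
    for lag in range(max_lag + 1):
        # Pair each observation with the one `lag` steps earlier.
        num = sum(centered[t] * centered[t - lag] for t in range(lag, n))
        acf.append(num / denom)
    return acf

# Hypothetical series that repeats every 4 steps.
seasonal = [10, 20, 30, 20] * 6
acf_values = sample_acf(seasonal, 4)
print([round(v, 3) for v in acf_values])
```

On this toy series the autocorrelation is strongly negative at lag 2 and strongly positive at lag 4, which is the kind of repeating pattern that points to a seasonal period of 4.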
Project & Business Context Questions
1. Explain segmentation and its role in forecasting.
2. What is ABC classification, and how is it applied?
3. How do you handle outlier correction in your data pipeline?
4. What KPIs would you track for transaction success/failure?
5. What was the size of the dataset you worked with?
6. Explain your project in detail.
Sampling & Hypothesis Testing
1. What are different types of sampling techniques?
2. If you want to compare a product's sales trend against others, which sampling approach would you use?
3. How do you test if a new model is better than an existing one using hypothesis testing?
4. How do you generate a representative sample for model comparison?
5. What are common significance levels (e.g., 0.01, 0.05, 0.10) and how do you choose one?
A/B Testing & Experimentation
1. If a model (v1) is already deployed and a new model (v2) is developed, how do you compare them using A/B testing without accuracy metrics?
Python Programming
1. How do you apply a discount to specific values in a dictionary?
2. How do you calculate a cumulative sum manually (without using built-in functions)?
3. What does np.zeros(2) return?
4. What happens if a Python function doesn't include a return statement?
5. How do break and continue work in loops?
6. What is the difference between lists and arrays?
7. What is list comprehension in Python?
8. What is left-skewed vs right-skewed data?
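Questions 1 and 2 in this section are usually answered with a few lines of code. The following is a minimal sketch of one possible answer; the helper names `apply_discount` and `cumulative_sum`, the item names, and the 10% rate are all assumptions chosen for the example.

```python
def apply_discount(prices, items, rate):
    """Return a new dict with `rate` (0.10 means 10%) taken off the listed items."""
    return {
        name: round(price * (1 - rate), 2) if name in items else price
        for name, price in prices.items()
    }

def cumulative_sum(values):
    """Running total built with an explicit loop, no built-in helpers."""
    running = 0
    totals = []
    for v in values:
        running += v
        totals.append(running)
    return totals

# Hypothetical price list, for illustration only.
prices = {"pen": 10.0, "book": 50.0, "bag": 200.0}
discounted = apply_discount(prices, {"book", "bag"}, 0.10)
print(discounted)
print(cumulative_sum([1, 2, 3, 4]))
```

Returning a new dictionary rather than mutating `prices` in place is a design choice here; an interviewer may equally accept an in-place loop over the keys.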
Python Coding Tasks
1. Write a function to find the longest substring with at most K distinct characters.
2. Write a function to find the longest common prefix in a list like ["flower", "flow", "flight"] -> 'fl'
3. Fix the logic in a function to compute running balance in a DataFrame with credit/debit values.
4. Extract valid email addresses from a list of text strings using a heuristic (e.g., @ in the middle).
5. Calculate a 7-day rolling average of sales using rolling(window=7, min_periods=1).
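Tasks 1 and 2 above can be sketched as follows. This is one possible approach under stated assumptions: task 1 uses a sliding window and returns the first longest substring when there are ties, and task 2 shrinks a candidate prefix word by word. The function names are made up for this example.

```python
def longest_substring_k_distinct(s, k):
    """Sliding window: longest substring of s with at most k distinct characters."""
    if k <= 0:
        return ""
    counts = {}          # character -> occurrences inside the current window
    left = 0
    best_start, best_len = 0, 0
    for right, ch in enumerate(s):
        counts[ch] = counts.get(ch, 0) + 1
        # Shrink from the left until the window has at most k distinct characters.
        while len(counts) > k:
            left_ch = s[left]
            counts[left_ch] -= 1
            if counts[left_ch] == 0:
                del counts[left_ch]
            left += 1
        if right - left + 1 > best_len:
            best_start, best_len = left, right - left + 1
    return s[best_start:best_start + best_len]

def longest_common_prefix(words):
    """Trim the first word until every other word starts with it."""
    if not words:
        return ""
    prefix = words[0]
    for word in words[1:]:
        while not word.startswith(prefix):
            prefix = prefix[:-1]
    return prefix

print(longest_substring_k_distinct("eceba", 2))
print(longest_common_prefix(["flower", "flow", "flight"]))
```

The sliding window visits each character at most twice, so it runs in linear time in the length of the string.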
SQL & Databases
1. What is the difference between WHERE and HAVING clauses in SQL?
2. What is the difference between ALTER and UPDATE statements?
3. SQL query: Given three tables, write a query to find customers who bought "widget_A" at least once.
4. How would you write a query to select students scoring more than 60%?
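Questions 1 and 4 can be tried out with Python's built-in `sqlite3` module. The notes do not give a schema, so the `scores` table, its columns, and the sample rows below are entirely hypothetical; the sketch only illustrates that `WHERE` filters rows before grouping while `HAVING` filters groups after aggregation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per student per subject.
conn.execute(
    "CREATE TABLE scores (student TEXT, subject TEXT, marks REAL, max_marks REAL)"
)
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?, ?)",
    [
        ("Asha", "math", 72, 100), ("Asha", "physics", 55, 100),
        ("Ben", "math", 40, 100), ("Ben", "physics", 65, 100),
        ("Chen", "math", 90, 100), ("Chen", "physics", 80, 100),
    ],
)

# WHERE filters individual rows: subject scores above 60%.
rows_above_60 = conn.execute(
    "SELECT student, subject FROM scores "
    "WHERE marks * 100.0 / max_marks > 60 "
    "ORDER BY student, subject"
).fetchall()

# HAVING filters aggregated groups: students whose overall percentage is above 60%.
students_avg_above_60 = conn.execute(
    "SELECT student FROM scores GROUP BY student "
    "HAVING SUM(marks) * 100.0 / SUM(max_marks) > 60 "
    "ORDER BY student"
).fetchall()
conn.close()

print(rows_above_60)
print(students_avg_above_60)
```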
Probability & Statistics
1. What is the probability of getting at least two consecutive 3s when rolling a die three times?
2. Why don't we normalize data before detecting outliers?
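Question 1 can be checked by brute force. The sketch below assumes a fair six-sided die and reads "at least two consecutive 3s" as "some adjacent pair of rolls is (3, 3)"; the function name `prob_consecutive_threes` is made up for this example. By inclusion-exclusion the count is 6 (3, 3, x) + 6 (x, 3, 3) - 1 (3, 3, 3) = 11 out of 216 outcomes.

```python
from fractions import Fraction
from itertools import product

def prob_consecutive_threes(n_rolls=3):
    """Exact probability that some adjacent pair of rolls is (3, 3)."""
    outcomes = list(product(range(1, 7), repeat=n_rolls))
    hits = sum(
        1
        for rolls in outcomes
        if any(a == 3 and b == 3 for a, b in zip(rolls, rolls[1:]))
    )
    return Fraction(hits, len(outcomes))

print(prob_consecutive_threes())  # 11/216
```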
NLP & Deep Learning Concepts
1. What is a Variational Autoencoder (VAE)?
2. What are GPT and BERT architectures? When should each be used?
3. What is an agent in AI systems?
4. What is self-attention and multi-head attention in transformers? Why are they important?