Data Analyst Interview Cheat Sheets
SQL
Q: How do you select specific columns from a table?
A:
```sql
SELECT column1, column2 FROM table_name;
```
Q: How do you filter data in SQL?
A:
```sql
SELECT * FROM table_name WHERE column = 'value';
```
Q: How do you perform aggregation with grouping?
A:
```sql
SELECT column, COUNT(*) FROM table_name GROUP BY column;
```
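To try the grouping query without a database server, here is a minimal sketch using Python's built-in `sqlite3` module. The `orders` table, its `region` column, and the sample rows are hypothetical and exist only for illustration.

```python
import sqlite3

# In-memory database with a hypothetical `orders` table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("East", 100.0), ("East", 250.0), ("West", 80.0)],
)

# One output row per region, with the number of rows in each group.
rows = conn.execute(
    "SELECT region, COUNT(*) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(rows)
conn.close()
```

Every column in the SELECT list that is not inside an aggregate function should also appear in the GROUP BY clause.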
Q: How do you join two tables?
A:
```sql
SELECT * FROM table1
JOIN table2 ON table1.id = table2.id;
```
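A plain JOIN is an inner join, so rows without a match in the other table are dropped. The sketch below contrasts it with a LEFT JOIN using Python's built-in `sqlite3` module. The `customers` and `orders` tables and their rows are hypothetical examples.

```python
import sqlite3

# In-memory database with two hypothetical tables (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO orders VALUES (10, 1, 99.0);
""")

# INNER JOIN (plain JOIN): only customers that have a matching order.
inner_rows = conn.execute(
    "SELECT c.name, o.amount FROM customers c "
    "JOIN orders o ON o.customer_id = c.id ORDER BY c.id"
).fetchall()

# LEFT JOIN: every customer; order columns are NULL (None in Python) when unmatched.
left_rows = conn.execute(
    "SELECT c.name, o.amount FROM customers c "
    "LEFT JOIN orders o ON o.customer_id = c.id ORDER BY c.id"
).fetchall()
conn.close()

print(inner_rows)
print(left_rows)
```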
Python (Pandas, Numpy, Visualization)
Q: How do you read a CSV file in pandas?
A:
```python
import pandas as pd
df = pd.read_csv('file.csv')
```
Q: How do you get a summary of your DataFrame?
A:
```python
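df.info()      # column names, dtypes, non-null counts, memory usage
df.describe()  # count, mean, std, min, quartiles, max for numeric columns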
```
Q: How do you filter rows based on a condition?
A:
```python
df[df['column'] > 100]
```
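A common follow-up is combining several conditions. The sketch below assumes a small hypothetical DataFrame; the `category` column and the values are made up for illustration. Each condition needs its own parentheses, and pandas uses `&` and `|` rather than `and` and `or`.

```python
import pandas as pd

# Hypothetical data (illustrative only).
df = pd.DataFrame({"column": [50, 150, 300], "category": ["a", "b", "a"]})

# Single condition: rows where `column` exceeds 100.
high = df[df["column"] > 100]

# Two conditions combined with & (element-wise AND); parentheses are required.
high_a = df[(df["column"] > 100) & (df["category"] == "a")]

print(high)
print(high_a)
```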
Q: How do you visualize data?
A:
```python
import matplotlib.pyplot as plt
import seaborn as sns
sns.histplot(df['column'])  # histogram of one numeric column
plt.show()                  # display the figure
```
Excel
Q: What are some commonly used Excel functions?
A:
- SUM(range)
- AVERAGE(range)
- IF(logical_test, value_if_true, value_if_false)
- VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup])
Q: What is a Pivot Table used for?
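A: Summarizing large data sets by grouping values into rows and columns and aggregating them (sum, count, average), so you can analyze, explore, and present the data without writing formulas.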
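For candidates who also work in pandas, the same idea can be sketched with `pivot_table`. This is an analogy to the Excel feature, not a description of how Excel works internally; the `sales` data and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical sales records (illustrative data only).
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100, 150, 80, 120],
})

# Rows = region, columns = quarter, cells = summed revenue.
pivot = sales.pivot_table(
    index="region", columns="quarter", values="revenue", aggfunc="sum"
)
print(pivot)
```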
Q: Useful shortcuts?
A:
- Ctrl + Shift + L: Toggle filters
- Alt, E, S, V (pressed in sequence): Paste Special > Values
- Ctrl + Arrow Keys: Navigate large data sets
Power BI
Q: How do you load data into Power BI?
A: Home > Get Data > Choose your source > Load
Q: Common DAX functions?
A:
- SUM(), AVERAGE(), COUNTROWS()
- CALCULATE(), FILTER(), RELATED()
Q: What are some key visualizations?
A: Bar chart, Line chart, Slicer, Matrix, Card
Q: How to publish a report?
A: Home > Publish > Select Workspace
Statistics
Q: What is the difference between mean, median, and mode?
A:
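- Mean: Sum of the values divided by the number of values; sensitive to outliers
- Median: Middle value when sorted (average of the two middle values if the count is even); robust to outliers
- Mode: Most frequently occurring value; a data set can have more than one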
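The three measures can be compared on a small sample with Python's built-in `statistics` module. The `values` list is a hypothetical sample chosen so that one large value pulls the mean above the median.

```python
import statistics

# Hypothetical sample with one large value (12).
values = [2, 3, 3, 5, 12]

mean_value = statistics.mean(values)      # (2 + 3 + 3 + 5 + 12) / 5
median_value = statistics.median(values)  # middle of the sorted list
mode_value = statistics.mode(values)      # most frequent value

print(mean_value, median_value, mode_value)  # 5 3 3
```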
Q: What is a p-value?
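A: The probability of obtaining test results at least as extreme as those actually
observed, assuming the null hypothesis is true. A small p-value (commonly below 0.05)
is evidence against the null hypothesis; it is not the probability that the null
hypothesis is true.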
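As a worked example, consider a one-sided exact binomial test. The scenario is hypothetical: a coin lands heads 9 times in 10 flips, and the null hypothesis is that the coin is fair. The p-value is the probability of seeing 9 or more heads under that assumption.

```python
from math import comb

n_flips = 10
observed_heads = 9

# Under the null hypothesis (fair coin), P(X = k) = C(n, k) / 2**n.
# Sum the probabilities of every outcome at least as extreme as the one observed.
p_value = sum(
    comb(n_flips, k) for k in range(observed_heads, n_flips + 1)
) / 2**n_flips

print(p_value)  # 11/1024, about 0.0107
```

Since 0.0107 is below the conventional 0.05 threshold, this result would usually be reported as evidence against a fair coin.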
Q: What is correlation?
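A: A measure of the strength and direction of the relationship between two variables. The Pearson correlation coefficient measures linear association and ranges from -1 (perfect negative) to 1 (perfect positive); 0 means no linear relationship. Correlation does not imply causation.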
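A minimal sketch of the Pearson correlation coefficient in plain Python follows. The helper name `pearson_r` and the sample data are hypothetical; in practice a library call such as pandas `Series.corr` would normally be used instead.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data: one perfectly increasing pair, one perfectly decreasing pair.
hours = [1, 2, 3, 4, 5]
scores_up = [2, 4, 6, 8, 10]
scores_down = [10, 8, 6, 4, 2]

r_up = pearson_r(hours, scores_up)
r_down = pearson_r(hours, scores_down)
print(r_up, r_down)
```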
Q: What is linear regression?
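A: A method to model the relationship between a dependent variable and one or more
independent variables by fitting a linear equation, typically by minimizing the sum of
squared differences between observed and predicted values (least squares).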
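For a single independent variable, the least-squares line can be computed directly. The sketch below is a plain-Python illustration; `fit_line` is a hypothetical helper, and the data were chosen to lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    # The fitted line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```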