1) Question: Create a DataFrame with at least 5 rows and 3 columns, where some cells
contain missing values (NaN). Write a Python script to:
● Detect and print all the missing values in the DataFrame.
● Count the number of missing values in each column.
2) Dropping Missing Values
Question: Given the following DataFrame:
import pandas as pd
data = {'A': [1, 2, None, 4, 5],
        'B': [None, 2, 3, None, 5],
        'C': [1, None, 3, 4, None]}
df = pd.DataFrame(data)
a) Drop rows where any column has a missing value.
b) Drop columns where all the values are missing.
c) Keep only the rows that have at least 3 non-missing values and drop the rest.
3) Question: Fill the missing values in a DataFrame using the following strategies:
○ For numeric columns, replace missing values with the mean of the column.
○ For categorical (string) columns, replace missing values with the mode of
the column.
Provide an example DataFrame and fill its missing values according to these
strategies.
4) Question: Given the following DataFrame:
data = {'A': [None, 2, 3, 4, 5],
        'B': [1, None, 3, None, 5],
        'C': [1, 2, None, 4, None]}
df = pd.DataFrame(data)
● Replace all missing values in column 'A' with 0.
● Replace all missing values in column 'B' with the column's mean.
● Replace missing values in column 'C' with the median.
1. Detecting Missing Values
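You can detect missing values with isna() or its alias isnull(). Both return a Boolean
mask with the same shape as the DataFrame, with True wherever a value is missing. Use
notna() or its alias notnull() to get the inverse mask of non-missing values.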
import pandas as pd
df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, 3, 4]
})
# Detect missing values
df.isna()
# Detect non-missing values
df.notna()
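As a minimal sketch of how the Boolean mask can be used, the example below counts the missing values in each column and selects the rows that contain at least one. The variable names (mask, missing_per_column, rows_with_missing) are illustrative and not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, 3, 4]
})

# Boolean mask: True where a value is missing
mask = df.isna()

# Summing the mask counts the True values, giving missing values per column
missing_per_column = mask.sum()

# any(axis=1) is True for rows that contain at least one missing value
rows_with_missing = df[mask.any(axis=1)]

print(missing_per_column)
print(rows_with_missing)
```

Because True counts as 1 when summed, isna().sum() is the usual way to get a per-column count.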
2. Dropping Missing Values
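You can remove rows or columns that contain missing values using dropna(). It returns a
new DataFrame; the original is unchanged unless you assign the result back.
● Drop rows with missing values:
df.dropna(axis=0) # Drops rows that contain at least one NaN (the default)
● Drop columns with missing values:
df.dropna(axis=1) # Drops columns that contain at least one NaN
You can also specify a threshold with thresh, for example, keeping only rows with at
least 2 non-NaN values:
df.dropna(thresh=2)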
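The sketch below compares the main dropna() options on one small DataFrame. The data and the variable names are made up for illustration, and the how='all' option is shown in addition to the calls above.

```python
import pandas as pd

nan = float('nan')
df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, None, None],
    'C': [1, 2, None, None],
    'D': [nan, nan, nan, nan],   # a column that is entirely missing
})

# Drop columns where every value is missing (removes only 'D')
no_empty_cols = df.dropna(axis=1, how='all')

# Drop rows that contain any missing value
complete_rows = no_empty_cols.dropna()

# Drop rows only when every value in the row is missing
nonempty_rows = no_empty_cols.dropna(how='all')

# Keep rows that have at least 2 non-missing values
at_least_two = no_empty_cols.dropna(thresh=2)

print(complete_rows)
print(nonempty_rows)
print(at_least_two)
```

None of these calls modifies df itself, so each result has to be assigned to a name to be kept.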
3. Filling Missing Values
If you don't want to drop missing values, you can fill them using fillna(), which
returns a new DataFrame. There are various strategies for filling missing data.
● Fill with a specific value (e.g., 0):
df.fillna(0)
● Fill with a different value for each column:
df.fillna({'A': 0, 'B': 5})
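A short sketch of what these calls return, using the same example DataFrame. The variable names are illustrative. Note that a column left out of the dictionary keeps its missing values.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, 3, 4]
})

# Same fill value everywhere
filled_zero = df.fillna(0)

# Different fill value per column
filled_per_col = df.fillna({'A': 0, 'B': 5})

# Only 'A' is listed, so the NaN in 'B' is left as it is
only_a = df.fillna({'A': 0})

print(filled_per_col)
print(only_a)
```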
4. Interpolating Missing Values
You can use interpolation to estimate missing values from their neighboring values.
This is useful for ordered numerical data.
df.interpolate() # Linear interpolation (the default)
You can also specify other interpolation methods, such as polynomial interpolation
(this method requires SciPy to be installed):
df.interpolate(method='polynomial', order=2)
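The following sketch shows what linear interpolation does to the example DataFrame. One detail worth knowing: with the default settings a missing value at the start of a column has no earlier neighbor and stays missing. Passing limit_direction='both' is one way to fill it; that choice is an assumption made for this example, not a general recommendation.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, 3, 4]
})

# Default linear interpolation: the gap in 'A' is filled from its neighbors,
# but the leading NaN in 'B' is left untouched
linear = df.interpolate()

# Fill in both directions so that leading gaps are filled as well
both = df.interpolate(limit_direction='both')

print(linear)
print(both)
```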
5. Replacing Missing Values with Statistical Measures
You can replace missing values with a statistic of the column, such as the mean,
median, or mode.
● Fill with the mean:
df.fillna(df.mean(numeric_only=True)) # numeric_only skips non-numeric columns
● Fill with the median:
df.fillna(df.median(numeric_only=True))
● Fill with the mode:
df.fillna(df.mode().iloc[0]) # mode() can return several rows; iloc[0] takes the first
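When a DataFrame mixes numeric and string columns, one possible approach is to pick the statistic per column based on its dtype. The sketch below uses a made-up DataFrame with hypothetical columns 'age' and 'city'; it fills the numeric column with its mean and the string column with its mode.

```python
import pandas as pd

df = pd.DataFrame({
    'age': [20, None, 30, 40],
    'city': ['Paris', 'Rome', None, 'Paris']
})

filled = df.copy()
for col in filled.columns:
    if pd.api.types.is_numeric_dtype(filled[col]):
        # Numeric column: use the mean of the non-missing values
        filled[col] = filled[col].fillna(filled[col].mean())
    else:
        # Non-numeric column: use the most frequent value
        filled[col] = filled[col].fillna(filled[col].mode().iloc[0])

print(filled)
```

Both mean() and mode() ignore missing values by default, so the statistics are computed from the observed values only.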
6. Checking for Missing Values After Handling
After you've handled missing data, you can check if any values are still missing using:
df.isna().sum() # Check remaining missing values by column
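As a small sketch of this final check, the example below fills the example DataFrame with the column means and then confirms that nothing is left missing. The names cleaned, remaining, total_remaining, and has_missing are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [None, 2, 3, 4]
})

# Handle the missing data, here by filling with each column's mean
cleaned = df.fillna(df.mean())

# Remaining missing values per column
remaining = cleaned.isna().sum()

# Total across the whole DataFrame, and a single yes/no answer
total_remaining = int(remaining.sum())
has_missing = bool(cleaned.isna().any().any())

print(remaining)
print(total_remaining, has_missing)
```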