UNIT-3
EXPERIMENTS – 3(B)
2. Apply different Missing Data handling techniques
a) NaN values in mathematical operations
b) Filling in missing data
c) Forward and Backward filling of missing values
d) Filling with index values
e) Interpolation of missing values
a) NaN values in mathematical operations
# Import math Library
import math
# Print the value of nan
print(math.nan)
OUTPUT:
nan
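The listing above only prints the NaN constant. As a minimal sketch of how NaN behaves inside actual calculations, the example below (variable names are illustrative, not part of the original experiment) shows that NaN propagates through arithmetic, never compares equal to itself, and can be skipped with NumPy's NaN-aware functions.

```python
import math
import numpy as np

x = math.nan

# Any arithmetic that involves NaN produces NaN
total = x + 1
product = x * 0

# NaN is not equal to anything, including itself
is_equal = (x == x)

# Use math.isnan() (or np.isnan()) to test for NaN instead of ==
is_missing = math.isnan(total)

values = np.array([1.0, np.nan, 3.0])
plain_sum = np.sum(values)         # NaN propagates into the result
nan_aware_sum = np.nansum(values)  # NaN entries are ignored

print(total, product, is_equal, is_missing)
print(plain_sum, nan_aware_sum)
```

Because of this propagation, missing values are usually detected and handled before any aggregate is computed.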
b) Filling in missing data
# Importing pandas and numpy
import pandas as pd
import numpy as np
# Sample DataFrame with missing values
data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(data)
# Checking for missing values using isnull()
missing_values = df.isnull()
print(missing_values)
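The listing above detects the missing values but does not fill them. A minimal sketch of the filling step, assuming the same sample scores, could use fillna() with either a constant or each column's mean; the names filled_zero and filled_mean are illustrative.

```python
import pandas as pd
import numpy as np

data = {'First Score': [100, 90, np.nan, 95],
        'Second Score': [30, 45, 56, np.nan],
        'Third Score': [np.nan, 40, 80, 98]}
df = pd.DataFrame(data)

# Count the missing values in each column
missing_per_column = df.isnull().sum()

# Option 1: replace every NaN with a constant
filled_zero = df.fillna(0)

# Option 2: replace each NaN with the mean of its own column
# (df.mean() skips NaN values by default)
filled_mean = df.fillna(df.mean())

print(missing_per_column)
print(filled_zero)
print(filled_mean)
```

Filling with the column mean keeps each column's average unchanged, whereas filling with 0 pulls it down; which choice is appropriate depends on the data.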
c) Forward and Backward filling of missing values
• Forward Fill (ffill): each NaN is replaced by the last valid (non-null)
value before it. A NaN at the start of the data has no earlier value and stays NaN.
• Backward Fill (bfill): each NaN is replaced by the next valid (non-null)
value after it. A NaN at the end of the data has no later value and stays NaN.
import pandas as pd
# Example DataFrame with missing values (NaN)
data = {
    'Date': ['2025-02-20', '2025-02-21', '2025-02-22', '2025-02-23', '2025-02-24'],
    'Value': [10, None, None, 20, None]
}
# Create the DataFrame
df = pd.DataFrame(data)
# Show original data
print("Original DataFrame:")
print(df)
# Forward fill missing values (NaN)
# (df.ffill() replaces fillna(method='ffill'), which is deprecated in recent pandas)
df_forward = df.ffill()
# Backward fill missing values (NaN)
df_backward = df.bfill()
# Display the results
print("\nDataFrame after Forward Fill:")
print(df_forward)
print("\nDataFrame after Backward Fill:")
print(df_backward)
OUTPUT:
Original DataFrame:
         Date  Value
0  2025-02-20   10.0
1  2025-02-21    NaN
2  2025-02-22    NaN
3  2025-02-23   20.0
4  2025-02-24    NaN

DataFrame after Forward Fill:
         Date  Value
0  2025-02-20   10.0
1  2025-02-21   10.0
2  2025-02-22   10.0
3  2025-02-23   20.0
4  2025-02-24   20.0

DataFrame after Backward Fill:
         Date  Value
0  2025-02-20   10.0
1  2025-02-21   20.0
2  2025-02-22   20.0
3  2025-02-23   20.0
4  2025-02-24    NaN
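In the backward-fill output the last row is still NaN, because no later value exists to copy from. One possible way to handle such edge gaps, sketched here on an illustrative Series that is not part of the original experiment, is to chain the two fills; the limit parameter can also cap how far a value is carried.

```python
import pandas as pd
import numpy as np

# Illustrative series with a gap at the start, in the middle, and at the end
s = pd.Series([np.nan, 10, np.nan, np.nan, 20, np.nan])

# Forward fill first, then backward fill whatever is still missing
# (here only the leading NaN remains after ffill)
both = s.ffill().bfill()

# Carry each valid value forward by at most one position
limited = s.ffill(limit=1)

print(both)
print(limited)
```

Chaining the fills removes every NaN as long as the data contains at least one valid value, while limit keeps long gaps from being filled with stale values.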
d) Filling with index values
# importing pandas as pd
import pandas as pd
# Creating the Index
idx = pd.Index([1, 2, 3, 4, 5, None, 7, 8, 9, None])
# Print the Index
print(idx)
# Fill the missing values in the Index with 0
print(idx.fillna(0))
Use the Index.fillna() function to fill all the missing strings in the Index.
# importing pandas as pd
import pandas as pd
# Creating the Index
idx = pd.Index(['Labrador', 'Beagle', None, 'Labrador',
                'Lhasa', 'Husky', 'Beagle', None, 'Koala'])
# Print the Index
print(idx)
# Fill the missing strings with a placeholder value
print(idx.fillna('Missing'))
e) Interpolation of missing values
The pandas interpolate() method fills NaN values in a DataFrame or Series by
estimating them from the neighbouring values with an interpolation technique
(linear by default), rather than hard-coding a replacement value.
import pandas as pd
import numpy as np
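# Sample DataFrame with missing values
df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, np.nan, 8],
    'C': [9, 10, 11, 12]})
# interpolate() returns a new DataFrame; it does not modify df in place
df_interpolated = df.interpolate()
print(df_interpolated)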
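As a rough sketch of what linear interpolation produces on this data, the example below fills each gap with evenly spaced values between its neighbours, and then repeats the call with limit=1 so that only the first NaN of each gap is filled. The names linear and limited are illustrative.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, np.nan, 8],
    'C': [9, 10, 11, 12]})

# Default method is linear: the gap in B between 5 and 8 becomes 6 and 7
linear = df.interpolate()

# limit=1 fills at most one consecutive NaN per gap, so B keeps one NaN
limited = df.interpolate(limit=1)

print(linear)
print(limited)
```

Column C has no missing values, so its values are unchanged in both results.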