Lab report
Course code: CSE326
Course Title: Data Mining and Machine Learning Lab
Lab report: 02
Topic: Data Visualization
Submitted To:
Name: Sadman Sadik Khan
Designation: Lecturer
Department: CSE
Daffodil International University
Submitted By:
Name: Fardus Alam
ID: 222-15-6167
Section: 62-G
Department: CSE
Daffodil International University
Submission Date: 15-03-2025
Code:
from google.colab import drive
drive.mount('/content/drive')
Explanation:
Mounts Google Drive in Google Colab so that files stored on Drive can be read from the notebook.
Code:
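import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
Explanation:
Imports the libraries needed for DataFrame handling (pandas, NumPy) and visualization (Matplotlib,
Seaborn). Seaborn is imported under the alias sns, which is the name the plotting code in this report uses.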
Code:
df = pd.read_csv('/content/drive/MyDrive/lab dataset data mining/healthcare-dataset-stroke-data2.csv')
df.head()
Output:
Explanation:
Loads the CSV file from Google Drive into the DataFrame df and shows its first 5 rows.
Code:
df.info()
Output:
Explanation:
Shows basic information about df, such as the non-null count and data type of each column.
Code:
df.describe()
Output:
Explanation:
Shows summary statistics for each numerical column: count, mean, standard deviation (std), min, max, and
the quartiles (25%, 50%, 75%).
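As a rough illustration of what these statistics mean, the sketch below recomputes them for a small made-up list using only Python's standard library. The values list is hypothetical and not taken from the dataset. The sample standard deviation and the linearly interpolated quartiles are chosen to mirror the defaults of df.describe().

```python
import statistics

# Hypothetical toy values, not taken from the stroke dataset.
values = [20, 30, 40, 50, 60]

# Quartiles with linear interpolation between neighbouring data points.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")

summary = {
    "count": len(values),
    "mean": statistics.mean(values),
    "std": statistics.stdev(values),  # sample standard deviation (divides by n - 1)
    "min": min(values),
    "25%": q1,
    "50%": median,
    "75%": q3,
    "max": max(values),
}

for name, value in summary.items():
    print(f"{name:>5}: {value:.2f}")
```

On a real DataFrame the same numbers come from df.describe() directly; the manual version is only meant to show where each row of that table comes from.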
Code: Scatter Plot
x = df['age']
y = df['bmi']

sns.scatterplot(data=df, x=x, y=y, hue='gender')
plt.show()
Output:
Explanation:
Seaborn's scatterplot() function creates a scatter plot, which visualizes the relationship between two
numerical variables (here age and bmi). Colors, sizes, and marker styles can be varied based on
additional categorical variables. Here hue='gender' colors each point by its value in the gender column,
so the gender categories can be told apart.
Code: Scatter plot using Matplotlib
plt.scatter(df['age'], df['bmi'], c='g', label='age & bmi')
plt.scatter(df['age'], df['avg_glucose_level'], c='b', label='age & avg_glucose_level')

plt.title('Scatter plot using matplotlib with different color and label')
plt.xlabel('age')
plt.ylabel('bmi & avg_glucose_level')

plt.legend()
plt.show()
Output:
Explanation:
This code uses Matplotlib to draw two scatter series on the same axes, age vs. bmi and age vs.
avg_glucose_level, in different colors so they can be told apart. It helps compare how bmi and
avg_glucose_level each vary with age.
Code: Barplot
plt.title('Barplot between work_type & stroke')
sns.barplot(data=df, x='work_type', y='stroke', hue='gender', errorbar=None)
plt.show()
Output:
Explanation:
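The code above creates a bar plot using Seaborn to compare stroke occurrence across work types, with
separate bars for each gender. By default sns.barplot() plots the mean of the y variable for each group,
so each bar shows the average of the stroke column for one work type and gender. errorbar=None hides
the error bars.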
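To make the bar heights concrete, here is a minimal sketch of the grouping and averaging behind such a plot, written with the standard library only. The rows are hypothetical, and the sketch assumes stroke is stored as 0 or 1, in which case the group mean is the share of rows with a stroke.

```python
from collections import defaultdict

# Hypothetical rows of (work_type, gender, stroke); not real dataset values.
# Assumption: stroke is coded as 0 (no stroke) or 1 (stroke).
rows = [
    ("Private", "Male", 1),
    ("Private", "Male", 0),
    ("Private", "Female", 0),
    ("Private", "Female", 0),
    ("Self-employed", "Male", 1),
    ("Self-employed", "Female", 1),
    ("Self-employed", "Female", 0),
]

# Collect the stroke values for every (work_type, gender) group.
groups = defaultdict(list)
for work_type, gender, stroke in rows:
    groups[(work_type, gender)].append(stroke)

# The mean of each group corresponds to the height of one bar.
bar_heights = {key: sum(vals) / len(vals) for key, vals in groups.items()}

for (work_type, gender), height in sorted(bar_heights.items()):
    print(f"{work_type:<14} {gender:<7} {height:.2f}")
```

With pandas, the equivalent table would be df.groupby(['work_type', 'gender'])['stroke'].mean().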
Code: Pie chart
gender = df['gender'].value_counts()
colors = ['r', 'g', 'b']

plt.pie(gender, labels=gender.index, autopct='%1.2f%%',
        colors=colors, startangle=0, explode=(0.1, 0, 0),
        wedgeprops={'edgecolor': 'black'})

plt.title("Pie Chart")
plt.show()
Output:
Explanation:
Creates a pie chart of the gender column using Matplotlib.
Key terms:
df['gender'].value_counts(): Gets gender counts dynamically.
colors =['r', 'g', 'b']: Assigns red, green, and blue to slices.
autopct='%1.2f%%': Displays percentages with two decimal places.
explode=(0.1, 0, 0): Slightly separates the first slice for emphasis.
wedgeprops={'edgecolor': 'black'}: Adds black borders for clarity.
startangle=0: Starts the first slice at 0 degrees (the positive x-axis); slices are then drawn counterclockwise.
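The slice labels can be reproduced by hand, which may help when checking a chart. The sketch below counts a hypothetical gender list (a stand-in for df['gender'], not real data) and formats each share with the same '%1.2f%%' pattern that is passed to autopct.

```python
from collections import Counter

# Hypothetical gender column; the real values come from df['gender'].
genders = ["Female"] * 6 + ["Male"] * 3 + ["Other"] * 1

# Count each category, most frequent first (similar to value_counts()).
counts = Counter(genders).most_common()
total = sum(count for _, count in counts)

# Format each share with the same pattern given to autopct.
labels = {name: "%1.2f%%" % (100 * count / total) for name, count in counts}

for name, text in labels.items():
    print(name, text)
```

Note that explode needs one entry per slice, so explode=(0.1, 0, 0) only fits a column with exactly three categories, as in this toy list.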
Code: Histogram
sns.histplot(data=df, x='work_type', color='g')
plt.title('Histogram')
plt.show()
Output:
Explanation:
This code uses Seaborn's histplot() to visualize the work_type variable in the DataFrame df. Because
work_type is categorical, each bar shows the number of rows in one work type.
Key terms:
sns.histplot(): Plots the histogram for the specified variable.
x='work_type': Specifies that the data for the work_type column will be used.
color='g': Sets the color of the bars to green.
plt.title(): Adds the title "Histogram".
plt.show(): Displays the plot.
Code: Box plot
sns.boxplot(data=df, x='bmi', hue='gender')
plt.title('Box Plot')
plt.show()
Output:
Explanation:
The code above creates a box plot using Seaborn to compare the distribution of bmi across the gender
categories in the DataFrame df.
Key terms:
sns.boxplot(): Plots the box plot.
x='bmi': Plots the bmi values on the x-axis.
hue='gender': Differentiates the data by gender using different colors.
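A box plot is drawn from a handful of summary numbers. The following sketch computes them for a hypothetical list of bmi values using the standard library. It assumes the common 1.5 x IQR whisker rule, which is the default in Matplotlib and Seaborn, and points beyond the whiskers are treated as outliers.

```python
import statistics

# Hypothetical bmi values with one unusually large entry; not real data.
bmi = [18.0, 22.0, 24.0, 25.0, 27.0, 29.0, 31.0, 60.0]

# The box spans the first to third quartile, with a line at the median.
q1, median, q3 = statistics.quantiles(bmi, n=4, method="inclusive")
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the box (assumed default whisker rule).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points inside the fences.
whisker_low = min(v for v in bmi if v >= lower_fence)
whisker_high = max(v for v in bmi if v <= upper_fence)

# Anything outside the fences is drawn as an individual outlier point.
outliers = [v for v in bmi if v < lower_fence or v > upper_fence]

print("box:", q1, median, q3)
print("whiskers:", whisker_low, whisker_high)
print("outliers:", outliers)
```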
Code: Heatmap
new_df = df[['age', 'bmi', 'avg_glucose_level', 'stroke']]

plt.figure(figsize=(10, 5))
sns.heatmap(new_df.corr(), annot=True, linewidths=0.2)
plt.title('Heatmap')
plt.show()
Output:
Explanation:
Creates a heatmap to visualize the correlation matrix of the selected columns in the new_df DataFrame.
Key Features:
new_df.corr(): Calculates the pairwise correlation coefficients (Pearson by default) between the
columns (age, bmi, avg_glucose_level, stroke).
sns.heatmap(): Plots the correlation matrix as a grid of colored cells.
annot=True: Displays the correlation values inside the heatmap cells.
linewidths=0.2: Adds thin lines between cells for better separation.
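Each cell of the heatmap holds one correlation coefficient. As a minimal sketch, the function below computes the Pearson coefficient for two lists with the standard library. The helper name pearson and the three toy columns are hypothetical and chosen so that the results are easy to check by eye.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy columns, not taken from the stroke dataset.
age = [20, 30, 40, 50, 60]
bmi = [22.0, 24.0, 26.0, 28.0, 30.0]        # rises steadily with age
glucose = [110.0, 100.0, 90.0, 80.0, 70.0]  # falls steadily with age

r_age_bmi = pearson(age, bmi)
r_age_glucose = pearson(age, glucose)

print(round(r_age_bmi, 2), round(r_age_glucose, 2))
```

Values near 1 mean the two columns rise together, values near -1 mean one falls as the other rises, and values near 0 mean there is little linear relationship.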