DMML Lab Report 02

This lab report details the process of data visualization using Python libraries in a Data Mining and Machine Learning course. It includes code snippets for connecting to Google Drive, importing libraries, loading datasets, and creating various visualizations such as scatter plots, bar plots, pie charts, histograms, box plots, and heatmaps. The report is submitted by Fardus Alam and reviewed by Sadman Sadik Khan at Daffodil International University.

Uploaded by

Atick Arman

Lab report

Course code: CSE326


Course Title: Data Mining and Machine Learning Lab
Lab report: 02
Topic: Data Visualization

Submitted To:
Name: Sadman Sadik Khan
Designation: Lecturer
Department: CSE
Daffodil International University

Submitted By:
Name: Fardus Alam
ID: 222-15-6167
Section: 62-G
Department: CSE
Daffodil International University

Submission Date: 15-03-2025


Code:
from google.colab import drive
drive.mount('/content/drive')

Explanation:
Mounts Google Drive in Google Colab so that files stored in Drive can be read from the notebook.

Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns  # alias sns, which the plotting code in this report uses

Explanation:
Importing necessary libraries for data frame handling and visualization.

Code:
df = pd.read_csv('/content/drive/MyDrive/lab dataset data mining/healthcare-dataset-stroke-data2.csv')
df.head()

Output:
Explanation:
Loads the CSV file from Google Drive into the DataFrame df and shows its first 5 rows.

Code:

df.info()

Output:

Explanation:
Shows basic information about df, such as the non-null count and data type of each column.
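The figures that df.info() summarizes can also be pulled out one at a time. The following is a minimal sketch, assuming a tiny made-up sample in place of the real CSV file (the rows and the name sample are hypothetical; only the column names come from this report):

```python
import io
import pandas as pd

# Hypothetical rows standing in for the real file on Google Drive.
csv_text = """gender,age,bmi,stroke
Male,67,36.6,1
Female,61,,1
Female,49,34.4,0
Male,80,,0
"""

sample = pd.read_csv(io.StringIO(csv_text))

# Non-null count per column, the same figure df.info() reports.
non_null = sample.notna().sum()

# Missing values per column, useful when deciding how to clean the data.
missing = sample.isna().sum()

print(sample.dtypes)
print(non_null)
print(missing)
```

Empty fields in the CSV text are read as NaN, so they lower the non-null count of that column.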

Code:
df.describe()
Output:

Explanation:
Shows summary statistics for each numerical column: count, mean, std, min, max, and the quartiles (25%, 50%, 75%).
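To see where the quartile rows come from, the sketch below compares describe() with quantile() on a small made-up Series (the values and the name ages are hypothetical, chosen only for illustration):

```python
import pandas as pd

# Hypothetical ages, used only to illustrate the statistics.
ages = pd.Series([20, 30, 40, 50, 60], name="age")

summary = ages.describe()

# The 25%, 50% and 75% rows of describe() match quantile() at the same levels.
q1, median, q3 = ages.quantile([0.25, 0.5, 0.75])

print(summary)
print(q1, median, q3)
```

The std row is the sample standard deviation, which divides by n - 1 rather than n.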

Code: Scatter Plot


x = df['age']
y = df['bmi']

sns.scatterplot(data=df, x=x, y=y, hue='gender')
plt.show()

Output:
Explanation:
Seaborn's scatterplot() function creates a scatter plot, which shows the relationship between two
numerical variables (here age and bmi). Colors, sizes, and marker styles can be mapped to additional
variables. Here hue='gender' colors each point by the value of the gender column, so the gender groups
can be told apart.
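To make the effect of hue more concrete, the sketch below reproduces it by hand with plain Matplotlib: one scatter call per gender group. The rows are made up and the names sample, fig and ax are assumptions for illustration; this is only a rough equivalent, not how Seaborn is implemented.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical rows; only the column names come from this report.
sample = pd.DataFrame({
    "age": [25, 40, 63, 31, 58, 72],
    "bmi": [22.1, 27.5, 30.2, 24.8, 29.9, 26.3],
    "gender": ["Male", "Female", "Male", "Female", "Female", "Male"],
})

fig, ax = plt.subplots()

# One scatter call per gender: each group gets its own color and legend entry,
# which is roughly what hue='gender' arranges automatically.
for gender, group in sample.groupby("gender"):
    ax.scatter(group["age"], group["bmi"], label=gender)

ax.set_xlabel("age")
ax.set_ylabel("bmi")
ax.legend(title="gender")
# In a notebook, finish with plt.show() to render the figure.
```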

Code: scatter plot using matplotlib


plt.scatter(df['age'], df['bmi'], c='g', label='age & bmi')
plt.scatter(df['age'], df['avg_glucose_level'], c='b', label='age & avg_glucose_level')

plt.title('Scatter plot using matplotlib with different color and label')
plt.xlabel('age')
plt.ylabel('bmi & avg_glucose_level')

plt.legend()
plt.show()

Output:
Explanation:
This code uses Matplotlib to draw two scatter plots on the same axes: age vs. bmi in green and
age vs. avg_glucose_level in blue. Plotting both against age makes it possible to compare how bmi and
avg_glucose_level each vary with age.
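The visual comparison can be backed by numbers. The sketch below is one possible way to do that, using made-up values and pandas' corrwith() to compute the Pearson correlation of each measurement with age (the name sample and all values are hypothetical):

```python
import pandas as pd

# Hypothetical values, not taken from the real dataset.
sample = pd.DataFrame({
    "age": [20, 30, 40, 50, 60],
    "bmi": [21.0, 24.0, 23.0, 27.0, 28.0],
    "avg_glucose_level": [85.0, 90.0, 110.0, 105.0, 130.0],
})

# Pearson correlation of each measurement column with age.
corr_with_age = sample[["bmi", "avg_glucose_level"]].corrwith(sample["age"])
print(corr_with_age)
```

A value close to 1 means the measurement tends to rise with age, and a value close to 0 means there is little linear relationship.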

Code: Barplot
plt.title('Barplot between work_type & stroke')
sns.barplot(data=df, x='work_type', y='stroke', hue='gender', errorbar=None)
plt.show()

Output:

Explanation:
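The code above uses Seaborn to draw a bar plot of stroke against work_type, split by gender. By default
barplot() shows the mean of the y variable for each group, so each bar is the mean of stroke for one
combination of work_type and gender. errorbar=None hides the error bars.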
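The bar heights can be checked with a groupby. The sketch below uses made-up rows and assumes that stroke is coded as 0 or 1, in which case the group mean is the share of rows with a stroke (the name sample and the values are hypothetical):

```python
import pandas as pd

# Hypothetical rows; stroke is assumed to be coded 0 (no) or 1 (yes).
sample = pd.DataFrame({
    "work_type": ["Private", "Private", "Private", "Private", "Govt_job", "Govt_job"],
    "gender": ["Male", "Male", "Female", "Female", "Male", "Female"],
    "stroke": [1, 0, 0, 0, 1, 0],
})

# Mean of stroke per (work_type, gender) group, which is what each bar shows
# when barplot() is left on its default estimator.
stroke_rate = sample.groupby(["work_type", "gender"])["stroke"].mean()
print(stroke_rate)
```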
Code: Pie chart
gender = df['gender'].value_counts()
colors = ['r', 'g', 'b']

plt.pie(gender, labels=gender.index, autopct='%1.2f%%',
        colors=colors, startangle=0, explode=(0.1, 0, 0),
        wedgeprops={'edgecolor': 'black'})

plt.title("Pie Chart")
plt.show()

Output:

Explanation:
Creates a pie chart of the gender column using Matplotlib.
Key terms:
• df['gender'].value_counts(): Counts how many rows fall in each gender category.
• colors = ['r', 'g', 'b']: Assigns red, green, and blue to the slices.
• autopct='%1.2f%%': Displays percentages with two decimal places.
• explode=(0.1, 0, 0): Slightly separates the first slice for emphasis.
• wedgeprops={'edgecolor': 'black'}: Adds black borders for clarity.
• startangle=0: Starts the chart from 0 degrees.

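The explode tuple in the pie chart code has three entries, so it only fits when value_counts() returns exactly three categories; plt.pie() raises an error if the lengths differ. The sketch below, on made-up data, shows one way to compute the percentages directly and to build explode from the number of categories (the names genders, percentages and explode are illustrative assumptions):

```python
import pandas as pd

# Hypothetical gender values, used only for illustration.
genders = pd.Series(
    ["Female", "Male", "Female", "Other", "Female", "Male", "Female", "Male"]
)

# value_counts() sorts categories from most to least frequent.
counts = genders.value_counts()

# The percentages that autopct would print on the slices.
percentages = (counts / counts.sum() * 100).round(2)

# Pull out only the first (largest) slice, whatever the number of categories.
explode = tuple(0.1 if i == 0 else 0 for i in range(len(counts)))

print(percentages)
print(explode)
```

The resulting tuple can be passed as explode=explode in place of the hard-coded value.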
Code: Histogram
sns.histplot(data=df, x='work_type', color='g')
plt.title('Histogram')
plt.show()

Output:
Explanation:
This code uses Seaborn to draw a histogram of the work_type column in the DataFrame df. Because work_type
holds categories rather than numbers, each bar shows how many rows belong to one work type.

Key terms:

• sns.histplot(): Plots the histogram for the specified variable.
• x='work_type': Specifies that the data for the work_type column will be used.
• color='g': Sets the color of the bars to green.
• plt.title(): Adds the title "Histogram".
• plt.show(): Displays the plot.

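The counts behind a histogram can be computed without plotting. The sketch below uses made-up values: value_counts() gives the bar heights for a categorical column, and np.histogram() shows how a numeric column would be split into bins (the names work_type and ages and the bin settings are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical categorical values: one bar per category.
work_type = pd.Series(["Private", "Private", "Self-employed", "Govt_job", "Private"])
category_counts = work_type.value_counts()

# Hypothetical numeric values: bars are counts per bin instead.
ages = np.array([5, 17, 23, 38, 41, 59, 64, 79])
bin_counts, bin_edges = np.histogram(ages, bins=4, range=(0, 80))

print(category_counts)
print(bin_counts)   # number of ages in each bin
print(bin_edges)    # boundaries of the 4 equal-width bins
```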
Code: Box plot


sns.boxplot(data=df, x='bmi', hue='gender')
plt.title('Box Plot')
plt.show()

Output:
Explanation:
The code above uses Seaborn to draw a box plot that compares the distribution of bmi across the genders
in the DataFrame df.

Key terms:

• sns.boxplot(): Plots the box plot.
• x='bmi': Plots the bmi values on the x-axis.
• hue='gender': Differentiates the data by gender using different colors.

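Each box summarizes a few statistics that can also be computed by hand. The sketch below uses made-up values and assumes the usual rule that points more than 1.5 times the interquartile range beyond the quartiles are drawn as outliers (the helper box_stats and the name sample are hypothetical):

```python
import pandas as pd

# Hypothetical bmi values for two gender groups.
sample = pd.DataFrame({
    "gender": ["Male"] * 5 + ["Female"] * 5,
    "bmi": [20.0, 22.0, 24.0, 26.0, 40.0, 19.0, 21.0, 23.0, 25.0, 27.0],
})

def box_stats(values):
    """Return the quartiles, IQR and outlier count that a box plot summarizes."""
    q1, median, q3 = values.quantile([0.25, 0.5, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    n_outliers = int(((values < lower) | (values > upper)).sum())
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "n_outliers": n_outliers}

# One row of statistics per gender group.
stats = pd.DataFrame(
    {gender: box_stats(bmi) for gender, bmi in sample.groupby("gender")["bmi"]}
).T
print(stats)
```

The box spans q1 to q3, the line inside it marks the median, and the counted outliers are the points drawn individually beyond the whiskers.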
Code: Heatmap
new_df = df[['age', 'bmi', 'avg_glucose_level', 'stroke']]

plt.figure(figsize=(10, 5))
sns.heatmap(new_df.corr(), annot=True, linewidths=0.2)
plt.title('Heatmap')
plt.show()

Output:
Explanation:
This code draws a heatmap of the correlation matrix of the selected columns in the new_df DataFrame.

Key Features:

• new_df.corr(): Calculates the correlation coefficients between the columns (age, bmi,
avg_glucose_level, stroke).
• sns.heatmap(): Plots the heatmap with annotations showing the correlation values.
• annot=True: Displays the correlation values inside the heatmap cells.
• linewidths=0.2: Adds thin lines between cells for better separation.

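Each cell of the heatmap is a Pearson correlation coefficient, which is the default method of corr(). The sketch below computes one coefficient by hand on made-up values and compares it with the result of corr() (the helper pearson and the name sample are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical values, not taken from the real dataset.
sample = pd.DataFrame({
    "age": [20, 30, 40, 50, 60],
    "bmi": [21.0, 24.0, 23.0, 27.0, 28.0],
})

def pearson(x, y):
    """Pearson r: sum of products of deviations over the root of the product of squared deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

manual = pearson(sample["age"], sample["bmi"])
matrix = sample.corr()

print(manual)
print(matrix)
```

The diagonal of the matrix is always 1, because every column is perfectly correlated with itself, and the matrix is symmetric.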