INDEX
Exp No.  Aim of Experiment                                                    Date of Allotment  Date of Evaluation
1        Exploring the Pandas DataFrame                                       03-01-2023         10-01-2023
2        Data Cleansing                                                       10-01-2023         17-01-2023
3        Statistical Summary and basic plotting                               17-01-2023         24-01-2023
4        Predict CO2 Emission using Simple Linear Regression                  24-01-2023         31-01-2023
5        Predict CO2 Emission using Gradient Descent                          31-01-2023         07-02-2023
6        Predict CO2 Emission using Multilinear Regression                    07-02-2023         14-02-2023
7        Perform logistic regression on the “ChurnData.csv”                   14-02-2023         21-02-2023
8        Linear Discriminant Analysis on the IRIS dataset                     21-02-2023         28-02-2023
9        Using Support Vector Machine build a breast cancer detection system  28-02-2023         07-03-2023
10       Using decision tree, build an ML model on the “diabetes” dataset     07-03-2023         14-03-2023
11       Classify the “Iris Dataset” using KNN                                14-03-2023         21-03-2023
12       Cluster the dataset using K-Means Clustering                         21-03-2023         28-03-2023
Shivendra Singh
A2305220681
Practical-1 Date: 03-01-2023
Aim: Exploring the Pandas DataFrame
You are given a CSV file named 'student.csv', whose first few records are as below.
Write the Python code/command for the following questions.
(a) Load this file into a Python DataFrame
Output:
(b) Find the number of rows and columns in it.
Output:
(c) Print column names
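A minimal sketch of parts (a) to (c). The records of student.csv are not reproduced in this file, so the rows below are hypothetical stand-ins; only the column names (Name, Age, Sex, Marks, Grade) come from the later questions, and their order in the real file is an assumption.

```python
import io
import pandas as pd

# Hypothetical rows standing in for student.csv (the real records are not shown here).
csv_text = """Name,Age,Sex,Marks,Grade
Asha,21,F,78,B
Nihar,46,M,55,C
Ravi,22,M,91,A
Meera,20,F,84,A
Kabir,23,M,67,B
Tara,22,F,49,D
"""

# (a) With the real file this would be: df = pd.read_csv("student.csv")
df = pd.read_csv(io.StringIO(csv_text))

# (b) shape is a (rows, columns) tuple
n_rows, n_cols = df.shape
print(n_rows, n_cols)

# (c) column names as a plain Python list
column_names = df.columns.tolist()
print(column_names)
```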
Output:
(d) Rename the column ‘Name’ to ‘FirstName’
Output:
(e) Print the last 5 rows
Output:
(f) Print the details of student with lowest marks
Output:
(g) Find total marks of all female students
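Parts (d) to (g) might be handled as in the sketch below. The DataFrame is a hypothetical stand-in for student.csv, and the assumption that females are coded as "F" in the Sex column is not confirmed by the source.

```python
import pandas as pd

# Hypothetical records standing in for student.csv.
df = pd.DataFrame({
    "Name": ["Asha", "Nihar", "Ravi", "Meera", "Kabir", "Tara"],
    "Age": [21, 46, 22, 20, 23, 22],
    "Sex": ["F", "M", "M", "F", "M", "F"],
    "Marks": [78, 55, 91, 84, 67, 49],
    "Grade": ["B", "C", "A", "A", "B", "D"],
})

# (d) rename returns a new DataFrame, so assign it back
df = df.rename(columns={"Name": "FirstName"})

# (e) last five rows
last_five = df.tail(5)
print(last_five)

# (f) every student whose marks equal the minimum (keeps ties)
lowest = df[df["Marks"] == df["Marks"].min()]
print(lowest)

# (g) total marks of female students (assumes the code "F")
female_total = df.loc[df["Sex"] == "F", "Marks"].sum()
print(female_total)
```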
Output:
(h) List names of all the male students
Output:
(i) Find mean age of the class
Output:
(j) Line plot marks of the class
Output:
(k) Find the index of record of oldest student in the
class
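A possible approach to parts (h), (i) and (k), again on hypothetical records; the "M" code for male students is an assumption. The line plot in (j) is left as a comment because it needs matplotlib.

```python
import pandas as pd

# Hypothetical records standing in for student.csv.
df = pd.DataFrame({
    "Name": ["Asha", "Nihar", "Ravi", "Meera", "Kabir", "Tara"],
    "Age": [21, 46, 22, 20, 23, 22],
    "Sex": ["F", "M", "M", "F", "M", "F"],
    "Marks": [78, 55, 91, 84, 67, 49],
})

# (h) names of male students (assumes the code "M")
male_names = df.loc[df["Sex"] == "M", "Name"].tolist()
print(male_names)

# (i) mean age of the class
mean_age = df["Age"].mean()
print(mean_age)

# (j) with matplotlib installed: df["Marks"].plot(kind="line")

# (k) index label of the row with the largest Age
oldest_index = df["Age"].idxmax()
print(oldest_index)
```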
Output:
(l) Sort and print the data on the basis of Name followed by Age.
Output:
(m) Change the name 'Nihar' to 'Jason Bourne' in the Name column of the DataFrame.
Output:
(n) Change and print order of the columns (Name,
Sex, Age, Marks, Grade).
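Parts (l) to (n) could be sketched as follows, using hypothetical records (only the name 'Nihar' and the column names are taken from the questions).

```python
import pandas as pd

# Hypothetical records standing in for student.csv.
df = pd.DataFrame({
    "Name": ["Asha", "Nihar", "Ravi", "Meera", "Kabir", "Tara"],
    "Age": [21, 46, 22, 20, 23, 22],
    "Sex": ["F", "M", "M", "F", "M", "F"],
    "Marks": [78, 55, 91, 84, 67, 49],
    "Grade": ["B", "C", "A", "A", "B", "D"],
})

# (l) sort by Name first, then by Age to break ties
sorted_df = df.sort_values(by=["Name", "Age"])
print(sorted_df)

# (m) replace only exact matches of 'Nihar' in the Name column
df["Name"] = df["Name"].replace("Nihar", "Jason Bourne")

# (n) selecting with a list of names returns the columns in that order
df = df[["Name", "Sex", "Age", "Marks", "Grade"]]
print(df)
```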
Output:
(o) Count and print number of students sex wise and display result with suitable column
headers.
Output:
(p) Delete and print row where age=46
Output:
(q) Print the data types of individual columns of the data frame
Output:
(r) Convert and print the datatype of a given column Age (int to float).
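One way to approach parts (o) to (r). The records are hypothetical; the header name "Count" for the sex-wise tally is an illustrative choice.

```python
import pandas as pd

# Hypothetical records standing in for student.csv.
df = pd.DataFrame({
    "Name": ["Asha", "Nihar", "Ravi", "Meera", "Kabir", "Tara"],
    "Age": [21, 46, 22, 20, 23, 22],
    "Sex": ["F", "M", "M", "F", "M", "F"],
    "Marks": [78, 55, 91, 84, 67, 49],
})

# (o) number of students per sex, with explicit column headers
sex_counts = df.groupby("Sex").size().reset_index(name="Count")
print(sex_counts)

# (p) keep a copy of the matching row for printing, then drop it
removed = df[df["Age"] == 46]
print(removed)
df = df[df["Age"] != 46].reset_index(drop=True)

# (q) data type of each column
print(df.dtypes)

# (r) convert Age from int to float
df["Age"] = df["Age"].astype(float)
print(df["Age"].dtype)
```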
Output:
(s) Create a new column named “UpdatedMarks” which has 5.5% more marks than the existing
Marks column.
Output:
(t) Delete the “Marks” column
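A short sketch of parts (s) and (t) on hypothetical marks. "5.5% more" is read here as multiplying each mark by 1.055.

```python
import pandas as pd

# Hypothetical records standing in for student.csv.
df = pd.DataFrame({
    "Name": ["Asha", "Nihar", "Ravi"],
    "Marks": [78, 55, 91],
})

# (s) new column holding marks increased by 5.5%
df["UpdatedMarks"] = df["Marks"] * 1.055

# (t) drop the original Marks column
df = df.drop(columns="Marks")
print(df)
```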
Output:
Conclusion:
Hence, the experiment to understand the basics of machine learning was carried out using Python.
Evaluation:
Practical-2 Date: 10-01-2023
AIM: Data Cleansing
The given CSV file named “RSData.csv” contains real estate data for a particular city.
Write the Python command/code to answer the following questions on this dataset.
propertyid  stno  stname             owneroccupied  numbedrooms  numbathrooms  areasqft
100001000   10    Kranti Road        Y              3            1             1000
100002000   17    Bhagat Singh Road  N              3            1.5           --
100003000         Bhagat Singh Road  N              n/a          1             850
100004000   20    Azad Mard          12             1            NaN           700
            23    Azad Mard          Y              3            2             1600
100006000   20    Azad Mard          Y              NA           1             800
100007000   NA    Shivaji Road                      2            Surya         950
100008000   13    Netaji Marg        Y              1            1
100009000   15    Netaji Marg        Y              na           2             1800
1. Write the command to read this data in the data frame.
Output:
2. List number of rows and columns in this dataset
Output:
3. Check if there is any missing value in this entire dataset?
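Questions 1 to 3 might be answered as in this sketch. The CSV text is typed in from the nine records in the table above so that the example runs without the file. Treating "na" and "--" as missing is an assumption based on how those cells look; pandas already recognises blanks, "NA", "n/a" and "NaN" by default.

```python
import io
import pandas as pd

# The nine records from the RSData table, typed in as CSV text.
csv_text = """propertyid,stno,stname,owneroccupied,numbedrooms,numbathrooms,areasqft
100001000,10,Kranti Road,Y,3,1,1000
100002000,17,Bhagat Singh Road,N,3,1.5,--
100003000,,Bhagat Singh Road,N,n/a,1,850
100004000,20,Azad Mard,12,1,NaN,700
,23,Azad Mard,Y,3,2,1600
100006000,20,Azad Mard,Y,NA,1,800
100007000,NA,Shivaji Road,,2,Surya,950
100008000,13,Netaji Marg,Y,1,1,
100009000,15,Netaji Marg,Y,na,2,1800
"""

# 1. With the real file: pd.read_csv("RSData.csv", na_values=["na", "--"])
df = pd.read_csv(io.StringIO(csv_text), na_values=["na", "--"])

# 2. number of rows and columns
print(df.shape)

# 3. True if at least one cell in the whole frame is missing
has_missing = bool(df.isnull().values.any())
print(has_missing)
```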
Output:
4. Which column(s) do not have any missing values?
Output:
5. Which column(s) have the maximum number of missing values?
Output:
6. How many rows do not have any missing value(s)?
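A sketch for questions 4 to 6, using the same nine records and the same assumed missing-value markers ("na" and "--"). The answers depend on that choice of markers.

```python
import io
import pandas as pd

# The nine records from the RSData table, typed in as CSV text.
csv_text = """propertyid,stno,stname,owneroccupied,numbedrooms,numbathrooms,areasqft
100001000,10,Kranti Road,Y,3,1,1000
100002000,17,Bhagat Singh Road,N,3,1.5,--
100003000,,Bhagat Singh Road,N,n/a,1,850
100004000,20,Azad Mard,12,1,NaN,700
,23,Azad Mard,Y,3,2,1600
100006000,20,Azad Mard,Y,NA,1,800
100007000,NA,Shivaji Road,,2,Surya,950
100008000,13,Netaji Marg,Y,1,1,
100009000,15,Netaji Marg,Y,na,2,1800
"""
df = pd.read_csv(io.StringIO(csv_text), na_values=["na", "--"])

# Missing cells per column
missing_per_column = df.isnull().sum()
print(missing_per_column)

# 4. columns with no missing value
complete_columns = missing_per_column[missing_per_column == 0].index.tolist()

# 5. column(s) with the largest number of missing values (keeps ties)
most_missing = missing_per_column[missing_per_column == missing_per_column.max()].index.tolist()

# 6. rows with no missing value at all
n_complete_rows = len(df.dropna())
print(complete_columns, most_missing, n_complete_rows)
```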
Output:
7. If there is any missing value in the “areasqft” replace it with 900
Output:
8. Fill the street number of record at index 2 with 77
Output:
9. If there is any missing value in the “number of bedrooms” columns, then replace it with the
median value of this column.
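Questions 7 to 9 could be handled as below, again on the nine records from the table with "na" and "--" assumed to mark missing cells.

```python
import io
import pandas as pd

# The nine records from the RSData table, typed in as CSV text.
csv_text = """propertyid,stno,stname,owneroccupied,numbedrooms,numbathrooms,areasqft
100001000,10,Kranti Road,Y,3,1,1000
100002000,17,Bhagat Singh Road,N,3,1.5,--
100003000,,Bhagat Singh Road,N,n/a,1,850
100004000,20,Azad Mard,12,1,NaN,700
,23,Azad Mard,Y,3,2,1600
100006000,20,Azad Mard,Y,NA,1,800
100007000,NA,Shivaji Road,,2,Surya,950
100008000,13,Netaji Marg,Y,1,1,
100009000,15,Netaji Marg,Y,na,2,1800
"""
df = pd.read_csv(io.StringIO(csv_text), na_values=["na", "--"])

# 7. replace missing areas with 900
df["areasqft"] = df["areasqft"].fillna(900)

# 8. set the street number of the record at index 2
df.loc[2, "stno"] = 77

# 9. fill missing bedroom counts with the column median
bedroom_median = df["numbedrooms"].median()
df["numbedrooms"] = df["numbedrooms"].fillna(bedroom_median)
print(df[["stno", "numbedrooms", "areasqft"]])
```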
Output:
10. Give your comment on the “owner occupied” column.
Output:
11. It is believed that the data entry operator might have entered integer values in the “owner
occupied” column. Count the number of such entries and replace them with NumPy's standard NaN
(np.nan).
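A sketch of one way to do question 11. Because the column also holds "Y" and "N", pandas reads it as text, so an "integer entry" is detected here as a string made only of digits (with an optional minus sign); that detection rule is an assumption.

```python
import io
import numpy as np
import pandas as pd

# The nine records from the RSData table, typed in as CSV text.
csv_text = """propertyid,stno,stname,owneroccupied,numbedrooms,numbathrooms,areasqft
100001000,10,Kranti Road,Y,3,1,1000
100002000,17,Bhagat Singh Road,N,3,1.5,--
100003000,,Bhagat Singh Road,N,n/a,1,850
100004000,20,Azad Mard,12,1,NaN,700
,23,Azad Mard,Y,3,2,1600
100006000,20,Azad Mard,Y,NA,1,800
100007000,NA,Shivaji Road,,2,Surya,950
100008000,13,Netaji Marg,Y,1,1,
100009000,15,Netaji Marg,Y,na,2,1800
"""
df = pd.read_csv(io.StringIO(csv_text), na_values=["na", "--"])

# Flag entries that look like integers; missing cells become "nan" and do not match
is_integer_entry = df["owneroccupied"].astype(str).str.fullmatch(r"-?\d+")
n_integer_entries = int(is_integer_entry.sum())
print(n_integer_entries)

# Replace the flagged entries with np.nan
df.loc[is_integer_entry, "owneroccupied"] = np.nan
print(df["owneroccupied"])
```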
Output:
Conclusion:
Hence, the concept of data cleaning was studied and implemented successfully.
Evaluation:
Practical-3 Date: 17-01-2023
AIM: Statistical Summary and basic plotting
Load the “mtcar.csv” dataset and write Python code to perform the following operations.
model              mpg   cyl  disp   hp   drat  wt     qsec   vs  am  gear  carb
Mazda RX4          21    6    160    110  3.9   2.62   16.46  0   1   4     4
Mazda RX4 Wag      21    6    160    110  3.9   2.875  17.02  0   1   4     4
Datsun 710         22.8  4    108    93   3.85  2.32   18.61  1   1   4     1
Hornet 4 Drive     21.4  6    258    110  3.08  3.215  19.44  1   0   3     1
Hornet Sportabout  18.7  8    360    175  3.15  3.44   17.02  0   0   3     2
Valiant            18.1  6    225    105  2.76  3.46   20.22  1   0   3     1
Duster 360         14.3  8    360    245  3.21  3.57   15.84  0   0   3     4
Merc 240D          24.4  4    146.7  62   3.69  3.19   20     1   0   4     2
Merc 230           22.8  4    140.8  95   3.92  3.15   22.9   1   0   4     2
(a) Generate various summary statistics such as mean, standard deviation, minimum value,
maximum value, and the 1st, 2nd and 3rd quartiles for all the numerical attributes.
Output:
(b) Count the number of non-NaN items per feature
Output:
(c) Calculate mean absolute deviation.
Output:
(d) Calculate the median for any of the numeric columns
Output:
(e) Calculate the mean for any of the numeric columns
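Parts (a) to (e) might look like the sketch below, run on a subset of columns from the nine rows shown in the table above. The mean absolute deviation is computed by hand because `DataFrame.mad()` is not available in recent pandas releases; "mpg" is an arbitrary choice of numeric column.

```python
import pandas as pd

# A subset of the columns from the nine mtcars rows listed in the table.
cars = pd.DataFrame({
    "model": ["Mazda RX4", "Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive",
              "Hornet Sportabout", "Valiant", "Duster 360", "Merc 240D", "Merc 230"],
    "mpg": [21, 21, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8],
    "disp": [160, 160, 108, 258, 360, 225, 360, 146.7, 140.8],
    "hp": [110, 110, 93, 110, 175, 105, 245, 62, 95],
})
numeric = cars.select_dtypes(include="number")

# (a) count, mean, std, min, quartiles and max for each numeric column
summary = numeric.describe()
print(summary)

# (b) non-NaN items per feature
non_nan_counts = cars.count()

# (c) mean absolute deviation: average distance of each value from its column mean
mean_abs_dev = (numeric - numeric.mean()).abs().mean()
print(mean_abs_dev)

# (d), (e) median and mean of one numeric column
median_mpg = cars["mpg"].median()
mean_mpg = cars["mpg"].mean()
print(median_mpg, mean_mpg)
```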
Output:
(f) Calculate mode for “hp” column
Output:
(g) Calculate skewness for “disp”
Output:
(h) Find the coefficient of correlation between all the numeric attributes.
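A sketch of parts (f) to (h) on the same nine rows (a subset of columns is used to keep it short). With so few rows the numbers are only illustrative of the calls, not of the full dataset.

```python
import pandas as pd

# A subset of the columns from the nine mtcars rows listed in the table.
cars = pd.DataFrame({
    "model": ["Mazda RX4", "Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive",
              "Hornet Sportabout", "Valiant", "Duster 360", "Merc 240D", "Merc 230"],
    "mpg": [21, 21, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8],
    "disp": [160, 160, 108, 258, 360, 225, 360, 146.7, 140.8],
    "hp": [110, 110, 93, 110, 175, 105, 245, 62, 95],
})

# (f) mode returns a Series because several values can tie
hp_mode = cars["hp"].mode()
print(hp_mode.tolist())

# (g) sample skewness of disp
disp_skew = cars["disp"].skew()
print(disp_skew)

# (h) Pearson correlation between all numeric columns
corr = cars.select_dtypes(include="number").corr()
print(corr)
```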
Output:
(i) Scatter plot a graph between “hp” vs “mpg”
Output:
(j) Plot a density diagram (KDE, kernel density estimation) for “displacement”
Output:
(k) Plot a bar diagram for “CarName” vs “mpg”
Output:
(l) Box plot the “hp” attribute.
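The scatter, bar and box plots from parts (i), (k) and (l) could be drawn as sketched below with matplotlib. Two assumptions: “CarName” is taken to mean the `model` column of the table, and the non-interactive "Agg" backend is selected so the script runs without a display. The KDE plot in (j) is left as a comment because `Series.plot(kind="kde")` additionally needs SciPy.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove to show windows
import matplotlib.pyplot as plt
import pandas as pd

# Three columns from the nine mtcars rows listed in the table.
cars = pd.DataFrame({
    "model": ["Mazda RX4", "Mazda RX4 Wag", "Datsun 710", "Hornet 4 Drive",
              "Hornet Sportabout", "Valiant", "Duster 360", "Merc 240D", "Merc 230"],
    "mpg": [21, 21, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8],
    "hp": [110, 110, 93, 110, 175, 105, 245, 62, 95],
})

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# (i) scatter plot of hp against mpg
axes[0].scatter(cars["hp"], cars["mpg"])
axes[0].set_xlabel("hp")
axes[0].set_ylabel("mpg")

# (j) with SciPy installed: cars["disp"].plot(kind="kde") on the full table

# (k) bar diagram of mpg per car
axes[1].bar(cars["model"], cars["mpg"])
axes[1].tick_params(axis="x", rotation=90)
axes[1].set_ylabel("mpg")

# (l) box plot of hp
axes[2].boxplot(cars["hp"])
axes[2].set_ylabel("hp")

fig.tight_layout()
```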
Output:
Conclusion:
Hence, the concept of plotting different types of graphs was studied and implemented
successfully.
Evaluation:
Practical-4 Date: 24-01-2023
AIM: Predict CO2 Emission using Simple Linear Regression
You are given the following data.
(a) Scatter plot “EngineSize vs CO2Emissions”
(b) Predict the CO2 emission for an engine size of 2.4, using the data given in the
“CO2Small.csv” file.
(c) Calculate R^2.
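A minimal sketch of parts (b) and (c), which also prints the fitted line. The data values are hypothetical because the contents of “CO2Small.csv” are not reproduced here, and `np.polyfit` is one of several ways to obtain the least-squares line.

```python
import numpy as np

# Hypothetical (EngineSize, CO2Emissions) pairs; not taken from the lab's CSV files.
engine_size = np.array([1.0, 1.6, 2.0, 3.0, 3.5, 4.7, 5.7])
co2 = np.array([150.0, 175.0, 200.0, 240.0, 255.0, 300.0, 350.0])

# Least-squares straight line: co2 ≈ slope * engine_size + intercept
slope, intercept = np.polyfit(engine_size, co2, deg=1)

# (b) prediction for an engine size of 2.4
predicted = slope * 2.4 + intercept
print(predicted)

# (c) R^2 = 1 - (residual sum of squares / total sum of squares)
fitted = slope * engine_size + intercept
ss_res = np.sum((co2 - fitted) ** 2)
ss_tot = np.sum((co2 - co2.mean()) ** 2)
r2 = 1 - ss_res / ss_tot
print(r2)

# Equation of the fitted line
print(f"CO2Emissions = {slope:.2f} * EngineSize + {intercept:.2f}")
```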
(d) Calculate the above details on the larger dataset, i.e., the file “CO2Full Data.csv”
(e) Print the final equation of the line
(f) Plot final line (best fit line) along with other data points.
(g) Give your comments by comparing the R^2 of both the datasets.
Conclusion:
Hence, the prediction of CO2 Emission using Simple Linear Regression was implemented
successfully.
Evaluation:
Practical-5 Date: 31-01-2023
AIM: Predict CO2 Emission using Gradient Descent
(a) Scatter plot “EngineSize vs CO2Emissions”
(b) Calculate R^2 value
(c) Print the final equation of the line
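A sketch of batch gradient descent for a straight-line fit, covering the R^2 value and the final equation. The data points, the learning rate and the iteration count are all assumptions chosen so that the loop converges on this small hypothetical sample; they are not taken from the experiment.

```python
import numpy as np

# Hypothetical (EngineSize, CO2Emissions) pairs; not taken from the lab's CSV files.
x = np.array([1.0, 1.6, 2.0, 3.0, 3.5, 4.7, 5.7])
y = np.array([150.0, 175.0, 200.0, 240.0, 255.0, 300.0, 350.0])

slope, intercept = 0.0, 0.0   # start from a flat line through the origin
learning_rate = 0.01          # assumed value, small enough to converge on this data
n = len(x)

for _ in range(20000):
    residual = y - (slope * x + intercept)
    # Gradients of the mean squared error with respect to slope and intercept
    grad_slope = -2.0 / n * np.sum(x * residual)
    grad_intercept = -2.0 / n * np.sum(residual)
    slope -= learning_rate * grad_slope
    intercept -= learning_rate * grad_intercept

# R^2 of the fitted line
fitted = slope * x + intercept
r2 = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)

print(f"CO2Emissions = {slope:.2f} * EngineSize + {intercept:.2f}")
print(f"R^2 = {r2:.3f}")
```

On data this small the result should agree with the closed-form least-squares line; on larger or unscaled data the learning rate usually needs retuning.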
(d) Plot final line (best fit line) along with other data points.
Conclusion:
Hence, the prediction of CO2 Emission using Gradient Descent was implemented
successfully.
Evaluation:
Practical-6 Date: 07-02-2023
AIM: Predict CO2 Emission using Multilinear Regression.
a. Using the above data build a multi-variable linear regression.
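A sketch of a multi-variable linear regression solved with NumPy least squares. This is a swapped-in technique: the original code for this experiment is not reproduced here and may have used a different library. The rows are hypothetical; only the attribute names come from the experiment, and three of the five attributes are used to keep the example short.

```python
import numpy as np

# Hypothetical rows. Columns: ENGINESIZE, CYLINDERS, FUELCONSUMPTION_COMB
X = np.array([
    [2.0, 4, 8.5],
    [2.4, 4, 9.6],
    [1.5, 4, 5.9],
    [3.5, 6, 11.1],
    [3.5, 6, 10.6],
    [3.7, 6, 11.6],
    [5.0, 8, 14.0],
    [1.8, 4, 7.2],
])
# Hypothetical CO2 emission values for the rows above
y = np.array([196, 221, 136, 255, 244, 267, 322, 166], dtype=float)

# Prepend a column of ones so the first coefficient is the intercept
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coef)  # intercept followed by one coefficient per attribute

# R^2 on the data the model was fitted to
y_hat = A @ coef
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
print(r2)
```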
b. What is the accuracy level of your model?
The accuracy of the model is measured by the R^2 score
c. Which attributes have you used in this model?
ENGINESIZE
CYLINDERS
FUELCONSUMPTION_CITY
FUELCONSUMPTION_HWY
FUELCONSUMPTION_COMB.
d. Write your observation /comment about this data and the model.
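The ENGINESIZE and CYLINDERS attributes are positively correlated with CO2
emissions. The fuel consumption attributes, if measured as fuel used per distance, would
also be expected to correlate positively, since burning more fuel releases more CO2; a
negative coefficient on one of them in the fitted model more likely reflects
multicollinearity among the CITY, HWY and COMB columns than a negative relationship.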
The linear regression model assumes a linear relationship between the input attributes
and the output variable, which may not be entirely accurate in this case.
There may be other factors that influence CO2 emissions that are not captured by the
input attributes in this dataset.
Overall, the model seems to provide a reasonably accurate prediction of CO2
emissions based on the available data.
Conclusion:
Hence, the prediction of CO2 Emission using Multilinear Regression was implemented
successfully.
Evaluation:
Practical-7 Date: 14-02-2023
AIM: Perform logistic regression on the “ChurnData.csv”.
Consider only following features from the file.
[['tenure', 'age', 'address', 'income', 'ed', 'employ', 'equip', 'callcard', 'wireless','churn']]
a. Print confusion matrix
b. Print classification report
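A sketch of both parts using scikit-learn. “ChurnData.csv” is not available here, so `make_classification` generates a synthetic stand-in with nine numeric features (matching the nine predictors listed above) and a binary target in place of `churn`. With the real file, `X` and `y` would instead come from those columns. The part (b) output is read here as scikit-learn's classification report.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the nine churn predictors and the binary churn label.
X, y = make_classification(n_samples=200, n_features=9, n_informative=5, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Scale using statistics from the training split only
scaler = StandardScaler().fit(X_train)
model = LogisticRegression().fit(scaler.transform(X_train), y_train)
y_pred = model.predict(scaler.transform(X_test))

# a. rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(cm)

# b. precision, recall, F1 and support per class
report = classification_report(y_test, y_pred, labels=[0, 1], zero_division=0)
print(report)
```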
Conclusion:
Hence, logistic regression was implemented successfully.
Evaluation:
Practical-8 Date: 21-02-2023
AIM: Linear Discriminant Analysis on the IRIS dataset
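A minimal sketch using scikit-learn's built-in copy of the Iris data. Projecting onto two discriminant axes and reporting training accuracy are illustrative choices, not steps confirmed by the source.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# With three classes, LDA can produce at most two discriminant axes
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)
print(X_lda.shape)

# Share of between-class variance captured by each axis
print(lda.explained_variance_ratio_)

# LDA also acts as a classifier; this is accuracy on the training data
train_accuracy = lda.score(X, y)
print(train_accuracy)
```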
Conclusion:
Hence, the Linear Discriminant Analysis on the IRIS dataset was done successfully.
Evaluation:
Practical-9 Date: 28-02-2023
AIM: Using a Support Vector Machine, build a breast cancer detection
system. Use the breast cancer dataset available with the datasets package of the
sklearn library.
a. Print the accuracy, precision and recall for the model built.
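A sketch using `load_breast_cancer` from `sklearn.datasets`, as the aim specifies. The linear kernel, the feature scaling step and the 80/20 split are assumptions; precision and recall are reported for the class labelled 1, which in this dataset is the benign class.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Scaling matters for SVMs because they are distance based
model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```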
Conclusion:
Hence, a breast cancer detection system was built successfully using a Support Vector
Machine.
Evaluation:
Practical-10 Date: 07-03-2023
AIM: Using decision tree, build an ML model on the “diabetes” dataset.
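The source does not say which “diabetes” dataset is meant. The sketch below assumes scikit-learn's built-in `load_diabetes`, which has a continuous target, so it uses `DecisionTreeRegressor`. If the experiment used a CSV with a yes/no outcome column, `DecisionTreeClassifier` with the same fit/predict pattern would be the natural swap. The depth limit of 3 is an arbitrary choice to curb overfitting.

```python
from sklearn.datasets import load_diabetes
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Assumption: the built-in regression dataset stands in for the lab's "diabetes" data.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# A shallow tree is easier to inspect and less prone to overfitting
tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

y_pred = tree.predict(X_test)
r2 = r2_score(y_test, y_pred)
print(f"depth={tree.get_depth()} test R^2={r2:.3f}")
```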
Conclusion:
Hence, using a decision tree, an ML model on the “diabetes” dataset was built successfully.
Evaluation:
Practical-11 Date: 14-03-2023
AIM: Classify the “Iris Dataset” using KNN. Print Precision, Recall, F1 and
Support.
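A sketch with scikit-learn's built-in Iris data. `classification_report` prints precision, recall, F1 and support per class; k = 5 and the 70/30 stratified split are assumed values.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)

# k = 5 neighbours is an assumed choice; it can be tuned with cross-validation
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred, target_names=list(iris.target_names))
print(report)
```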
Conclusion:
Hence, classification of the “Iris Dataset” using KNN was implemented successfully.
Evaluation:
Practical-12 Date: 21-03-2023
AIM: You are given the “Mall-Customer” dataset. Cluster the dataset using K-Means
Clustering. Show the steps for estimating the optimum value of k.
First few records of the dataset are given below.
The above code performs the following steps:
1. Loads the Mall-Customer dataset
2. Selects the columns to use for clustering
3. Scales the features using StandardScaler
4. Estimates the optimum value of k using the elbow method and the silhouette score
5. Clusters the dataset using the optimum value of k
6. Visualizes the clusters
The optimum value of k in k-means clustering can be estimated using the following steps:
Elbow method: Plot the within-cluster sum of squares (WCSS) against the number of clusters (k). The
WCSS is the sum of squared distances between each point in a cluster and its centroid. The plot will
have a shape like an elbow, and the optimum value of k will be at the "elbow" or the point where
the rate of decrease in WCSS slows down significantly. This method gives a visual representation of
the best k value for clustering.
Silhouette method: The silhouette score measures how similar a data point is to its own cluster
compared to other clusters. It ranges from -1 to 1, where a score closer to 1 indicates that the point
is well-matched to its own cluster and poorly matched to neighbouring clusters. Compute the
average silhouette score for different values of k and choose the k with the highest average score.
This method is more quantitative than the elbow method and can be used to find a more precise
value of k.
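The elbow and silhouette procedures described above can be sketched as follows. The Mall-Customer file is not reproduced here, so `make_blobs` generates a synthetic two-feature stand-in; the range k = 2 to 10 and the use of the highest silhouette score to pick k are assumptions. The visualization step is omitted.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for two numeric Mall-Customer columns.
X, _ = make_blobs(n_samples=200, centers=5, cluster_std=0.8, random_state=42)
X_scaled = StandardScaler().fit_transform(X)

wcss = {}        # within-cluster sum of squares per k (elbow method)
silhouette = {}  # average silhouette score per k
for k in range(2, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X_scaled)
    wcss[k] = km.inertia_
    silhouette[k] = silhouette_score(X_scaled, km.labels_)

# Elbow method: inspect (or plot) how quickly WCSS stops dropping
for k in wcss:
    print(k, round(wcss[k], 2), round(silhouette[k], 3))

# Silhouette method: choose the k with the highest average score
best_k = max(silhouette, key=silhouette.get)

# Final clustering with the chosen k
labels = KMeans(n_clusters=best_k, n_init=10, random_state=42).fit_predict(X_scaled)
print(best_k)
```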
Conclusion:
Hence, clustering of the dataset using K-Means Clustering was done successfully.
Evaluation: