ML Lab-1

This lab preprocesses a small dataset with pandas and scikit-learn. It loads a CSV file, fills missing Age and Salary values with the column mean, label-encodes the categorical columns, splits the data into training and test sets, and fits a linear regression model and a random forest classifier to predict the Purchased column on the test set. The linear regression predictions are evaluated with mean squared error.

Uploaded by

shrinkhal03


import numpy as np

import pandas as pd

dataset = pd.read_csv("Data.csv")
dataset

DataFrame summary: dataset, 10 rows
Column     dtype     unique  min      max      std                 samples
Country    category  3       -        -        -                   France, Spain, Germany
Age        number    9       27.0     50.0     7.693792591722527   50.0, 27.0, 35.0
Salary     number    9       48000.0  83000.0  12265.579661982732  83000.0, 48000.0, 52000.0
Purchased  category  2       -        -        -                   Yes, No

dataset["Age"].fillna(np.mean(dataset["Age"]))

0 44.000000
1 27.000000
2 30.000000
3 38.000000
4 40.000000
5 35.000000
6 38.777778
7 48.000000
8 50.000000
9 37.000000
Name: Age, dtype: float64

dataset["Age"] = dataset["Age"].fillna(np.mean(dataset["Age"]))
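The mean-imputation step can be checked in isolation. The sketch below is only an illustration: it rebuilds the Age column from the values printed in the output above (row 6 being the missing entry) rather than reading `Data.csv`, and the names `age`, `age_mean` and `age_filled` are made up for this example. It uses `Series.mean()`, which skips missing values by default.

```python
import numpy as np
import pandas as pd

# Age values as shown in the output above; row 6 is the missing entry.
age = pd.Series([44, 27, 30, 38, 40, 35, np.nan, 48, 50, 37], name="Age")

# Series.mean() skips NaN by default, so the mean covers the 9 known values.
age_mean = age.mean()

# fillna returns a new Series; assign the result to keep it.
age_filled = age.fillna(age_mean)

print(round(age_mean, 6))        # 38.777778
print(age_filled.isna().sum())   # 0
```

Because `fillna` does not modify the Series in place, the assignment back to the column is what makes the change stick.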

dataset

DataFrame summary: dataset, 10 rows
Column     dtype     unique  min      max      std                 samples
Country    category  3       -        -        -                   France, Spain, Germany
Age        number    10      27.0     50.0     7.253777219533095   50.0, 27.0, 35.0
Salary     number    9       48000.0  83000.0  12265.579661982732  83000.0, 48000.0, 52000.0
Purchased  category  2       -        -        -                   Yes, No

dataset["Salary"] = dataset["Salary"].fillna(np.mean(dataset["Salary"]))

dataset

DataFrame summary: dataset, 10 rows
Column     dtype     unique  min      max      std                 samples
Country    category  3       -        -        -                   France, Spain, Germany
Age        number    10      27.0     50.0     7.253777219533095   50.0, 27.0, 35.0
Salary     number    10      48000.0  83000.0  11564.099405562389  83000.0, 48000.0, 58000.0
Purchased  category  2       -        -        -                   Yes, No

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

dataset["Purchased"] = le.fit_transform(dataset["Purchased"])

dataset["Country"] = le.fit_transform(dataset["Country"])
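Reusing one `LabelEncoder` for both columns means that after the second `fit_transform` the encoder only remembers the Country classes, and the integer codes 0, 1, 2 imply an ordering between countries. One possible alternative, sketched here on hypothetical rows (the four rows, `target_encoder` and `encoded` are illustrative and not taken from `Data.csv`), is a dedicated encoder for the target and one-hot columns for the feature.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical rows using the category values shown in the summaries above.
df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain"],
    "Purchased": ["No", "Yes", "No", "Yes"],
})

# A dedicated encoder for the target keeps its classes_ available later.
target_encoder = LabelEncoder()
df["Purchased"] = target_encoder.fit_transform(df["Purchased"])

# One-hot encode the feature so no ordering is implied between countries.
encoded = pd.get_dummies(df, columns=["Country"], dtype=int)

print(target_encoder.classes_.tolist())  # ['No', 'Yes']
print(list(encoded.columns))
```

With a separate target encoder, `target_encoder.inverse_transform` can still map predicted 0/1 values back to "No"/"Yes".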

dataset

DataFrame summary: dataset, 10 rows
Column     dtype   unique  min      max      std                 samples
Country    number  3       0        2        0                   0, 2, 1
Age        number  10      27.0     50.0     7.253777219533095   50.0, 27.0, 35.0
Salary     number  10      48000.0  83000.0  11564.099405562389  83000.0, 48000.0, 58000.0
Purchased  number  2       0        1        0                   1, 0

dataset.iloc[:,:-1]

{"summary":"{\n \"name\": \"dataset\",\n \"rows\": 10,\n

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(dataset.iloc[:,:-1], dataset["Purchased"], test_size=0.2)
# print(x_train, x_test, y_train, y_test)
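The split above is reshuffled on every run, so the rows that land in the test set (and every score computed from them) change between executions. A minimal sketch of a reproducible, class-balanced split follows; the data frame `X`, the target `y` and the seed value 0 are hypothetical stand-ins, not values from the lab.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical encoded data: 10 rows with a balanced 0/1 target.
X = pd.DataFrame({"Age": range(30, 40), "Salary": range(50000, 60000, 1000)})
y = pd.Series([0, 1] * 5, name="Purchased")

# random_state fixes the shuffle; stratify keeps the class ratio in both splits.
split_a = train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)
split_b = train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)

X_train, X_test, y_train, y_test = split_a
print(len(X_train), len(X_test))               # 8 2
print(sorted(y_test.tolist()))                 # [0, 1]
print(X_test.index.equals(split_b[1].index))   # True
```

With only 10 rows, `test_size=0.2` leaves 2 test rows, so any metric computed on them is a very rough estimate.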

y_train

2 0
8 0
4 1
6 0
5 1
0 0
1 1
9 1
Name: Purchased, dtype: int64

from sklearn.linear_model import LinearRegression

lr = LinearRegression()
lr.fit(x_train, y_train)

LinearRegression()

x_test

DataFrame summary: x_test, 2 rows
Column   dtype   unique  min      max      std                 samples
Country  number  2       0        2        1                   2, 0
Age      number  2       38.0     48.0     7.0710678118654755  38.0, 48.0
Salary   number  2       61000.0  79000.0  12727.922061357855  61000.0, 79000.0

predict = lr.predict(x_test)

from sklearn.metrics import mean_squared_error

mean_squared_error(y_test, predict)

0.29618249533841706
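Mean squared error is the average of the squared differences between the true values and the predictions. Since the target here is a 0/1 label and linear regression returns continuous numbers, the score measures how far those numbers sit from 0 or 1. The sketch below uses hypothetical predictions (not the ones from this run) to show the calculation and, as an assumed convention, a 0.5 cut-off for turning the outputs into class labels.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical values: two true 0/1 labels and two continuous regression outputs.
y_true = np.array([1, 0])
y_pred = np.array([0.4, 0.3])

# MSE is the average of the squared differences.
manual_mse = np.mean((y_true - y_pred) ** 2)
sklearn_mse = mean_squared_error(y_true, y_pred)

# Thresholding at 0.5 (an assumed cut-off) turns the outputs into class labels.
labels = (y_pred >= 0.5).astype(int)

print(np.isclose(manual_mse, sklearn_mse))  # True
print(labels.tolist())                      # [0, 0]
```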

from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier(max_depth=150)
rfc.fit(x_train,y_train)

RandomForestClassifier(max_depth=150)

rfc.predict(x_test)

array([0, 1])

x_test

{"summary":"{\n \"name\": \"x_test\",\n \"rows\": 2,\n \"fields\":

y_test

7 1
3 0
Name: Purchased, dtype: int64
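The classifier's predictions can be scored against `y_test` with a classification metric. The sketch below copies the two printed outputs above by hand, assuming they come from the same split and that `rfc.predict(x_test)` returns rows in the same order as `y_test`.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Values copied from the outputs above (rows 7 and 3 of the dataset).
y_true = [1, 0]   # y_test
y_pred = [0, 1]   # rfc.predict(x_test)

print(accuracy_score(y_true, y_pred))             # 0.0
print(confusion_matrix(y_true, y_pred).tolist())  # [[0, 1], [1, 0]]
```

Under those assumptions both test rows are misclassified in this run. With a 2-row test set and an unseeded split, the figure says little about the model; cross-validation on all 10 rows would likely give a steadier estimate.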
