Week 8 Lab - Linear Regression

The document outlines a Week 7 lab session for an AI course where students apply Linear Regression to predict insurance premiums using the 'Insurance.csv' dataset. It details the steps for data preprocessing, including handling null values, encoding categorical variables, and splitting the dataset into training and testing sets. Finally, it describes the model training process and evaluation metrics such as Mean Squared Error (MSE) and R-squared score.

Uploaded by

Zainab Segilola

CMP4293 INTRODUCTION TO AI PRODUCED BY DR. MARIAM ADEDOYIN-OLOWE

Welcome to the Week 7 lab session, where you will continue to work with the “Insurance.csv”
data. This time you will apply Linear Regression to the data to predict a person's insurance
premium from attributes such as age, BMI, gender and smoking status.

from google.colab import files

file = files.upload()

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings('ignore')

data = pd.read_csv('insurance.csv')
data.info()

#check for null values

data.isnull().sum()

#check for any duplicated rows

data.duplicated().sum()

#check the original and the duplicated rows

data[data.duplicated(keep=False)]

#drop the duplicated row

data.drop_duplicates(inplace=True)

#check to confirm the duplicated row has been dropped

data.duplicated().sum()
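
The duplicate checks above can be tried on a small frame first. This is only a sketch: the `toy` frame and its values are made up for illustration and are not taken from the insurance data.

```python
import pandas as pd

# Made-up frame with one exact duplicate row (rows 0 and 1 are identical)
toy = pd.DataFrame({
    "age": [19, 19, 33],
    "bmi": [30.59, 30.59, 22.7],
    "smoker": ["no", "no", "no"],
})

# duplicated() flags only the later copies, so one duplicate pair counts as 1
n_dupes = int(toy.duplicated().sum())

# keep=False flags every member of a duplicate group, original included
both_copies = toy[toy.duplicated(keep=False)]

# drop_duplicates() keeps the first copy of each group
deduped = toy.drop_duplicates().reset_index(drop=True)

print(n_dupes, len(both_copies), len(deduped))
```

The difference between the default `keep='first'` and `keep=False` explains why the count of duplicates can be 1 while the displayed duplicate rows number 2.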

data['sex'].value_counts()

#To convert text columns into numbers, let's display all the columns
#that hold text (object) values
display(data['sex'].value_counts())
display(data['smoker'].value_counts())
display(data['region'].value_counts())

#import the relevant sklearn libraries needed to convert the text
#columns into numeric values
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer

#creating one label encoder for sex and one label encoder for smoker
le_sex = LabelEncoder()
le_smoker = LabelEncoder()

#fit learns the distinct values of each column and maps them to
#integers in sorted order, e.g. female, male into 0, 1
le_sex.fit(data['sex'].drop_duplicates())
le_smoker.fit(data['smoker'].drop_duplicates())

#applying the encoding and saving the results in new columns. Note
#that duplicates are not dropped here because we want to transform all
#the rows
data['sex_enc'] = le_sex.transform(data['sex'])
data['smoker_enc'] = le_smoker.transform(data['smoker'])

#now let's check the transformation

data.head()
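
To see which integer each label received, the fitted encoder can be inspected through its `classes_` attribute. The sketch below uses a throwaway encoder named `le` (a hypothetical name) fitted on the two `sex` labels only.

```python
from sklearn.preprocessing import LabelEncoder

# Throwaway encoder fitted on the two labels of the 'sex' column
le = LabelEncoder()
le.fit(["male", "female"])

# classes_ is sorted, and each label's position is its integer code
mapping = {label: code for code, label in enumerate(le.classes_)}
print(mapping)

# transform applies the codes; inverse_transform recovers the labels
codes = le.transform(["male", "female", "male"])
restored = le.inverse_transform(codes)
```

Because the classes are sorted alphabetically, the order in which the labels appear in the data does not affect the codes.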

#transforming the 'region' column using the OneHotEncoder and applying
#'passthrough' to the remaining columns so that the transformer leaves
#them as they are
ct = ColumnTransformer( [ ('ohe', OneHotEncoder(), ['region']) ],
remainder='passthrough' )

trans = ct.fit_transform(data)

#listing out the new dataframe headers

ins_data = pd.DataFrame(trans, columns=ct.get_feature_names_out())

#listing the new columns

list(ins_data.columns)

ins_data.head()

#rename columns
ins_data.columns = ['region_northeast',
'region_northwest',
'region_southeast',
'region_southwest',
'age',
'sex',
'bmi',
'children',
'smoker',
'charges',
'sex_enc',
'smoker_enc']

#reorder columns
ins_data = ins_data[[ 'age',
'sex',
'sex_enc',
'bmi',
'children',
'smoker',
'smoker_enc',
'region_northeast',
'region_northwest',
'region_southeast',
'region_southwest',
'charges'
]]

#remove object columns, save into new dataset, and convert to numeric
ins_data_t = ins_data[[ 'age',
'sex_enc',
'bmi',
'children',
'smoker_enc',
'region_northeast',
'region_northwest',
'region_southeast',
'region_southwest',
'charges'
]]

ins_data_t = ins_data_t.apply(pd.to_numeric)
ins_data_t.info()
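
The numeric conversion is needed because a frame built from an array that mixes text and numbers stores every column as `object`. A minimal sketch with made-up values (the names `mixed`, `df_obj` and `df_num` are hypothetical):

```python
import numpy as np
import pandas as pd

# An object array that mixes numbers and text, like a transformer output
# that passes text columns through unchanged
mixed = np.array([[1.0, "male", 27.9],
                  [0.0, "female", 33.8]], dtype=object)
df_obj = pd.DataFrame(mixed, columns=["region_northeast", "sex", "bmi"])

# Even the purely numeric columns are stored as generic Python objects
print(df_obj["bmi"].dtype)

# Keep only the numeric columns, then convert each one with pd.to_numeric
df_num = df_obj[["region_northeast", "bmi"]].apply(pd.to_numeric)
print(df_num.dtypes)
```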

df_corr = ins_data_t[['age',
sns.heatmap(df_corr, vmin=-1, vmax=1, annot=True, fmt='.2f')
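
Assuming `df_corr` is meant to hold a correlation matrix (the statement that builds it is incomplete here), a matrix of that kind can be produced with `DataFrame.corr()`. The sketch below uses made-up values and a hypothetical frame named `toy`.

```python
import pandas as pd

# Made-up numeric frame standing in for a few columns of ins_data_t
toy = pd.DataFrame({
    "age": [20, 30, 40, 50],
    "smoker_enc": [0, 1, 0, 1],
    "charges": [2000.0, 15000.0, 6000.0, 21000.0],
})

# Pairwise Pearson correlations: a square, symmetric matrix with 1.0 on the
# diagonal and every entry between -1 and 1
corr = toy.corr()
print(corr.round(2))
```

A matrix like `corr` is bounded by -1 and 1, which matches the `vmin=-1, vmax=1` limits passed to `sns.heatmap` above.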

from sklearn.model_selection import train_test_split

df_feat = ins_data_t[['age',
'sex_enc',
'bmi',
'children',
'smoker_enc',
'charges'
]]

X = df_feat.iloc[:,0:-1]
y = df_feat.iloc[:,-1]

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=5,
test_size=0.3)
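
A quick sanity check on the split is to confirm the sizes and that the rows of X and y stay paired. The sketch below uses made-up arrays (`X_demo`, `y_demo`) with the same `test_size` and `random_state` as above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: row i of X_demo is [2*i, 2*i + 1] and its target is i
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=5
)

# 30% of 10 rows go to the test set, the remaining 70% to the training set
print(X_tr.shape, X_te.shape)

# Rows are shuffled, but every feature row keeps its own target
print((X_tr[:, 0] == 2 * y_tr).all())
```

Fixing `random_state` makes the shuffle repeatable, so rerunning the cell yields the same training and test rows.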

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X_train, y_train)

# y = a + B*X
# a = model.intercept_
# B = model.coef_
model.intercept_, model.coef_
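
The relationship y = a + B*X can be checked by hand: the intercept plus the dot product of the features with the coefficients should reproduce `predict`. A minimal sketch on made-up data generated from a known line (the names `X_demo`, `y_demo` and `demo` are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up data generated exactly from y = 3 + 2*x1 + 10*x2
X_demo = np.array([[1.0, 0.0],
                   [2.0, 1.0],
                   [3.0, 0.0],
                   [4.0, 1.0],
                   [5.0, 1.0]])
y_demo = 3.0 + 2.0 * X_demo[:, 0] + 10.0 * X_demo[:, 1]

demo = LinearRegression().fit(X_demo, y_demo)

# a is a single number, B holds one coefficient per feature column
a, B = demo.intercept_, demo.coef_

# Manual prediction: a + X @ B, which is what predict computes
manual = a + X_demo @ B
print(np.allclose(manual, demo.predict(X_demo)))
```

Each coefficient is the change in the prediction for a one-unit increase in that feature while the others are held fixed.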

y_pred = model.predict(X_test)

from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error

#sklearn metrics take the true values first, then the predictions
mse = mean_squared_error(y_test, y_pred)
sqrt_mse = np.sqrt(mse)
mae = mean_absolute_error(y_test, y_pred)

print(f"MSE : {mse:.3f}, MSE_SQRT : {sqrt_mse:.3f}, MAE : {mae:.3f}")

r2 = model.score(X_test, y_test)
print(f"R2 score: {r2:.3f}")
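
The metrics above can be reproduced from their definitions, which helps when interpreting them. The sketch below uses made-up true and predicted values (`y_true`, `y_hat`) and compares the hand calculation with the sklearn functions.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Made-up true values and predictions
y_true = np.array([100.0, 200.0, 300.0, 400.0])
y_hat = np.array([110.0, 190.0, 320.0, 380.0])

err = y_true - y_hat                      # residuals
mse_manual = np.mean(err ** 2)            # mean of squared residuals
rmse_manual = np.sqrt(mse_manual)         # same units as the target
mae_manual = np.mean(np.abs(err))         # mean of absolute residuals

# R2 = 1 - (residual sum of squares) / (total sum of squares around the mean)
ss_res = np.sum(err ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(mse_manual, rmse_manual, mae_manual, r2_manual)
print(mean_squared_error(y_true, y_hat),
      mean_absolute_error(y_true, y_hat),
      r2_score(y_true, y_hat))
```

MSE and MAE give the same value whichever argument comes first, but `r2_score` does not, so passing the true values first is the safer habit. For a regressor, `model.score(X_test, y_test)` returns this same R2.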

#minimum, maximum and range of the charges, to put the error metrics in context
(df_feat['charges'].min(), df_feat['charges'].max(),
df_feat['charges'].max() - df_feat['charges'].min())

df_feat.columns

#feature order must match df_feat: age, sex_enc, bmi, children, smoker_enc
val = model.predict([[50, 1, 45.9, 1, 0]])

print('Predicted Insurance Charge =', val)
