Linear and Multilinear Regression

This document discusses linear regression analysis performed on an insurance dataset with various demographic and health variables. It describes data cleaning steps, exploratory data analysis including distributions of variables, and encoding of categorical variables before splitting the data for model training and evaluation.

Uploaded by

Harisankar R N R


Course: BCSE424L_ ML for Robotics

Date: 12.01.2024
Name: Harisankar R N R
Reg No: 21BRS1524
Lab1: Linear and Multilinear Regression

In [1]:
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (20.0, 10.0)
data = pd.read_csv("C:/Users/91812/21BRS1518/headbrain.csv") #Download the dataset and give the appropriate path
print(data.shape)
data.head()

(237, 4)

Out[1]:
   Gender  Age Range  Head Size(cm^3)  Brain Weight(grams)
0       1          1             4512                 1530
1       1          1             3738                 1297
2       1          1             4261                 1335
3       1          1             3777                 1282
4       1          1             4177                 1590

In [2]:
X=data['Head Size(cm^3)'].values
Y=data['Brain Weight(grams)'].values

mean_x=np.mean(X)
mean_y=np.mean(Y)
n=len(X) #Total number of values
numer=0
denom=0
for i in range(n):
    numer+=(X[i]-mean_x) * (Y[i] - mean_y)
    denom+=(X[i]-mean_x) ** 2
b1=numer/denom
b0=mean_y - (b1*mean_x)
print(b1,b0) #Print coefficients

In [3]:

0.26342933948939945 325.57342104944223
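The loop in In [2] evaluates the closed-form least-squares estimates: the slope b1 is the sum of (x - mean_x)(y - mean_y) divided by the sum of (x - mean_x)^2, and the intercept is b0 = mean_y - b1 * mean_x. A minimal vectorized sketch of the same computation is given here, cross-checked against `np.polyfit`. The function name `fit_simple_ols` and the toy arrays are illustrative assumptions and are not part of the lab notebook.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Return (slope, intercept) of the least-squares line y = b0 + b1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    b1 = np.sum(dx * (y - y.mean())) / np.sum(dx ** 2)  # slope
    b0 = y.mean() - b1 * x.mean()                       # intercept
    return b1, b0

# Hypothetical toy data (not from headbrain.csv): points exactly on y = 2x + 1
x_demo = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_demo = 2.0 * x_demo + 1.0

slope, intercept = fit_simple_ols(x_demo, y_demo)

# np.polyfit with deg=1 returns the coefficients as [slope, intercept]
slope_np, intercept_np = np.polyfit(x_demo, y_demo, deg=1)

print(slope, intercept)
```

Applied to the lab data, `fit_simple_ols(X, Y)` should reproduce the `b1` and `b0` values printed by the loop, up to floating-point rounding.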
In [4]:

max_x=np.max(X)+100
min_x=np.min(X)-100
x=np.linspace(min_x, max_x, 1000)
y=b0+b1*x
plt.plot(x,y,color='#58b970', label='Regression Line') #Plotting line
plt.scatter(X, Y, c='#ef5423', label='Scatter Plot') #Plotting scatter points
plt.xlabel('Head Size in cm3')
plt.ylabel('Brain Weight in grams')
plt.legend()
plt.show()
In [5]:
ss_t=0
ss_r=0
for i in range(n):
    y_pred=b0+b1 * X[i]
    ss_t+=(Y[i]-mean_y) ** 2
    ss_r+=(Y[i]-y_pred) ** 2
r2=1-(ss_r/ss_t)
print(r2)

0.6393117199570003
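The value above is the coefficient of determination, R^2 = 1 - SS_res / SS_tot, where SS_res sums the squared residuals and SS_tot sums the squared deviations of Y from its mean. For a simple linear regression with an intercept, R^2 also equals the square of the Pearson correlation between X and Y, which gives a quick sanity check. The sketch below illustrates this on made-up numbers; the helper name `r_squared` and the toy data are assumptions for illustration only.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# Hypothetical toy data, roughly linear with some scatter
x_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_demo = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit the least-squares line, then score its predictions
b1_demo, b0_demo = np.polyfit(x_demo, y_demo, deg=1)
r2_demo = r_squared(y_demo, b0_demo + b1_demo * x_demo)

# Squared Pearson correlation between x and y
r_demo = np.corrcoef(x_demo, y_demo)[0, 1]

print(r2_demo, r_demo ** 2)
```

The two printed numbers should agree to within rounding error; the identity holds only for a one-feature least-squares fit that includes an intercept.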
In [ ]:
#IMPORTING THE DEPENDENCIES
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
import warnings
warnings.filterwarnings('ignore')

RangeIndex: 1338 entries, 0 to 1337
Data columns (total 7 columns):
 #   Column    Non-Null Count  Dtype
 0   age       1338 non-null   int64
 1   sex       1338 non-null   object
 2   bmi       1338 non-null   float64
 3   children  1338 non-null   int64
 4   smoker    1338 non-null   object
 5   region    1338 non-null   object
 6   charges   1338 non-null   float64
dtypes: float64(2), int64(2), object(3)
memory usage: 73.3+ KB

In [8]:
#check for missing values
#--- in case null values are found, a decision needs to be taken to drop the affected rows or the entire columns
insurance_dataset.isnull().sum()

Out[8]:
age         0
sex         0
bmi         0
children    0
smoker      0
region      0
charges     0
dtype: int64
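The insurance data has no missing values, so no action is needed here. For a dataset that does contain nulls, the choice mentioned in the comment (drop rows or drop columns), plus mean imputation as a third option, might look like the sketch below. The `toy` frame and its NaN placement are invented for illustration and are not taken from the lab.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with two deliberately missing entries
toy = pd.DataFrame({
    "age": [19, 18, np.nan, 33],
    "bmi": [27.9, np.nan, 33.0, 22.705],
    "charges": [16884.924, 1725.5523, 4449.462, 21984.47061],
})

# Count nulls per column, as in the notebook cell
null_counts = toy.isnull().sum()

# Option 1: drop every row that contains at least one null
rows_dropped = toy.dropna(axis=0)

# Option 2: drop every column that contains at least one null
cols_dropped = toy.dropna(axis=1)

# Option 3: keep all rows and columns, filling nulls with the column mean
imputed = toy.fillna(toy.mean(numeric_only=True))

print(null_counts)
```

Dropping rows is usually reasonable when only a few records are affected; dropping a column discards that feature for every record, so it tends to be reserved for columns that are mostly empty.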

In [9]:
#statistical Measures of the dataset
insurance_dataset.describe()

Out[9]:
               age          bmi     children       charges
count  1338.000000  1338.000000  1338.000000   1338.000000
mean     39.207025    30.663397     1.094918  13270.422265
std      14.049960     6.098187     1.205493  12110.011237
min      18.000000    15.960000     0.000000   1121.873900
25%      27.000000    26.296250     0.000000   4740.287150
50%      39.000000    30.400000     1.000000   9382.033000
75%      51.000000    34.693750     2.000000  16639.912515
max      64.000000    53.130000     5.000000  63770.428010

In [10]:
#insurance_dataset['age'].plot(kind='box', title='Insurance Info')
labels = ['age', 'bmi', 'children']
B = plt.boxplot([insurance_dataset['age'], insurance_dataset['bmi'], insurance_dataset['children']], labels=labels)
plt.show()

In [11]:
#distribution of 'age' value
sns.set()
plt.figure(figsize=(6,6))
sns.distplot(insurance_dataset['age'])
plt.title('Age Distribution')
plt.show()

In [12]:
#gender column
plt.figure(figsize=(6,6))
sns.countplot(x='sex',data=insurance_dataset)
plt.title('Sex Distribution')
plt.show()

In [13]:
insurance_dataset['sex'].value_counts()

Out[13]:
male      676
female    662
Name: sex, dtype: int64

In [14]:
#BMI Distribution
sns.set()
plt.figure(figsize=(6,6))
sns.distplot(insurance_dataset['bmi'])
plt.title('BMI Distribution')
plt.show()

In [15]:
# children column
plt.figure(figsize=(6,6))
sns.countplot(x='children',data=insurance_dataset)
plt.title('Children')
plt.show()

In [16]:
insurance_dataset['children'].value_counts()

Out[16]:
0    574
1    324
2    240
3    157
4     25
5     18
Name: children, dtype: int64

In [17]:
# smoker column
plt.figure(figsize=(6,6))
sns.countplot(x='smoker',data=insurance_dataset)
plt.title('Smoker')
plt.show()

In [18]:
insurance_dataset['smoker'].value_counts()

Out[18]:
no     1064
yes     274
Name: smoker, dtype: int64

In [19]:
#region column
plt.figure(figsize=(6,6))
sns.countplot(x='region',data=insurance_dataset)
plt.title('Region')
plt.show()

In [20]:
insurance_dataset['region'].value_counts()

Out[20]:
southeast    364
southwest    325
northwest    325
northeast    324
Name: region, dtype: int64

In [21]:
#distribution of 'charges' value
sns.set()
plt.figure(figsize=(6,6))
sns.distplot(insurance_dataset['charges'])
plt.title('Charge Distribution')
plt.show()

In [22]:
#encoding 'sex' column
insurance_dataset.replace({'sex':{'male':0,'female':1}},inplace=True)
#encoding 'smoker' column
insurance_dataset.replace({'smoker':{'yes':0,'no':1}},inplace=True)
#encoding 'region' column
insurance_dataset.replace({'region':{'southeast':0,'southwest':1,'northeast':2,'northwest':3}},inplace=True)

In [23]:
print(insurance_dataset)

      age  sex     bmi  children  smoker  region      charges
0      19    1  27.900         0       0       1  16884.92400
1      18    0  33.770         1       1       0   1725.55230
2      28    0  33.000         3       1       0   4449.46200
3      33    0  22.705         0       1       3  21984.47061
4      32    0  28.880         0       1       3   3866.85520
...   ...  ...     ...       ...     ...     ...          ...
1333   50    0  30.970         3       1       3  10600.54830
1334   18    1  31.920         0       1       2   2205.98080
1335   18    1  36.850         0       1       0   1629.83350
1336   21    1  25.800         0       1       1   2007.94500
1337   61    1  29.070         0       0       3  29141.36030

[1338 rows x 7 columns]

In [24]:
X=insurance_dataset.drop(columns='charges',axis=1)
Y=insurance_dataset['charges']

In [25]:
print(X)

      age  sex     bmi  children  smoker  region
0      19    1  27.900         0       0       1
1      18    0  33.770         1       1       0
2      28    0  33.000         3       1       0
3      33    0  22.705         0       1       3
4      32    0  28.880         0       1       3
...   ...  ...     ...       ...     ...     ...
1333   50    0  30.970         3       1       3
1334   18    1  31.920         0       1       2
1335   18    1  36.850         0       1       0
1336   21    1  25.800         0       1       1
1337   61    1  29.070         0       0       3

[1338 rows x 6 columns]

In [26]:
print(Y)

0       16884.92400
1        1725.55230
2        4449.46200
3       21984.47061
4        3866.85520
           ...
1333    10600.54830
1334     2205.98080
1335     1629.83350
1336     2007.94500
1337    29141.36030

In [33]:
# R squared value for testing data
r2_test=metrics.r2_score(Y_test,testing_data_prediction)
print("R-squared value",r2_test)

R-squared value 0.7447273869684077

In [36]:
input_data=(31,1,25.74,0,1,0)
#changing input_data to a numpy array
input_data_array=np.asarray(input_data)

[3760.0805765]
The insurance cost is USD 3760.0805764960514
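The cells that split the data, fit the model, and produce `testing_data_prediction` are not present in this copy of the notebook. One plausible shape of those steps is sketched below, using the `train_test_split`, `LinearRegression`, and `metrics` imports listed earlier. Everything specific here is an assumption: the synthetic `X_demo`/`Y_demo` arrays stand in for the insurance features, and `test_size=0.2`, `random_state=2`, the variable name `regressor`, and the reshape step before prediction are guesses rather than the author's confirmed code.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics

# Synthetic stand-in for the six encoded features and the target (assumption)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 6))
true_coef = np.array([3.0, -1.0, 2.0, 0.5, -4.0, 0.0])
Y_demo = X_demo @ true_coef + 10.0 + rng.normal(scale=0.1, size=200)

# Hold out 20% of the rows for testing (split ratio and seed are assumptions)
X_train, X_test, Y_train, Y_test = train_test_split(
    X_demo, Y_demo, test_size=0.2, random_state=2
)

# Fit a multiple linear regression on the training portion
regressor = LinearRegression()
regressor.fit(X_train, Y_train)

# Score the model on the held-out rows
testing_data_prediction = regressor.predict(X_test)
r2_test = metrics.r2_score(Y_test, testing_data_prediction)
print("R-squared value", r2_test)

# Predict for one record: scikit-learn expects a 2-D array of shape (1, n_features)
input_data = (0.1, 1.0, -0.5, 0.0, 1.0, 0.0)
input_data_array = np.asarray(input_data)
input_data_reshaped = input_data_array.reshape(1, -1)
prediction = regressor.predict(input_data_reshaped)
print(prediction)
```

The reshape matters because `predict` treats each row as one sample; passing the flat six-element array would be rejected as a 1-D input.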

In [ ]:
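The notebook encodes `region` as the integers 0 to 3. A linear model reads such codes as an ordered numeric scale, although the four regions have no natural order. A common alternative is one-hot encoding, sketched below with `pd.get_dummies`. This is an alternative technique, not what the lab does; the `toy` frame is invented for illustration, and `drop_first=True` is one possible choice for avoiding a redundant column.

```python
import pandas as pd

# Hypothetical toy frame using the same category labels as the insurance data
toy = pd.DataFrame({
    "smoker": ["yes", "no", "no", "no"],
    "region": ["southwest", "southeast", "southeast", "northwest"],
})

# Binary column: a 0/1 code is harmless, so map it directly
toy["smoker"] = toy["smoker"].map({"yes": 0, "no": 1})

# Nominal column: one indicator column per category, dropping the first
# (alphabetically) as the reference level
one_hot = pd.get_dummies(toy, columns=["region"], drop_first=True, dtype=int)

print(one_hot)
```

With this encoding each region gets its own coefficient relative to the dropped reference category, instead of a single slope across arbitrary integer codes.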
