Linear Regression for Beginners

This document discusses preprocessing data for machine learning modeling. It loads and explores a dataset containing years of experience and salary for employees. It then splits the data into training and test sets, fits a simple linear regression model to the training set, makes predictions on the test set, and evaluates the model performance.

Uploaded by

Shreya Dutta

# # Data Preprocessing

# Importing the libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

from google.colab import files

uploaded = files.upload()

Saving Salary_Data.csv to Salary_Data.csv

# Importing the dataset

dataset = pd.read_csv('Salary_Data.csv')
dataset

YearsExperience Salary
0 1.1 39343.0
1 1.3 46205.0
2 1.5 37731.0
3 2.0 43525.0
4 2.2 39891.0
5 2.9 56642.0
6 3.0 60150.0
7 3.2 54445.0
8 3.2 64445.0
9 3.7 57189.0
10 3.9 63218.0
11 4.0 55794.0
12 4.0 56957.0
13 4.1 57081.0
14 4.5 61111.0
15 4.9 67938.0
16 5.1 66029.0
17 5.3 83088.0
18 5.9 81363.0
19 6.0 93940.0
20 6.8 91738.0
21 7.1 98273.0
22 7.9 101302.0
23 8.2 113812.0
24 8.7 109431.0
25 9.0 105582.0
26 9.5 116969.0
27 9.6 112635.0
28 10.3 122391.0
29 10.5 121872.0

dataset.describe()

       YearsExperience         Salary
count        30.000000      30.000000
mean          5.313333   76003.000000
std           2.837888   27414.429785
min           1.100000   37731.000000
25%           3.200000   56720.750000
50%           4.700000   65237.000000
75%           7.700000  100544.750000
max          10.500000  122391.000000

# Mounting Google Drive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount

# Importing the dataset
# dataset = pd.read_csv('/content/drive/My Drive/ATAL/Salary_Data.csv')

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-6-242e04d314aa> in <module>()
      1 # Importing the dataset
----> 2 dataset = pd.read_csv('/content/drive/My Drive/ATAL/Salary_Data.csv')

4 frames
/usr/local/lib/python3.6/dist-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   2008         kwds["usecols"] = self.usecols
   2009
-> 2010         self._reader = parsers.TextReader(src, **kwds)
   2011         self.unnamed_cols = self._reader.unnamed_cols
   2012

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source()

FileNotFoundError: [Errno 2] No such file or directory: '/content/drive/My Drive/ATAL/Salary_Data.csv'
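
The FileNotFoundError above is raised because the path passed to pd.read_csv does not exist on the mounted drive. One way to make such a failure easier to diagnose is to check the path first and report what the parent folder contains. This is a minimal sketch: read_csv_checked is a hypothetical helper name, not part of pandas or Colab.

```python
from pathlib import Path

import pandas as pd


def read_csv_checked(path):
    """Hypothetical helper: read a CSV, or list the folder contents if the file is missing."""
    path = Path(path)
    if not path.is_file():
        parent = path.parent
        # Show what is actually in the parent folder, if that folder exists.
        found = sorted(p.name for p in parent.iterdir()) if parent.is_dir() else []
        raise FileNotFoundError(f"{path} not found; {parent} contains: {found}")
    return pd.read_csv(path)
```

Calling read_csv_checked('Salary_Data.csv') behaves like pd.read_csv when the file is present. When it is absent, the error message names the files that are available instead.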

print(dataset)

YearsExperience Salary
0 1.1 39343.0
1 1.3 46205.0
2 1.5 37731.0
3 2.0 43525.0
4 2.2 39891.0
5 2.9 56642.0
6 3.0 60150.0
7 3.2 54445.0
8 3.2 64445.0
9 3.7 57189.0
10 3.9 63218.0
11 4.0 55794.0
12 4.0 56957.0
13 4.1 57081.0
14 4.5 61111.0
15 4.9 67938.0
16 5.1 66029.0
17 5.3 83088.0
18 5.9 81363.0
19 6.0 93940.0
20 6.8 91738.0
21 7.1 98273.0
22 7.9 101302.0
23 8.2 113812.0
24 8.7 109431.0
25 9.0 105582.0
26 9.5 116969.0
27 9.6 112635.0
28 10.3 122391.0
29 10.5 121872.0

dataset.shape

(30, 2)

dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30 entries, 0 to 29
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 YearsExperience 30 non-null float64
1 Salary 30 non-null float64
dtypes: float64(2)
memory usage: 608.0 bytes

# Extracting dependent and independent variables:

# Extracting independent variable:
X = dataset.iloc[:, :-1].values
# Extracting dependent variable:
y = dataset.iloc[:, 1].values
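
scikit-learn estimators expect the feature matrix X to be two-dimensional (samples by features) and the target y to be one-dimensional, which is why the two iloc calls above are written differently. A small sketch on the first three rows of the dataset, showing the shapes each call produces:

```python
import pandas as pd

# First three rows of Salary_Data.csv, copied from the printout above.
dataset = pd.DataFrame({
    "YearsExperience": [1.1, 1.3, 1.5],
    "Salary": [39343.0, 46205.0, 37731.0],
})

X = dataset.iloc[:, :-1].values  # slice of columns -> 2-D array, shape (3, 1)
y = dataset.iloc[:, 1].values    # single column    -> 1-D array, shape (3,)

print(X.shape, y.shape)
```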

print(X)

[[ 1.1]
[ 1.3]
[ 1.5]
[ 2. ]
[ 2.2]
[ 2.9]
[ 3. ]
[ 3.2]
[ 3.2]
[ 3.7]
[ 3.9]
[ 4. ]
[ 4. ]
[ 4.1]
[ 4.5]
[ 4.9]
[ 5.1]
[ 5.3]
[ 5.9]
[ 6. ]
[ 6.8]
[ 7.1]
[ 7.9]
[ 8.2]
[ 8.7]
[ 9. ]
[ 9.5]
[ 9.6]
[10.3]
[10.5]]

print(y)

[ 39343 46205 37731 43525 39891 56642 60150 54445 64445 57189
63218 55794 56957 57081 61111 67938 66029 83088 81363 93940
91738 98273 101302 113812 109431 105582 116969 112635 122391 121872]

# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 1/3, random_state = 0)
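
With 30 rows and test_size = 1/3, the split holds out 10 rows for testing and keeps 20 for training, and random_state fixes the shuffle so the split is repeatable. A short sketch using placeholder arrays (not the salary data) to illustrate both points:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same number of rows as the salary dataset.
X = np.arange(30, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)
print(len(X_train), len(X_test))

# Repeating the call with the same random_state reproduces the same split.
_, X_test_again, _, _ = train_test_split(X, y, test_size=1/3, random_state=0)
print(np.array_equal(X_test, X_test_again))
```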

print(X_train)

[[ 2.9]
[ 5.1]
[ 3.2]
[ 4.5]
[ 8.2]
[ 6.8]
[ 1.3]
[10.5]
[ 3. ]
[ 2.2]
[ 5.9]
[ 6. ]
[ 3.7]
[ 3.2]
[ 9. ]
[ 2. ]
[ 1.1]
[ 7.1]
[ 4.9]
[ 4. ]]

print(X_test)

[[ 1.5]
[10.3]
[ 4.1]
[ 3.9]
[ 9.5]
[ 8.7]
[ 9.6]
[ 4. ]
[ 5.3]
[ 7.9]]

print(y_test)

[ 37731 122391 57081 63218 116969 109431 112635 55794 83088 101302]

print(y_train)

[ 56642. 66029. 64445. 61111. 113812. 91738. 46205. 121872. 60150.
39891. 81363. 93940. 57189. 54445. 105582. 43525. 39343. 98273.
67938. 56957.]

# Fitting Simple Linear Regression to the Training set

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

LinearRegression()

# Predicting the Test set results
y_pred = regressor.predict(X_test)
#print("%2.f"%(y_pred))
print(y_pred)

[ 40835.10590871 123079.39940819 65134.55626083 63265.36777221
115602.64545369 108125.8914992 116537.23969801 64199.96201652
76349.68719258 100649.1375447 ]

# Visualising the Training set results

plt.scatter(X_train, y_train, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Experience (Training set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()

# Visualising the Test set results

plt.scatter(X_test, y_test, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Experience (Test set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()

# Visualising the Test set results

plt.scatter(X_test, y_test, color = 'red')
plt.plot(X_test, y_pred, color = 'blue')
plt.title('Salary vs Experience (Test set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()
print("Regressor slope: %2.f "%( regressor.coef_[0]))
print("Regressor intercept:%2.f "% regressor.intercept_)

Regressor slope: 9346

Regressor intercept:26816
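
For a single feature, ordinary least squares has a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the two means. A sketch with NumPy on the 20 training rows printed above, which should land close to the slope and intercept that the regressor reports:

```python
import numpy as np

# Training rows copied from the X_train / y_train printouts above.
x = np.array([2.9, 5.1, 3.2, 4.5, 8.2, 6.8, 1.3, 10.5, 3.0, 2.2,
              5.9, 6.0, 3.7, 3.2, 9.0, 2.0, 1.1, 7.1, 4.9, 4.0])
y = np.array([56642, 66029, 64445, 61111, 113812, 91738, 46205, 121872, 60150, 39891,
              81363, 93940, 57189, 54445, 105582, 43525, 39343, 98273, 67938, 56957],
             dtype=float)

# slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# The fitted line passes through (mean_x, mean_y).
intercept = y.mean() - slope * x.mean()

print("slope: %.0f, intercept: %.0f" % (slope, intercept))
```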

YearsExperience = 10
# predict returns a one-element array; index it so a scalar is formatted
print("Salary for given Years of Experience is : %.f" % regressor.predict([[YearsExperience]])[0])

Salary for given Years of Experience is : 120276

from sklearn import metrics

print("MAE %2.f" %(metrics.mean_absolute_error(y_test,y_pred)))

MAE 3426

from sklearn import metrics

# RMSE is the square root of the mean squared error (not of the MAE)
print("RMSE %2.f" %(np.sqrt(metrics.mean_squared_error(y_test,y_pred))))

RMSE 4585
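
MAE averages the absolute residuals, while RMSE is the square root of the mean of the squared residuals, so RMSE is never smaller than MAE and weights large errors more heavily. A NumPy sketch computing both by hand from the y_test and y_pred values printed earlier:

```python
import numpy as np

# Values copied from the y_test and y_pred printouts earlier in the notebook.
y_test = np.array([37731, 122391, 57081, 63218, 116969,
                   109431, 112635, 55794, 83088, 101302], dtype=float)
y_pred = np.array([40835.10590871, 123079.39940819, 65134.55626083, 63265.36777221,
                   115602.64545369, 108125.8914992, 116537.23969801, 64199.96201652,
                   76349.68719258, 100649.1375447])

residuals = y_test - y_pred
mae = np.mean(np.abs(residuals))         # mean absolute error
rmse = np.sqrt(np.mean(residuals ** 2))  # root of the mean squared error

print("MAE  %.0f" % mae)
print("RMSE %.0f" % rmse)
```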

print('Train Score: %f' %(regressor.score(X_train, y_train)))

print('Test Score: %f' % (regressor.score(X_test, y_test)) )

Train Score: 0.938190

Test Score: 0.974915
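
The score method of LinearRegression returns the coefficient of determination R², defined as 1 minus the ratio of the residual sum of squares to the total sum of squares. A sketch recomputing the test score by hand from the y_test and y_pred values printed earlier:

```python
import numpy as np

# Values copied from the y_test and y_pred printouts earlier in the notebook.
y_test = np.array([37731, 122391, 57081, 63218, 116969,
                   109431, 112635, 55794, 83088, 101302], dtype=float)
y_pred = np.array([40835.10590871, 123079.39940819, 65134.55626083, 63265.36777221,
                   115602.64545369, 108125.8914992, 116537.23969801, 64199.96201652,
                   76349.68719258, 100649.1375447])

ss_res = np.sum((y_test - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_test - y_test.mean()) ** 2)  # total sum of squares
r2 = 1.0 - ss_res / ss_tot

print("Test R^2: %f" % r2)
```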
