Simple Linear Regression With scikit-learn
There are five basic steps when you’re implementing linear regression:
1. Import the packages and classes you need.
2. Provide data to work with and eventually do appropriate transformations.
3. Create a regression model and fit it with existing data.
4. Check the results of model fitting to know whether the model is satisfactory.
5. Apply the model for predictions.
These steps are more or less general for most of the regression approaches and implementations.
Problem Statement
An organization wants an early estimate of its employee churn-out rate. The HR department has therefore compiled data on employee salary hikes and the churn-out rate for a financial year. The analytics team must perform a deep analysis, predict an estimate of employee churn, and present the statistics.
Approach
Build a simple linear regression model with target variable 'Churn_out_rate'. Apply the necessary transformations and record the RMSE and correlation coefficient values for each transformed model.
Step 1: Import packages and classes
The first step is to import the package numpy and the class LinearRegression from sklearn.linear_model:
import numpy as np
from sklearn.linear_model import LinearRegression
Now, you have all the functionalities you need to implement linear regression.
The fundamental data type of NumPy is the array type called numpy.ndarray. The rest of this article
uses the term array to refer to instances of the type numpy.ndarray.
The class sklearn.linear_model.LinearRegression will be used to perform linear and polynomial
regression and make predictions accordingly.
Step 2: Provide data
The second step is defining the data to work with: the input (regressor, 𝑥) and the output (response, 𝑦). The employee dataset (salary hike and churn-out rate) is imported, and exploratory data analysis is performed on it.
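This step can be sketched as follows. The column names come from the problem statement; the file name and the data values themselves are hypothetical stand-ins so the snippet runs on its own:

```python
import pandas as pd

# In practice the HR data would be loaded from a CSV file, e.g.
# emp_data = pd.read_csv("emp_data.csv")   # hypothetical file name
# A small in-memory frame stands in for it here so the sketch is self-contained.
emp_data = pd.DataFrame({
    "Salary_hike":    [1580, 1600, 1610, 1640, 1660, 1690, 1706, 1730, 1800, 1870],
    "Churn_out_rate": [92, 85, 80, 75, 72, 70, 68, 65, 62, 60],
})

# Basic exploratory data analysis
print(emp_data.describe())   # summary statistics per column
print(emp_data.corr())       # correlation between salary hike and churn-out rate
```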
Step 3: Create a model and fit it
The next step is to create a linear regression model and fit it using the existing data.
Let’s create an instance of the class LinearRegression, which will represent the regression model:
Simple linear regression
model = LinearRegression()
This statement creates the variable model as an instance of LinearRegression. You can provide several optional parameters to LinearRegression (for example, fit_intercept and n_jobs).
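A minimal sketch of creating and fitting the model. The input/output values are hypothetical; note that the input must be reshaped into a two-dimensional column because scikit-learn expects a 2-D feature matrix:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical salary-hike inputs and churn-out-rate outputs
x = np.array([1580, 1600, 1640, 1690, 1730, 1800]).reshape(-1, 1)  # 2-D column
y = np.array([92, 85, 75, 70, 65, 60])

model = LinearRegression()  # optional parameters: fit_intercept, copy_X, n_jobs
model.fit(x, y)

print(model.intercept_, model.coef_)  # fitted intercept and slope
print(model.score(x, y))              # R^2 on the training data
```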
statsmodels.formula.api is imported to build a model based on ordinary least squares (OLS):
model1 = smf.ols('Churn_out_rate ~ Salary_hike', data=emp_data).fit()
The regression line is plotted using the predicted values. After plotting the scatter plot, the root mean squared error (RMSE) is calculated. To reduce the error and obtain a better-fitting line, transformations are applied to the data.
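A runnable sketch of this step, using a hypothetical stand-in for the HR data; it fits the untransformed OLS model and computes the RMSE from its predictions:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical stand-in for the HR data
emp_data = pd.DataFrame({
    "Salary_hike":    [1580, 1600, 1610, 1640, 1660, 1690, 1706, 1730, 1800, 1870],
    "Churn_out_rate": [92, 85, 80, 75, 72, 70, 68, 65, 62, 60],
})

# Ordinary least squares: churn-out rate modelled on salary hike
model1 = smf.ols("Churn_out_rate ~ Salary_hike", data=emp_data).fit()
pred1 = model1.predict(emp_data)

# RMSE of the untransformed model
rmse1 = np.sqrt(np.mean((emp_data["Churn_out_rate"] - pred1) ** 2))
print(model1.params)
print(rmse1)
```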
Log transformation
In the log transformation, the transformation is applied to the x data:
# x = log(Salary_hike), y = Churn_out_rate
A scatter plot of the transformed data is plotted, and the correlation coefficient between the transformed input and the output is obtained. model2 is built on this data, the new regression line is plotted, and the new RMSE is calculated.
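The log-transformed model can be sketched like this (same hypothetical stand-in data; the np.log call inside the formula is evaluated by statsmodels):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical stand-in for the HR data
emp_data = pd.DataFrame({
    "Salary_hike":    [1580, 1600, 1610, 1640, 1660, 1690, 1706, 1730, 1800, 1870],
    "Churn_out_rate": [92, 85, 80, 75, 72, 70, 68, 65, 62, 60],
})

# Correlation between the transformed input and the output
corr_log = np.log(emp_data["Salary_hike"]).corr(emp_data["Churn_out_rate"])

# model2: x = log(Salary_hike), y = Churn_out_rate
model2 = smf.ols("Churn_out_rate ~ np.log(Salary_hike)", data=emp_data).fit()
pred2 = model2.predict(emp_data)
rmse2 = np.sqrt(np.mean((emp_data["Churn_out_rate"] - pred2) ** 2))
print(corr_log, rmse2)
```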
Exponential transformation
In the exponential transformation, the transformation is applied to the y data:
# x = Salary_hike, y = log(Churn_out_rate)
A scatter plot of the transformed data is plotted, and the correlation coefficient between the input and the transformed output is obtained. model3 is built on this data, the new regression line is plotted, and the new RMSE is calculated.
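A sketch of the exponential model on the same hypothetical stand-in data. Because the model predicts log(Churn_out_rate), its predictions are back-transformed with np.exp before computing the RMSE on the original scale:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical stand-in for the HR data
emp_data = pd.DataFrame({
    "Salary_hike":    [1580, 1600, 1610, 1640, 1660, 1690, 1706, 1730, 1800, 1870],
    "Churn_out_rate": [92, 85, 80, 75, 72, 70, 68, 65, 62, 60],
})

# Correlation between the input and the transformed output
corr_exp = emp_data["Salary_hike"].corr(np.log(emp_data["Churn_out_rate"]))

# model3: x = Salary_hike, y = log(Churn_out_rate)
model3 = smf.ols("np.log(Churn_out_rate) ~ Salary_hike", data=emp_data).fit()
pred3 = np.exp(model3.predict(emp_data))  # back to the original scale
rmse3 = np.sqrt(np.mean((emp_data["Churn_out_rate"] - pred3) ** 2))
print(corr_exp, rmse3)
```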
Polynomial transformation
# x = s_hike, x^2 = s_hike*s_hike, y = log(churn)
PolynomialFeatures is imported from sklearn.preprocessing to build the polynomial regression. The new regression line is plotted and the RMSE of this model is obtained.
The best model is chosen by comparing the RMSE values of all the transformations above. The models and their respective RMSE values are tabulated; from these observations, the exponential model is taken as the best.
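The comparison can be sketched end to end. The data values are hypothetical stand-ins; the first three models use statsmodels formulas, and the polynomial model uses PolynomialFeatures as described above. On real data, the model with the lowest tabulated RMSE would be selected:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical stand-in for the HR data
emp_data = pd.DataFrame({
    "Salary_hike":    [1580, 1600, 1610, 1640, 1660, 1690, 1706, 1730, 1800, 1870],
    "Churn_out_rate": [92, 85, 80, 75, 72, 70, 68, 65, 62, 60],
})
y = emp_data["Churn_out_rate"]

def rmse(actual, predicted):
    return np.sqrt(np.mean((actual - predicted) ** 2))

m1 = smf.ols("Churn_out_rate ~ Salary_hike", data=emp_data).fit()           # linear
m2 = smf.ols("Churn_out_rate ~ np.log(Salary_hike)", data=emp_data).fit()   # log
m3 = smf.ols("np.log(Churn_out_rate) ~ Salary_hike", data=emp_data).fit()   # exponential

# Polynomial: x and x^2 on log(y), built with PolynomialFeatures
X_poly = PolynomialFeatures(degree=2).fit_transform(emp_data[["Salary_hike"]])
m4 = LinearRegression().fit(X_poly, np.log(y))

results = pd.DataFrame({
    "model": ["linear", "log", "exponential", "polynomial"],
    "rmse": [
        rmse(y, m1.predict(emp_data)),
        rmse(y, m2.predict(emp_data)),
        rmse(y, np.exp(m3.predict(emp_data))),  # back-transform log predictions
        rmse(y, np.exp(m4.predict(X_poly))),
    ],
})
print(results.sort_values("rmse"))
```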
Step 4: Get results
Once you have your model fitted, you can get the results to check whether the model works
satisfactorily and interpret it.
The summary of the final model is reviewed. The final model is then fitted on train/test split data, predictions on the test set are examined, and the final RMSE value is recorded.
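A sketch of this final step, again on the hypothetical stand-in data: the exponential model (chosen above) is refit on a training split and its RMSE is evaluated on the held-out test split.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the HR data
emp_data = pd.DataFrame({
    "Salary_hike":    [1580, 1600, 1610, 1640, 1660, 1690, 1706, 1730, 1800, 1870],
    "Churn_out_rate": [92, 85, 80, 75, 72, 70, 68, 65, 62, 60],
})

train, test = train_test_split(emp_data, test_size=0.3, random_state=0)

# Final (exponential) model fitted on the training split only
final_model = smf.ols("np.log(Churn_out_rate) ~ Salary_hike", data=train).fit()
print(final_model.summary())

# Predictions on the test split, back-transformed to the original scale
test_pred = np.exp(final_model.predict(test))
final_rmse = np.sqrt(np.mean((test["Churn_out_rate"] - test_pred) ** 2))
print(final_rmse)
```

With a dataset this small the split is only illustrative; on real data a fixed random_state keeps the evaluation reproducible.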