
Polynomial Regression in Python

What is Polynomial Regression?


Polynomial Regression is a process by which, given a set of inputs and their corresponding outputs, we find an nth degree polynomial f(x) that maps the inputs to the outputs. This f(x) has the form:
f(x) = a_0 + a_1 x + a_2 x^2 + … + a_n x^n
Polynomial regression has an advantage over linear regression because it can identify patterns that linear regression cannot. For example, if a ball is thrown upwards, a quadratic function describes the height of the ball over time, and higher-degree polynomials appear in models of planetary motion. These patterns cannot be captured by a straight line.
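As a quick illustration of this, here is a standalone sketch with made-up numbers (separate from the tutorial's code) that compares a straight-line fit and a quadratic fit to the height of a thrown ball:

import numpy as np

# Height of a ball thrown upwards at 20 m/s, sampled every 0.5 s (toy data for this sketch)
t = np.arange(0, 4.1, 0.5)
h = 20 * t - 0.5 * 9.8 * t ** 2

# Fit a straight line (degree 1) and a quadratic (degree 2) to the same points
line = np.polyfit(t, h, 1)
quad = np.polyfit(t, h, 2)

print(np.round(np.polyval(line, t) - h, 2))  # large residuals: the line misses the curve
print(np.round(np.polyval(quad, t) - h, 2))  # residuals near zero: the quadratic recovers the pattern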

Generating a random dataset


To do any Polynomial Regression, the first thing we need is data.
In the first part of this tutorial, we perform polynomial regression on a randomly generated dataset to understand the concepts. Then we do the same on some real data.

Part 1: Using generated dataset


https://colab.research.google.com/drive/1_Xa5QG-HLPV8yxIOd5vD-dA6PHYAfvd8

We start by importing some libraries that we will be using in this tutorial.

# Note: this tutorial uses the TensorFlow 1.x API (tf.placeholder, tf.Session)
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
import operator
from sklearn.metrics import mean_squared_error, r2_score
import pandas as pd

Next, we generate 20 points with random coordinates and visualise them using a scatter plot.
np.random.seed(0)
x = np.random.normal(0, 1, 20)
y = np.random.normal(0, 1, 20)

plt.scatter(x, y, s=10)
plt.show()

Doing Polynomial Regression


We are doing Polynomial Regression using TensorFlow. We have to feed in the degree of the polynomial that we want and the x data. The degree is an important parameter that we will cover later. First, we have to modify the data so that it can be accepted by TensorFlow. Then we set some parameters such as the optimizer and the loss function. Finally, we train the model for 12,000 steps (epochs).

deg = 3

# weights and bias
W = tf.Variable(tf.random_normal([deg, 1]), name='weight')
b = tf.Variable(tf.random_normal([1]), name='bias')

# placeholders for the modified inputs and the targets
x_ = tf.placeholder(tf.float32, shape=[None, deg])
y_ = tf.placeholder(tf.float32, shape=[None, 1])

# build the feature matrix [x, x^2, ..., x^deg], scaling each column by its maximum
def modify_input(x, x_size, n_value):
    x_new = np.zeros([x_size, n_value])
    for i in range(n_value):
        x_new[:, i] = np.power(x, (i + 1))
        x_new[:, i] = x_new[:, i] / np.max(x_new[:, i])
    return x_new

x_modified = modify_input(x, x.size, deg)
Y_pred = tf.add(tf.matmul(x_, W), b)

# loss function: mean squared error
loss = tf.reduce_mean(tf.square(Y_pred - y_))
# training algorithm: gradient descent
optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
# initializing the variables
init = tf.global_variables_initializer()

# starting the session
sess = tf.Session()
sess.run(init)

epoch = 12000

for step in range(epoch):
    _, c = sess.run([optimizer, loss],
                    feed_dict={x_: x_modified, y_: y.reshape(-1, 1)})
    if step % 1000 == 0:
        print("loss: " + str(c))

y_test = sess.run(Y_pred, feed_dict={x_: x_modified})

Finally we calculate the errors.

rmse = np.sqrt(mean_squared_error(y, y_test))   # square root of the mean squared error
r2 = r2_score(y, y_test)
print(rmse)
print(r2)
1.1507521092081143
0.061440511342737425

Loss functions
We need to measure how well our model captures the patterns in the data. There are two common ways of doing this:
1. Mean Square Error
2. R square score (R2 score)

Let us understand the math behind these two:

Mean Square Error:


For every value of x, we have the actual value of y and the value of y that our line predicts.
We find the difference between the two. Then we add the differences for each value of x.
Finally we divide this by the number of values of x.
((y_1 - y_1') + (y_2 - y_2') + … + (y_n - y_n')) / n
This equation has a problem though. Sometimes the difference will be positive and other times it will be negative. These values can cancel out, so even when there are large errors the output may show little or no error. To tackle this problem, we square each difference.
mse = ((y_1 - y_1')^2 + (y_2 - y_2')^2 + … + (y_n - y_n')^2) / n

R2 score:
First we have to find the mean m of all the values of y:
m = (y_1 + y_2 + y_3 + … + y_n) / n
Then we take the difference between each value of y and the mean, square each difference, add them up, and divide by the number of values. Let this value be k.
k = ((y_1 - m)^2 + (y_2 - m)^2 + … + (y_n - m)^2) / n
Now we divide the mse by k and subtract the result from 1. This gives us the R2 score. For any reasonable model the R2 score lies between 0 and 1 (it can go below 0 for a model that fits worse than simply predicting the mean). A large R2 score means that x predicts y well.

R2 = 1 - mse / k
Let us see how this looks in code. There are some inbuilt functions that handle the
calculations for us:

rmse = np.sqrt(mean_squared_error(y, y_test))   # drop np.sqrt for the plain mse
r2 = r2_score(y, y_test)
print(rmse)
print(r2)
1.1832766119182259
0.007636444138149345
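To connect the output above to the formulas, here is a minimal sketch that computes the same quantities by hand with numpy, assuming the y and y_test arrays from the generated-data example are still in scope:

import numpy as np

# manual versions of the two metrics from the formulas above
y_true = np.ravel(y)
y_hat = np.ravel(y_test)

mse = np.mean((y_true - y_hat) ** 2)           # mean of the squared differences
k = np.mean((y_true - np.mean(y_true)) ** 2)   # spread of y around its mean m
r2 = 1 - mse / k

print(np.sqrt(mse))   # the code above prints the square root of the mse
print(r2)             # matches r2_score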
Visualising the results
Now let us try to visualise the results. First, we find the coefficients and the intercept of the polynomial that was generated.
1. print("Model paramters:")
2. print(sess.run(W))
3. print("bias:%f" %sess.run(b))
Model paramters:
[[ 1.1229055 ]
[-2.1566594 ]
[ 0.67295593]]
bias:0.128522
Using this, we can find the equation itself:

res = "y = f(x) = " + str(sess.run(b)[0])

for i, r in enumerate(sess.run(W)):
    res = res + " + {}*x^{}".format("%.2f" % r[0], i + 1)

print(res)
y = f(x) = 0.088324 + 1.23*x^1 + -1.65*x^2
Finally, we can visualise the function by plotting a line graph of the equation over the scatter plot.

plt.scatter(x, y, s=10)
# sort the values of x before the line plot
sort_axis = operator.itemgetter(0)
sorted_zip = sorted(zip(x, y_test), key=sort_axis)
x, y_test = zip(*sorted_zip)
plt.plot(x, y_test, color='red')
plt.show()

Part 2: Using real data


https://colab.research.google.com/drive/1S0wz7xquJ5-6MaREEMnxx-_HA7r6BVZw

In this part of the tutorial, we will be using some data (you can get it here: https://github.com/SiddhantAttavar/PolynomialRegression/blob/master/Position_Salaries.csv) about the relationship between position level and salary in a company. As you can see, as the level increases, so does the salary; however, the relationship is not linear.
First we import the data.

# Importing the dataset
url = 'https://raw.githubusercontent.com/SiddhantAttavar/PolynomialRegression/master/Position_Salaries.csv'
datas = pd.read_csv(url)
print(datas)
Position,Level,Salary
Business Analyst,1,45000
Junior Consultant,2,50000
Senior Consultant,3,60000
Manager,4,80000
Country Manager,5,110000
Region Manager,6,150000
Partner,7,200000
Senior Partner,8,300000
C-level,9,500000
CEO,10,1000000

The data is stored as a csv (comma separated values) file. In this file, each column is
separated by a comma, which makes it easy to read.
In this case the x values are the level column and the y values are the salary column. We
create the arrays using some functions in the pandas library.

X = datas.iloc[:, 1].values
Y = datas.iloc[:, 2].values
Y = Y[:, np.newaxis]

Now we can plot the data using a scatter plot.

plt.scatter(X, Y, s=10)
plt.show()

We can do Polynomial Regression for this data with degree 2. We will modify this choice later in the tutorial.

deg = 2  #@param {type:"slider", min:1, max:20, step:1}

# weights and bias
W = tf.Variable(tf.random_normal([deg, 1]), name='weight')
b = tf.Variable(tf.random_normal([1]), name='bias')

# placeholders for the modified inputs and the targets
X_ = tf.placeholder(tf.float32, shape=[None, deg])
Y_ = tf.placeholder(tf.float32, shape=[None, 1])

# reuse modify_input from Part 1 to build the scaled feature matrix
X_modified = modify_input(X, X.size, deg)
Y_pred = tf.add(tf.matmul(X_, W), b)

# loss function: mean squared error
loss = tf.reduce_mean(tf.square(Y_pred - Y_))
# training algorithm: gradient descent
optimizer = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
# initializing the variables
init = tf.global_variables_initializer()

# starting the session
sess = tf.Session()
sess.run(init)

epoch = 12000

for step in range(epoch):
    _, c = sess.run([optimizer, loss], feed_dict={X_: X_modified, Y_: Y})
    if step % 1000 == 0:
        print("loss: " + str(c))

Y_test = sess.run(Y_pred, feed_dict={X_: X_modified})

Now we can find how well our model is performing.

rmse = np.sqrt(mean_squared_error(Y, Y_test))
r2 = r2_score(Y, Y_test)
print(rmse)
print(r2)
1.1507521216059198
0.06144049111930627
After this, we visualise the results. First we get the coefficients and print the formula, and then we plot the equation.

1. print("Model paramters:")
2. print(sess.run(W))
3. print("bias:%f" %sess.run(b))
4.
5. res = "y = f(x) = " + str(sess.run(b)[0])
6.
7. for i, r in enumerate(sess.run(W)):
8. res = res + " + {}*x^{}".format("%.2f" % r[0], i + 1)
9.
10. print (res)
11.
12. plt.scatter(X, Y, s=10)
13. # sort the values of x before line plot
14. sort_axis = operator.itemgetter(0)
15. sorted_zip = sorted(zip(X,Y_test), key=sort_axis)
16. X, Y_poly_pred = zip(*sorted_zip)
17. plt.plot(X, Y_poly_pred, color='red')
18. plt.show()

Lastly, we predict what the salary for level 11 would be.


# Predicting a new result (level 11) with the trained model; the input is scaled the
# same way as in modify_input (each x^i column's training maximum is 10^i for levels 1-10)
x_new = np.power(11.0, np.arange(1, deg + 1)) / np.power(10.0, np.arange(1, deg + 1))
print(sess.run(Y_pred, feed_dict={X_: x_new.reshape(1, deg)})[0][0])
1121833.333333334

Overfitting
Under-fitting and over-fitting are two things that you must always try to avoid.
Under-fitting is when your model is not able to recognise the relationship between the two quantities, for example when a linear model is applied to a quadratic relationship. Common symptoms of this are a high MSE and a low R2 score.

On the other hand, over-fitting is also a common issue. The model performs very well on the training data, but fails to perform on new, unseen data. In this case the generated curve passes through all or nearly all of the datapoints, yet the model fails to capture the overall pattern and cannot generalize.
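One simple way to see over-fitting in practice is to hold back a few points and compare the error on seen and unseen data. Below is a minimal sketch with made-up data (not the salary dataset), using numpy's polyfit rather than the TensorFlow model above:

import numpy as np

np.random.seed(1)
x_demo = np.linspace(0, 5, 20)
y_demo = 2 * x_demo ** 2 - 3 * x_demo + np.random.normal(0, 3, 20)   # quadratic pattern plus noise

# hold back every fourth point as "unseen" data
seen = np.arange(20) % 4 != 0
for degree in [1, 2, 10]:
    coeffs = np.polyfit(x_demo[seen], y_demo[seen], degree)
    train_err = np.mean((np.polyval(coeffs, x_demo[seen]) - y_demo[seen]) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_demo[~seen]) - y_demo[~seen]) ** 2)
    print(degree, round(train_err, 2), round(test_err, 2))
# typically degree 1 has a high error on both sets (under-fit), while degree 10
# has a tiny training error but a larger error on the unseen points (over-fit)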

There are two main ways of mitigating these problems:

1. Providing more data: if you provide more data, the model is more likely to identify the general pattern.
2. Finding the correct degree for the polynomial.

Finding the correct degree


In our position vs salary example, we have very limited data, so adding more data is not possible. We have to find the correct degree. Here is a table of some degrees with their MSE, R2 score, and the equation generated (the graphs are omitted here):
Degree | MSE       | R2 score           | Equation
1      | 163388.73 | 0.66               | f(x) = −195333.33 + 0.00*x + 80…
2      | 82212.12  | 0.91               | f(x) = 232166.66 + 0.00*x + −13…
3      | 38931.5   | 0.9812097727913367 | f(x) = −121333.33 + 0.00*x + 18…
5      | 4047.5    | 0.9997969027099755 | f(x) = −41333.33332994394 + 0.…
10     | 0.0008    | 1.0                | f(x) = −44796.69 + 5.04*x + 146…

Here the linear (degree 1) polynomial is an underfit, since it fails to capture the pattern. It also has a high MSE and a low R2 score.

The degree 5 and degree 10 polynomials overfit the data. They have high scores; in fact the degree 10 polynomial has an R2 score of 1, which is the best possible. However, given data slightly off the curve, they will not be able to generalize.

The degree 2 and degree 3 polynomials are a good fit as they capture the pattern but do not
overfit. Note that in general the best fits usually do not have a degree greater than 3.
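If you want to reproduce a comparison like the one in the table, a quick sketch is to loop over a few degrees. Here numpy's polyfit is used as a stand-in for the TensorFlow model above, so the exact numbers will differ from the table:

import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

levels = np.asarray(X, dtype=float).ravel()      # Level column from the salary data
salaries = np.asarray(Y, dtype=float).ravel()    # Salary column

for degree in [1, 2, 3, 5, 10]:
    # degree 10 exceeds what 10 points can pin down, so numpy may warn that the fit is poorly conditioned
    coeffs = np.polyfit(levels, salaries, degree)
    fitted = np.polyval(coeffs, levels)
    print(degree,
          round(np.sqrt(mean_squared_error(salaries, fitted)), 2),
          round(r2_score(salaries, fitted), 6))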

Summary
In this tutorial, we learnt the following concepts:
1. Linear Regression
2. Generating datasets
3. Mean Square Error
4. R2 score
5. Polynomial Regression
6. Plotting scatter plots and line graphs
7. Importing datasets from csv files
8. Overfitting and underfitting

Hope you are able to use these concepts in your own projects.
