Correlation & Regression Guide

Uploaded by

CH064PATEL RUDRA

Correlation & Regression
Dr. Anand Tiwari
Department of Chemical Engineering
Dharmsinh Desai University
Content
Topics (COs)
1. Data Definitions and Working with Data (CO1)
   Qualitative and Quantitative data, Discrete and Continuous Data, Frequency Measures, Graphical Representation, Collection and summarizing data
2. Probability and Distribution (CO2)
   Probability Distribution, Random variables and sampling, Normal Distribution
3. Statistical Inference (CO3)
   Central Tendencies, Confidence intervals, Test of hypothesis
4. Regression and Correlation (CO4)
   Single and multiple linear regression, Non-linear regression, relation between correlation and regression, Goodness of fit
5. Analysis of Variance – ANOVA (CO5)
   Logic behind ANOVA, Single Factor and Two Factor ANOVA
Experimental Data
Overview

Error analysis and propagation

Data Correlation

Data Regression
Sources of Errors
• Sources of errors can be classified in 3 broad groups –
• Errors of Measurement
• Errors of measurement are due to physical limitations of scaling. Without a Vernier
attachment, a length of 2 cm could easily be in error by 0.5 mm, which is 2.5%.
• Precision Errors
• Precision errors are the built-in errors of the apparatus. For example, the scales on
a mercury-in-glass thermometer, a burette or a pipette assume that the bore is uniform,
i.e. that a change of 1 unit corresponds to the same interval everywhere along the scale.
• Errors of Method
• Errors of method include faults such as neglecting heat losses, assuming constant
overflow, or neglecting back-mixing in a tubular reactor etc.
• All these errors can only be estimated and can rarely be measured.
• An error is usually given up to only one or two significant figures.
Terminologies
• Accuracy, Error, Precision, and Uncertainty
• All measured quantities are subject to uncertainties
• The variability in the results of repeated measurements arises
because the absolute values are impossible to reproduce
• The result of any physical measurement has
two essential components:
• numerical value (in a specified system of units) giving the best
estimate possible of the quantity measured
• degree of uncertainty associated with this estimated value
• Errors can be thought of as issues with equipment or methodology that
cause a reading to be different from the true value
• The uncertainty is a range of values around a measurement within
which the true value is expected to lie, and is an estimate
• True value cannot be absolutely determined. Trueness is the
closeness of agreement between the average value obtained
from a large series of test results and the accepted true value. Trueness
is largely affected by systematic error.
• Accuracy is the closeness of agreement between a measured
value and the true value. Accuracy is an expression of the lack
of error.
• Precision is the closeness of agreement between independent
measurements of a quantity under the same conditions.
Precision is largely affected by random error.
• Uncertainty characterizes the range of values within which the
true value is asserted to lie with some level of confidence. For
example, a measured concentration C = 2.6 ± 0.15 mol/L.
• Error is the difference between the true value and the measured
value. The total error is a combination of both systematic error
and random error.
Errors
• Experimental data are subject to various errors, hence any calculation based on
them has limited accuracy.
• The way in which errors accumulate/propagate is governed by the
operations being performed (addition, subtraction, division or multiplication).
• Absolute error: the numerical difference between the true value and the
approximated/estimated value.
• If x is the true value and x′ is the approximated/estimated value, then the error in x is
$$\varepsilon_A = \delta x = x - x'$$
■ Relative error is given as
$$\varepsilon_R = \frac{\delta x}{x'}$$
■ Percentage error is given as
$$\varepsilon_P = 100\,\varepsilon_R$$
■ Absolute accuracy:
$$\Delta x = x - x'$$
■ Relative accuracy:
$$\frac{\Delta x}{x} \approx \frac{\Delta x}{x'}$$
Error Propagation
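■ Suppose z is estimated from two measured variables x and y having uncertainties
±∆x and ±∆y.
■ Propagation through addition/subtraction, z = x ± y:
$$\Delta z = \sqrt{(\Delta x)^2 + (\Delta y)^2}$$
■ Propagation through multiplication/division, z = xy or z = x/y:
$$\frac{\Delta z}{z} = \sqrt{\left(\frac{\Delta x}{x}\right)^2 + \left(\frac{\Delta y}{y}\right)^2}$$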
Error Propagation
■ Propagation through exponents, z = x^m y^n:
$$\frac{\Delta z}{z} = \sqrt{\left(m\,\frac{\Delta x}{x}\right)^2 + \left(n\,\frac{\Delta y}{y}\right)^2}$$
■ Example: Suppose you want to estimate the specific heat of an iron rod from the
measured values given below. What would be the uncertainty in the estimate of the
specific heat?

$$q = m\,C_p\,(T_2 - T_1), \qquad q = V I t, \qquad C_p = \frac{q}{m\,(T_2 - T_1)}$$

Quantity     Measured value   Unit   Uncertainty
T2           45.6             °C     0.1
T1           40.4             °C     0.1
(T2 - T1)                     °C
I            10.2             A      0.02
V            13.5             V      0.1
t            125              s      1
m            0.72             kg     0.01
d            3.38             cm     0.03
h            10.2             cm     0.03
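As an illustrative sketch (not a worked solution supplied with the slides), the propagation rules can be applied to this table in two steps: the temperature difference uses the addition/subtraction rule, and C_p = VIt / (m (T2 - T1)) uses the multiplication/division rule. The sketch assumes the measured quantities are independent, and it leaves out d and h because they do not appear in the formula for C_p. The variable names are chosen here for illustration.

```python
import math

# Measured values and uncertainties taken from the table above
T2, dT2 = 45.6, 0.1    # degrees C
T1, dT1 = 40.4, 0.1    # degrees C
I, dI = 10.2, 0.02     # A
V, dV = 13.5, 0.1      # V
t, dt = 125.0, 1.0     # s
m, dm = 0.72, 0.01     # kg

# Step 1: difference of two temperatures, absolute uncertainties add in quadrature
delta_T = T2 - T1
d_delta_T = math.sqrt(dT1**2 + dT2**2)

# Step 2: Cp is a product/quotient, so relative uncertainties add in quadrature
q = V * I * t                      # heat supplied, J
cp = q / (m * delta_T)             # J/(kg degC)
rel_cp = math.sqrt(
    (dV / V)**2 + (dI / I)**2 + (dt / t)**2
    + (dm / m)**2 + (d_delta_T / delta_T)**2
)
d_cp = cp * rel_cp                 # absolute uncertainty in Cp

print(f"T2 - T1 = {delta_T:.2f} +/- {d_delta_T:.2f} degC")
print(f"Cp = {cp:.0f} +/- {d_cp:.0f} J/(kg degC) ({100 * rel_cp:.1f} %)")
```

The temperature difference dominates the result because a small difference between two similar readings carries a large relative uncertainty.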
Coefficient of Correlation
• Coefficient of correlation (r) – It is a measure of the strength of the
linear relationship between two variables x and y in the sample.
• The correlation coefficient r is scaleless. The value of r is always
between −1 and +1, no matter what the units of x and y are.
• This is also called Pearson correlation coefficient
• High correlation does not imply causality. If a large positive or
negative value of the sample correlation coefficient r is observed, it is
incorrect to conclude that a change in x causes a change in y.
• When r is near or equal to 0, there is little or no linear relationship
between y and x.
• The closer r is to 1 or −1, the stronger the linear relationship between
y and x. If r = +1 or −1, all the points fall exactly on the least-squares line.
• Positive values of r imply that y increases as x increases; negative values
imply that y decreases as x increases.
Assumptions
Data must meet the following criteria to use Pearson’s correlation:
• Both variables are on an interval or ratio level of measurement
• Data from both variables follow normal distributions
• Your data have no outliers
• Your data is from a random or representative sample
• You expect a linear relationship between the two variables
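$$y_i = a + b x_i + \varepsilon_i, \qquad \varepsilon_i = y_i - \hat{y}_i$$
Minimize
$$SSE = \sum_{i=1}^{n} \left[ y_i - (a + b x_i) \right]^2$$
Equate the partial derivatives to zero and solve for a and b:
$$\frac{\partial SSE}{\partial a} = 0, \qquad \frac{\partial SSE}{\partial b} = 0$$
The solution is b = SS_xy / SS_xx and a = ȳ − b x̄, where
$$SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{1}{n} \sum x_i \sum y_i$$
$$SS_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{1}{n} \left( \sum x_i \right)^2$$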
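• Example: compression vs. pressure for a material used in a pressure vessel.
Calculate the coefficient of correlation.

Pressure x (kg/m²)   Compression y (mm)
1                    25.4
2                    26.2
3                    51
4                    53.6
5                    101.2

$$SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{1}{n} \sum x_i \sum y_i$$
$$SS_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{1}{n} \left( \sum x_i \right)^2$$
$$SS_{yy} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{1}{n} \left( \sum y_i \right)^2$$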
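A minimal sketch of this calculation, using the five pressure/compression pairs from the example and the sum formulas for SS_xy, SS_xx and SS_yy. The final step uses the standard definition of the Pearson coefficient, r = SS_xy / sqrt(SS_xx · SS_yy), which is assumed here because the formula for r itself is not written out in the example. Variable names are illustrative.

```python
import math

# Pressure (x) and compression (y) from the example table
x = [1, 2, 3, 4, 5]
y = [25.4, 26.2, 51.0, 53.6, 101.2]
n = len(x)

sum_x, sum_y = sum(x), sum(y)

# Sums of squares in their computational (shortcut) form
ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
ss_xx = sum(xi * xi for xi in x) - sum_x**2 / n
ss_yy = sum(yi * yi for yi in y) - sum_y**2 / n

# Pearson correlation coefficient (standard definition)
r = ss_xy / math.sqrt(ss_xx * ss_yy)

print(f"SSxy = {ss_xy:.3f}, SSxx = {ss_xx:.3f}, SSyy = {ss_yy:.3f}")
print(f"r = {r:.4f}")
```

A value of r close to +1 indicates a strong positive linear relationship for these five points; with so few observations it says nothing about causality.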
Coefficient of Determination
• A way to measure the contribution of x in predicting y is to consider
how much the errors of prediction of y can be reduced by using the
information provided by x.
• If x contributes little or no information for the prediction of y, then the
sums of squares of deviations for the two lines (the mean line ŷ = ȳ and the
least-squares line) are nearly equal, so SS_yy ≈ SSE.
• If x does contribute information for the prediction of y, then SSE will
be smaller than SS_yy. In fact, if all the points fall on the least-squares
line, then SSE = 0.
• It is represented as r² and serves as a measure of the goodness of fit or prediction:
$$r^2 = \frac{SS_{yy} - SSE}{SS_{yy}} = 1 - \frac{SSE}{SS_{yy}}$$
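The comparison between SS_yy and SSE can be sketched numerically. The example below reuses the pressure/compression data given earlier in these notes, fits the least-squares line with the standard results b = SS_xy / SS_xx and a = ȳ − b x̄, and then evaluates r² = 1 − SSE / SS_yy. It is an illustration under those assumptions, with variable names chosen here.

```python
# Pressure (x) and compression (y) from the earlier example
x = [1, 2, 3, 4, 5]
y = [25.4, 26.2, 51.0, 53.6, 101.2]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares in deviation form
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar)**2 for xi in x)
ss_yy = sum((yi - y_bar)**2 for yi in y)   # error when x is ignored (mean line)

# Least-squares line y_hat = a + b x
b = ss_xy / ss_xx
a = y_bar - b * x_bar

# Error remaining after using x to predict y
sse = sum((yi - (a + b * xi))**2 for xi, yi in zip(x, y))

r_squared = 1 - sse / ss_yy
print(f"a = {a:.2f}, b = {b:.2f}")
print(f"SSyy = {ss_yy:.3f}, SSE = {sse:.3f}, r^2 = {r_squared:.4f}")
```

The printed r² is the fraction of the variation in y about its mean that is explained by the straight line in x.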
Spearman Rank Correlation
• Most common alternative to Pearson’s r. It uses the rankings of data
from each variable (e.g., from lowest to highest) rather than the raw
data itself.
• It is used when the assumptions of Pearson correlation are not met,
i.e. when the distribution is not normal or non-linearity is present.
$$r_s = 1 - \frac{6 \sum d_i^2}{n^3 - n}$$
• r_s = strength of the rank correlation between variables
• d_i = the difference between the x-variable rank and the y-variable rank
for each pair of data
• Σ d_i² = sum of the squared differences between x- and y-variable ranks
• n = sample size
• Example: work-hours missed vs. annual wages for 15 employees
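The rank formula can be sketched in a few lines. The sample below is made up purely for illustration (it is not the 15-employee data set), and the helper names `ranks` and `spearman_rs` are hypothetical. The shortcut formula with Σ d_i² is valid only when there are no tied values, which the sketch assumes.

```python
def ranks(values):
    """Rank values from lowest (1) to highest (n). Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman_rs(x, y):
    """Spearman rank correlation via r_s = 1 - 6*sum(d_i^2) / (n^3 - n)."""
    n = len(x)
    rank_x, rank_y = ranks(x), ranks(y)
    sum_d2 = sum((rx - ry)**2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n**3 - n)


# Hypothetical sample, chosen only to show the mechanics
x = [10, 20, 30, 40, 50]
y = [3, 1, 4, 9, 7]

rs = spearman_rs(x, y)
print("ranks of y:", ranks(y))
print(f"r_s = {rs:.2f}")
```

Because only the ranks enter the calculation, any monotonic transformation of x or y leaves r_s unchanged.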
Data Regression – Curve Fitting
• Provides the method to represent the set of experimental data in the
form of empirical equations.
• Linear regression is applied to any function that is linear in the coefficients.
• Non-linear regression is applied to any function that is non-linear in the
coefficients.
• Different methods for regression are –
• Analytical Method
• Method of Averages
• Method of Least Squares
Method of Averages
• Suppose nine experimental values of a variable Z are available at
nine different known values of x, and the best-fit curve of the following type is required:
$$Z = A + Bx + Ce^{x}$$

■ Number of parameters: 3
■ Number of equations needed: 3
• For method of averages, the following points must be observed:
1. Points must be arranged in ascending values of independent variable x.
2. The number of groups must equal the number of unknown parameters.
3. Groups should contain approximately equal number of points:
• 9 points → 3 points in each group
• 8 points → 3 points + 2 points + 3 points
• 10 points → 3 points + 4 points + 3 points
4. Each experimental point should be used only once.
5. The appropriate average must be taken.
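The rules above can be sketched for the three-parameter curve Z = A + Bx + Ce^x. The nine data points here are synthetic: they are generated from hypothetical values A = 1, B = 2, C = 0.5 so that the result can be checked. The points are sorted by x, split into three groups of three, the model equation is averaged within each group, and the resulting 3 × 3 linear system is solved with Cramer's rule. The function names are illustrative.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve a 3x3 linear system a * p = b with Cramer's rule."""
    d = det3(a)
    solution = []
    for col in range(3):
        a_col = [row[:] for row in a]
        for r in range(3):
            a_col[r][col] = b[r]
        solution.append(det3(a_col) / d)
    return solution


def method_of_averages(x, z, n_groups=3):
    """Fit Z = A + B*x + C*exp(x) by averaging the model over each group."""
    points = sorted(zip(x, z))                 # rule 1: ascending x
    base, extra = divmod(len(points), n_groups)
    # rule 3: approximately equal groups (any extra points go to the first groups)
    sizes = [base + (1 if g < extra else 0) for g in range(n_groups)]
    rows, rhs, start = [], [], 0
    for size in sizes:
        group = points[start:start + size]     # rule 4: each point used once
        start += size
        rows.append([1.0,
                     sum(xi for xi, _ in group) / size,
                     sum(math.exp(xi) for xi, _ in group) / size])
        rhs.append(sum(zi for _, zi in group) / size)
    return solve3(rows, rhs)


# Synthetic data from hypothetical parameters A=1, B=2, C=0.5
x = [0.5 * i for i in range(9)]
z = [1.0 + 2.0 * xi + 0.5 * math.exp(xi) for xi in x]

A, B, C = method_of_averages(x, z)
print(f"A = {A:.4f}, B = {B:.4f}, C = {C:.4f}")
```

With noise-free data the method recovers the generating parameters; with real data the answer depends on how the points are grouped, which is why the method of least squares is usually preferred.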
Method of least squares
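$$y = mx + c, \qquad R_n = y_n - m x_n - c, \qquad z = \sum_{n=1}^{N} (y_n - m x_n - c)^2 = \sum_{n=1}^{N} R_n^2$$

$$\frac{\partial z}{\partial m} = 0, \qquad \frac{\partial z}{\partial c} = 0$$

$$m = \frac{N \sum x_n y_n - \sum x_n \sum y_n}{N \sum x_n^2 - \left( \sum x_n \right)^2}, \qquad c = \frac{\sum x_n^2 \sum y_n - \sum x_n y_n \sum x_n}{N \sum x_n^2 - \left( \sum x_n \right)^2}$$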
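The closed-form expressions for m and c translate directly into code. The sketch below applies them to the pressure/compression data given earlier in these notes; the function name `least_squares_line` is chosen here for illustration.

```python
def least_squares_line(x, y):
    """Return slope m and intercept c of the least-squares line y = m*x + c."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)

    denom = n * sum_xx - sum_x**2      # zero only if all x values are identical
    m = (n * sum_xy - sum_x * sum_y) / denom
    c = (sum_xx * sum_y - sum_xy * sum_x) / denom
    return m, c


# Pressure (x) and compression (y) from the correlation example
x = [1, 2, 3, 4, 5]
y = [25.4, 26.2, 51.0, 53.6, 101.2]

m, c = least_squares_line(x, y)
print(f"y = {m:.2f} x + ({c:.2f})")
```

Both formulas share the same denominator, so it is computed once; it vanishes only when every x value is the same, in which case no unique line exists.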
Linear regression
■ Used for any function that is linear in the coefficients; the function itself need not
be linear in x, for example

$$y = a + bx + cx^2 + dx^3$$
$$y = a e^{x}$$
■ The least-squares method is used to find the best fit, which minimizes the sum of the
squares of the y deviations of the individual points from the line.
■ The correlation coefficient R is used as a measure of the correlation between x and y,
where −1 ≤ R ≤ 1.
■ R², ranging from 0 to 1, is used as a measure of goodness of fit. A value above 0.9
is usually regarded as a good fit.

■ Multiple linear regression fits data to a model that defines y as a function of two or
more independent x variables
𝑦 = 𝑎𝑇 + 𝑏𝑃 + 𝑐𝑡 + 𝑑
Non-linear Regression
■ Some models that are nonlinear in the coefficients, such as y = a e^{±bx}, can be
converted into a linear form (here ln y = ln a ± bx).
■ Equations that cannot be converted into a linear form are said to be
intrinsically nonlinear.
■ Chemical kinetics: a system of two consecutive first-order reactions

$$A \xrightarrow{k_1} B \xrightarrow{k_2} C, \qquad C_B = C_{A0}\,\frac{k_1}{k_2 - k_1}\left( e^{-k_1 t} - e^{-k_2 t} \right)$$

■ There are no analytical expressions to obtain the set of regression
coefficients for a fitting function that is nonlinear in its coefficients.
■ The approach is to guess the values of the parameters and find the solution that
matches the measured data most closely.
■ The goal is to minimize the sum of squared errors
$$SSE = \sum \left( C_{B,\mathrm{calc}} - C_{B,\mathrm{meas}} \right)^2$$

■ By trial and error, the parameter guesses are refined until SSE is a
minimum.
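The guess-and-compare approach can be sketched as a crude grid search over k1 and k2. Everything numerical here is hypothetical: the "measured" concentrations are generated from assumed values k1 = 0.5, k2 = 0.2 and C_A0 = 1 so that the search has a known answer, and the grid spacing is arbitrary. In practice a proper optimizer would replace the double loop, but the objective function (SSE) is the same.

```python
import math


def cb_model(t, k1, k2, ca0=1.0):
    """Concentration of B for A -> B -> C with first-order steps (k1 != k2)."""
    return ca0 * k1 / (k2 - k1) * (math.exp(-k1 * t) - math.exp(-k2 * t))


def sse(k1, k2, times, cb_meas):
    """Sum of squared differences between calculated and measured C_B."""
    return sum((cb_model(t, k1, k2) - c)**2 for t, c in zip(times, cb_meas))


# Synthetic "measurements" from hypothetical rate constants k1=0.5, k2=0.2
times = [0.5, 1, 2, 3, 5, 8, 12]
cb_meas = [cb_model(t, 0.5, 0.2) for t in times]

# Trial values 0.05, 0.10, ..., 1.00 for each rate constant
candidates = [i / 20 for i in range(1, 21)]

best = None
for k1 in candidates:
    for k2 in candidates:
        if k1 == k2:
            continue                    # the formula is singular at k1 == k2
        err = sse(k1, k2, times, cb_meas)
        if best is None or err < best[0]:
            best = (err, k1, k2)

best_sse, best_k1, best_k2 = best
print(f"k1 = {best_k1:.2f}, k2 = {best_k2:.2f}, SSE = {best_sse:.3e}")
```

A grid search scales poorly with the number of parameters, so it is shown only to make the "minimize SSE by trial" idea concrete.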
Problem-1
Fit the data below to the following correlation and find the parameters a, m and n.
Use both the method of averages and the method of least squares.

$$Nu = a\,Re^{m}\,Pr^{n}$$

Nu      Re       Pr
24.8    7000     0.46
28.5    7600     0.63
60.3    12000    4.2
58.4    11700    3
84.5    14300    10
115     17000    18.6
170     19000    41
193     20100    58.5
140     17900    32
189     19700    70.3
315     25000    185
480     29300    590
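One possible route for the least-squares part is to take logarithms, ln Nu = ln a + m ln Re + n ln Pr, which is linear in ln a, m and n, and then solve the normal equations of a multiple linear regression. The sketch below does this with a small Gaussian elimination routine. It is an illustration rather than a model answer: it minimizes the squared error in ln Nu (not in Nu itself), and the helper name `solve_linear` is hypothetical.

```python
import math

# Data from the Problem-1 table
nusselt = [24.8, 28.5, 60.3, 58.4, 84.5, 115, 170, 193, 140, 189, 315, 480]
reynolds = [7000, 7600, 12000, 11700, 14300, 17000, 19000, 20100,
            17900, 19700, 25000, 29300]
prandtl = [0.46, 0.63, 4.2, 3, 10, 18.6, 41, 58.5, 32, 70.3, 185, 590]


def solve_linear(a, b):
    """Solve a * p = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    p = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * p[j] for j in range(i + 1, n))
        p[i] = (aug[i][n] - tail) / aug[i][i]
    return p


# Linearized model: each row is [1, ln Re, ln Pr], response is ln Nu
rows = [[1.0, math.log(re_i), math.log(pr_i)]
        for re_i, pr_i in zip(reynolds, prandtl)]
y = [math.log(nu_i) for nu_i in nusselt]

# Normal equations (X^T X) p = X^T y
xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)]
       for i in range(3)]
xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(3)]

ln_a, m_exp, n_exp = solve_linear(xtx, xty)
a_coef = math.exp(ln_a)

# Residuals in log space; at the least-squares solution they sum to zero
residuals = [yi - (ln_a + m_exp * row[1] + n_exp * row[2])
             for row, yi in zip(rows, y)]

print(f"a = {a_coef:.4g}, m = {m_exp:.4f}, n = {n_exp:.4f}")
```

Re and Pr rise together in this data set, so the two exponents are strongly correlated and the fitted values should be interpreted with care.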
Problem-2
• The data tabulated below are from an experiment to determine the growth rate of
bacteria k (per day) as a function of oxygen concentration C (mg/L).
The following model is proposed for the growth rate:

Estimate Cs and kmax and compare the parameters obtained using the method of
averages and the method of least squares. What would be the growth rate at
C = 2 mg/L?
