Unit 4 Correlation Analysis

Correlation analysis is a statistical method that measures the strength and direction of the linear relationship between two variables, with a correlation coefficient value ranging from +1 to -1. It can indicate positive, negative, or no correlation and is useful in data analytics for understanding relationships, predictive modeling, and feature selection. The Pearson correlation coefficient specifically measures the linear relationship between two continuous variables, providing insights into their association.

Uploaded by theiconicps
Copyright © All Rights Reserved
Correlation Analysis
• Correlation analysis is a statistical method used to measure the strength of the linear relationship between two variables and compute their association.
• Correlation analysis measures the degree to which changes in one variable are associated with changes in the other; by itself, it does not show that one change causes the other.
• A high correlation points to a strong relationship between the two variables, while a low correlation means that the variables are weakly related.
• Correlation is a bivariate analysis that measures the strength of association between two variables and the direction of the relationship. In terms of strength, the correlation coefficient's value varies between -1 and +1. A value of ±1 indicates a perfect linear association between the two variables.
• As the correlation coefficient moves towards 0, the relationship between the two variables becomes weaker. The sign of the coefficient indicates the direction of the relationship: a + sign indicates a positive relationship, and a - sign indicates a negative relationship.
Types of Correlation
Based on the direction of correlation:
Correlation between two variables can be a positive correlation, a negative correlation, or no correlation. Let's look at examples of each of these three types.
• Positive correlation: A positive correlation between two variables means that both variables move in the same direction: an increase in one variable is associated with an increase in the other, and a decrease with a decrease.
  ◦ For example, the more time spent on a treadmill, the more calories are burned.
• Negative correlation: A negative correlation between two variables means that the variables move in opposite directions: an increase in one variable is associated with a decrease in the other, and vice versa.
  ◦ For example, increasing the speed of a vehicle decreases the time taken to reach the destination.
• Weak/Zero correlation: No correlation exists when changes in one variable show no linear association with changes in the other.
  ◦ For example, there is no correlation between the number of years of school a person has attended and the number of letters in his/her name.
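The three cases above can be checked numerically. The sketch below computes Pearson's r from deviations about the mean for three small made-up datasets; the function name `pearson_r`, the treadmill figures, and the assumed 120 km trip are all hypothetical illustrations, not data from the source.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Positive: calories burned rise with minutes on a treadmill (hypothetical numbers).
minutes = [10, 20, 30, 40, 50]
calories = [80, 165, 240, 330, 400]

# Negative: travel time falls as speed rises, for an assumed 120 km trip.
speed_kmh = [20, 40, 60, 80, 100]
hours = [120 / s for s in speed_kmh]

# Zero: these made-up values show no linear trend, so r is 0.
x = [1, 2, 3, 4]
y = [5, 3, 3, 5]

for label, a, b in [("positive", minutes, calories),
                    ("negative", speed_kmh, hours),
                    ("zero", x, y)]:
    print(f"{label:8s} r = {pearson_r(a, b):+.3f}")
```

Only the sign and size of r matter here; the speed/time pair is itself curved (time = distance / speed), yet its correlation is still clearly negative.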
[Figure: Example of correlation analysis]
Types of correlation
Based on the change in proportion:
• Linear: If the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable, the correlation is said to be linear. For example, whenever the price rises by 10 units, supply rises by 20 units.
• Non-linear: If the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable, the correlation is said to be non-linear. It is also known as curvilinear correlation. For example, when the price rises by 10 units, supply rises sometimes by 20 units, sometimes by 10 units, and sometimes by 40 units.
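A rough way to see the linear/non-linear distinction is to compare the ratio of successive changes, as in the definitions above. The sketch below uses hypothetical price and supply figures (the helper `change_ratios` and all numbers are illustrative assumptions). It also shows why Pearson's r, which measures only linear association, can be 0 for an exact curvilinear relationship.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient from deviations about the mean."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def change_ratios(x, y):
    """Ratio of the change in y to the change in x between consecutive points."""
    return [(y[i + 1] - y[i]) / (x[i + 1] - x[i]) for i in range(len(x) - 1)]

# Hypothetical price/supply figures.
price = [100, 110, 120, 130]
supply_linear = [500, 520, 540, 560]   # +20 units for every +10 in price
supply_curved = [500, 520, 530, 570]   # +20, then +10, then +40 units

print(change_ratios(price, supply_linear))   # constant ratio: linear
print(change_ratios(price, supply_curved))   # varying ratio: non-linear
print(pearson_r(price, supply_linear))       # perfectly linear data gives r = 1

# y depends exactly on x, but not linearly; on this symmetric range r is 0.
x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]
print(pearson_r(x, y))
```

A near-zero r therefore rules out only a linear relationship; a scatter plot is still needed to spot curvilinear patterns.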
Types of correlation
Based on the number of variables studied:
• Simple correlation: When we consider only two variables (bivariate analysis) and check the correlation between them, it is called simple correlation. For example, price and demand, height and weight, income and consumption.
• Multiple correlation: When we study the correlation among three or more variables simultaneously, it is termed multiple correlation. For example, studying how the yield of rice per hectare is related jointly to the amount of rainfall and the amount of fertilizer used.
• Partial correlation: When one or more variables are held constant and the relationship between the remaining variables is studied, it is termed partial correlation; that is, we study the relationship between two variables while assuming the other variables are constant. For example, the relationship between rainfall and rice yield when temperature is held constant.
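Partial and multiple correlation can both be computed from pairwise Pearson coefficients. The sketch below uses the standard textbook formulas for the first-order partial correlation and for the multiple correlation of one variable with two others. The pairwise values are hypothetical and only stand in for the rainfall, rice-yield, temperature, and fertilizer example; the function names are illustrative.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def multiple_corr(r_y1, r_y2, r_12):
    """Multiple correlation of y with the two variables x1 and x2 jointly."""
    return math.sqrt((r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2))

# Hypothetical pairwise correlations, for illustration only:
# rainfall vs yield 0.6, rainfall vs temperature 0.5, yield vs temperature 0.4
print(round(partial_corr(r_xy=0.6, r_xz=0.5, r_yz=0.4), 3))

# yield vs rainfall 0.6, yield vs fertilizer 0.5, rainfall vs fertilizer 0.3
print(round(multiple_corr(r_y1=0.6, r_y2=0.5, r_12=0.3), 3))
```

When the controlled variable is uncorrelated with both of the others (r_xz = r_yz = 0), the partial correlation reduces to the simple correlation r_xy.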
Uses of Correlation Analysis
Correlation analysis is useful in data analytics for various purposes, including:
• Understanding relationships: Correlation helps determine how variables are related, which can point to possible cause-and-effect relationships or dependencies worth investigating further (correlation alone does not establish causation).
• Predictive modeling: Correlation analysis can identify variables that are strongly correlated with the target variable, helping to build more accurate predictive models.
• Feature selection: Correlation can assist in selecting the most relevant features or variables for a particular analysis or model, by identifying variables that are highly correlated with the target variable or with each other.
• Data exploration: Correlation analysis aids in exploring and summarizing data by identifying patterns and relationships between variables, which can help uncover trends or anomalies.
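For the feature-selection use above, one simple approach is to rank candidate variables by the absolute value of their correlation with the target. The sketch below does this for a tiny hypothetical dataset; the feature names, values, and the helper `rank_by_correlation` are illustrative assumptions, and ranking by |r| captures only linear association with the target.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient from deviations about the mean."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rank_by_correlation(features, target):
    """Return (name, r) pairs sorted by |r| with the target, strongest first."""
    scores = [(name, pearson_r(values, target)) for name, values in features.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)

# Hypothetical data for five customers.
features = {
    "age": [22, 35, 47, 52, 64],
    "calls_per_day": [9, 7, 6, 4, 3],
    "recharge_amount": [199, 349, 199, 499, 299],
}
avg_call_minutes = [6.0, 5.1, 4.2, 3.9, 2.8]   # target variable

for name, r in rank_by_correlation(features, avg_call_minutes):
    print(f"{name:15s} r = {r:+.3f}")
```

Sorting by |r| rather than r keeps strong negative relationships (such as age here) near the top.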
Correlation Analysis
• Organizations collect data on several variables; sometimes the number of variables can run into thousands (including derived variables such as ratios and interactions).
• For example, mobile service providers collect data on variables such as call duration, number of calls, numbers to which the calls are made, number of calls received, the device used to make the call, location (and the mobile tower that the phone was attached to), time between calls, last recharge (in the case of pre-paid mobile services), recharge amount, service plan (in the case of post-paid connections), number of messages sent, number of messages received, apps downloaded, time spent surfing the internet, and so on.
• The number of variables collected and new variables generated may run into several thousand.
The idea behind collecting all these variables is to find answers to questions such as:
1. Which customer is likely to churn?
2. How can the revenue generated from a customer be increased?
3. What is the customer lifetime value?
4. What is the best service plan for a customer?
5. What recommendations can be made to a customer?
• Finding answers to the aforementioned questions involves building predictive/prescriptive analytics models. Model building involves identifying the relevant variables among thousands of variables (in analytics terminology, this is called variable selection or feature selection).
• Taking all the variables simultaneously to create a model can result in problems such as multicollinearity, which can destabilize the model. It is also time consuming, since most predictive analytics model development involves matrix operations such as matrix inversion.
• So, knowing how different variables are related to one another is important in building analytical models.
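A common first screen for multicollinearity is to flag pairs of candidate variables that are highly correlated with each other, so that one of each pair can be reconsidered before model building. The sketch below assumes a cut-off of |r| ≥ 0.8; that threshold, the helper `highly_correlated_pairs`, the feature names, and the data are all hypothetical choices for illustration.

```python
import math
from itertools import combinations

def pearson_r(x, y):
    """Pearson correlation coefficient from deviations about the mean."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def highly_correlated_pairs(features, threshold=0.8):
    """List feature pairs whose |r| meets or exceeds an assumed threshold."""
    pairs = []
    for a, b in combinations(features, 2):
        r = pearson_r(features[a], features[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs

# Hypothetical usage data: calls made and total call minutes move together.
features = {
    "calls_made": [10, 14, 8, 20, 16],
    "call_minutes": [52, 71, 40, 98, 83],
    "messages_sent": [30, 5, 22, 12, 18],
}

for a, b, r in highly_correlated_pairs(features):
    print(f"{a} vs {b}: r = {r:+.3f}")
```

Pairwise screening only catches collinearity between two variables at a time; a variable that is a combination of several others can still slip through.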
PEARSON CORRELATION COEFFICIENT
• Pearson product moment correlation (in short, Pearson correlation) is used for measuring the strength and direction of the linear relationship between two continuous random variables X and Y. For example, consider two variables: the average call duration (variable Y) and the age (variable X).
• We may like to know whether the average call duration is related to the age of the caller, that is, whether a change in age is related to a change in average call duration.
• It is also possible that there is no relationship between age and average call duration. A simple approach for checking whether an association exists is to draw a scatter plot.
• In Figure 8.1, we can see that the average call duration (Y) decreases as the age of the customer (X) increases.
• We can measure the strength of the linear association using a numerical measure called the correlation coefficient. In the next section, we discuss the mathematical equations for calculating the Pearson product moment correlation coefficient.
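Calculation of Pearson Product Moment Correlation Coefficient
• Pearson product moment correlation is used when we are interested in finding the linear relationship between two continuous random variables.
• Let Xi be the different values of the variable X and Yi the different values of Y. Then the standardized values of X and Y are given by

$$Z_{X_i} = \frac{X_i - \bar{X}}{S_X}, \qquad Z_{Y_i} = \frac{Y_i - \bar{Y}}{S_Y}$$

where $\bar{X}$ and $\bar{Y}$ are the means and $S_X$ and $S_Y$ are the standard deviations of X and Y.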
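Standardization can be sketched with Python's `statistics` module, assuming the sample standard deviation (n - 1 denominator) is used for S. The ages below are hypothetical values of X, and `standardize` is an illustrative helper name.

```python
import statistics

def standardize(values):
    """Z-scores using the sample mean and the sample standard deviation (n - 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample standard deviation
    return [(v - mean) / sd for v in values]

ages = [25, 35, 45, 55]             # hypothetical values of X
z = standardize(ages)
print([round(v, 3) for v in z])
```

By construction, the standardized values have mean 0 and sample standard deviation 1, which is what makes X and Y comparable regardless of their original units.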
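The Pearson correlation coefficient is given by:

$$r = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{(n-1)\,S_X S_Y} \qquad (8.3)$$

where n is the number of observations and $S_X$ and $S_Y$ are the standard deviations of X and Y; this form applies when the standard deviation is calculated from the sample. For large samples, the correlation coefficients calculated using Eqs.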
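The sample form of r can be computed directly as the sum of products of standardized values divided by n - 1, which is algebraically the same as the deviation-product formula. The sketch below assumes sample standard deviations throughout; the age and average-call-duration figures are hypothetical.

```python
import statistics

def pearson_r(x, y):
    """r = sum(Z_X * Z_Y) / (n - 1), with Z computed from sample standard deviations."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    sx, sy = statistics.stdev(x), statistics.stdev(y)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / (n - 1)

# Hypothetical age (X) and average call duration in minutes (Y).
age = [21, 28, 35, 42, 50, 58]
avg_call = [9.5, 8.1, 7.0, 6.2, 4.8, 3.9]
print(round(pearson_r(age, avg_call), 3))
```

A strongly negative result for such data matches the pattern described for Figure 8.1: older customers tend to have shorter average calls.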
We can note the following properties from Eq. (8.3):
1. Whenever the value of Xi is greater than the mean and the corresponding value of Yi is also greater than the mean, the term $(X_i - \bar{X})(Y_i - \bar{Y})$ in the numerator of the equation is positive.
2. Whenever the value of Xi is less than the mean and the corresponding value of Yi is also less than the mean, the term $(X_i - \bar{X})(Y_i - \bar{Y})$ in the numerator is positive.
3. Whenever the value of Xi is less than the mean (or greater than the mean) and the corresponding value of Yi is greater than the mean (or less than the mean), the term $(X_i - \bar{X})(Y_i - \bar{Y})$ in the numerator is negative.
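The three cases can be seen by listing the sign of each numerator term for a small dataset. The sketch below uses hypothetical values; `term_signs` is an illustrative helper. The overall sign of r follows the sign of the sum of these terms, so r is positive when same-side points dominate.

```python
import statistics

def term_signs(x, y):
    """Sign of each numerator term (Xi - mean X)(Yi - mean Y)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    signs = []
    for a, b in zip(x, y):
        product = (a - mx) * (b - my)
        signs.append("+" if product > 0 else "-" if product < 0 else "0")
    return signs

x = [1, 2, 3, 4, 5]
y = [2, 4, 3, 1, 5]   # hypothetical values; both means equal 3
print(term_signs(x, y))
```

A point lying exactly at the mean of X or Y contributes a zero term and does not pull r in either direction.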
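Equation (8.3) is mathematically equivalent to Eqs. (8.5), (8.6), and (8.7). In particular, it can be written as

$$r = \frac{\operatorname{Cov}(X,Y)}{S_X S_Y}$$

where $S_X$ and $S_Y$ are the sample standard deviations of X and Y, and Cov(X,Y) is the covariance between random variables X and Y, given by

$$\operatorname{Cov}(X,Y) = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})$$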
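The equivalence of the common forms of r can be spot-checked numerically. The sketch below computes r three ways: from deviation products, as sample covariance over the product of sample standard deviations, and from raw sums (the "computational" formula). These are standard algebraically equivalent forms; which of them corresponds to each numbered equation in the source is not assumed here, and the data are hypothetical.

```python
import math
import statistics

def r_deviation(x, y):
    """r from sums of deviation products and squared deviations."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_covariance(x, y):
    """r = Cov(X, Y) / (S_X * S_Y), all with n - 1 denominators."""
    n = len(x)
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    return cov / (statistics.stdev(x) * statistics.stdev(y))

def r_raw_sums(x, y):
    """r from raw sums: (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2)(n*Syy - Sy^2))."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [1.5, 3.1, 3.9, 6.2, 7.8]   # hypothetical values
print(r_deviation(x, y), r_covariance(x, y), r_raw_sums(x, y))
```

The raw-sums form avoids a separate pass to compute means, but with large values it can lose precision through subtractive cancellation, so the deviation form is usually the safer choice in floating point.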
Properties of Pearson Correlation Coefficient


1. The value of the correlation coefficient lies between −1 and +1. A high absolute value of r, |r|, indicates a strong linear relationship between the two variables.
2. A positive value of r indicates positive correlation (as the value of X increases, the value of Y also tends to increase), and a negative value of r indicates negative correlation (as the value of X increases, the value of Y tends to decrease).
3. The sign of the correlation coefficient is the same as the sign of the covariance between the two random variables.
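Properties 1 and 3 can be spot-checked on randomly generated data. The sketch below draws hypothetical samples with a random slope plus Gaussian noise (the sample size, slope range, noise level, and seed are arbitrary assumptions) and asserts that r stays within [-1, +1] and shares the sign of the sample covariance. This is a numerical sanity check, not a proof.

```python
import random
import statistics

def pearson_r(x, y):
    """Pearson r from deviations about the sample means."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def sample_cov(x, y):
    """Sample covariance with an (n - 1) denominator."""
    mx, my = statistics.mean(x), statistics.mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

rng = random.Random(42)   # fixed seed so the check is repeatable
for _ in range(500):
    x = [rng.uniform(-10, 10) for _ in range(8)]
    slope = rng.uniform(-2, 2)   # random strength and direction
    y = [slope * v + rng.gauss(0, 3) for v in x]
    r, cov = pearson_r(x, y), sample_cov(x, y)
    assert -1 - 1e-12 <= r <= 1 + 1e-12   # property 1: r lies in [-1, +1]
    assert (r > 0) == (cov > 0)           # property 3: same sign as covariance
print("properties 1 and 3 held in every trial")
```

Property 3 holds because r is the covariance divided by a product of standard deviations, which is always positive.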
