Decision Science & Data Analysis

this includes data analysis notes

Uploaded by

Shweta

Descriptive Analytics: Overview of Data for Analytics

❖ Overview of Using Data

● Variable: A characteristic or a quantity of interest that can take on
different values.
● Dichotomous Variable
● Discrete Variable
● Continuous Variable
e.g. variables are Symbol, Industry, Share Price and Volume

● Data are the facts and figures collected, analysed and summarized for
presentation and interpretation.
● Observation is a set of values corresponding to a set of variables.
● Random Variable is a quantity whose values are not known with certainty.

❖ Overview of Data

Types of Data
● Population and Sample Data
● Cross-sectional and Time Series Data
● Quantitative and Categorical Data

Sources of Data
● Statistical Studies
● Experimental Study
● Observational/Non-Experimental Study

The cost of data acquisition and the subsequent statistical analysis should not
exceed the savings generated by using the information to make a better decision.
UNIT-1: Introduction to Statistics

Q.1 What is Decision Science?

Answer: Decision Science is the collection of quantitative techniques used to
inform decision-making at the individual and population levels.
● It includes decision analysis, risk analysis, cost-benefit and
cost-effectiveness analysis, constrained optimization, simulation modelling,
and behavioral decision theory, as well as parts of operations research,
microeconomics, statistical inference, management control, cognitive and
social psychology, and computer science.
● By focusing on decisions as the unit of analysis, decision science provides
a unique framework for understanding public health problems, and for
improving policies to address those problems.

Q.2 How is decision science different from other research approaches?

Answer: I. While most fields of research focus on producing new knowledge,
decision science is uniquely concerned with making optimal choices based on
available information.
II. Decision science seeks to make plain the scientific issues and value judgments
underlying these decisions, and to identify trade-offs that might accompany any
particular action or inaction.

Q.3 What kinds of tools and methods do decision scientists use?

Answer: Decision science utilizes a variety of tools which include models for
decision-making under conditions of uncertainty, experimental and descriptive
studies of decision-making behavior, economic analysis of competitive and
strategic decisions, approaches for facilitating decision-making by groups, and
mathematical modeling techniques.

Q.4 Where is decision science used?


Answer: Decision science has been used in business and management, law and
education, environmental regulation, military science, public health and public
policy.
❖ Data Science vs Decision Science
● The roles of Data Scientists and Decision Scientists differ in how they
think about data.
● Data Scientists: Data is the tool for improving and developing new
products based on robust statistical methods.
● Decision Scientists: Data is the tool to make decisions.

Data Scientists
● Data Scientists are looking to understand, interpret and analyze with the
goal of building better products. Therefore, data quality, statistical rigor
and measurement perfection are often their trademarks.
● For Data Scientists, the analysis, statistical rigor and understanding come
first. Business challenges come second.
● Data Scientists think about data in terms of data patterns, processing,
algorithms and statistics. Often, data scientists are conducting deep analysis
and experimental statistics.
● They are obsessed with finding causal relationships.
● They are deeply focused on data quality as it relates to their product area,
because better data quality results in more thorough statistical analysis.
● They frame data analysis in terms of algorithms, machine learning,
statistics and experimentation.
● They are looking to bring order to big data to find insights and learnings
as they relate to their product/focus area.
● Their north star goal: use high-quality data and robust statistics to
support product development.

Decision Scientists
● Decision Scientists frame data analysis in terms of the decision-making
process.
● They are looking at the various ways of analyzing data as it relates to a
specific business question posed by their stakeholder/s.
● Other names for this role may include: data analytics, analyst and applied
analytics.
● The Data Scientist focuses on finding insights and relationships via
statistics. The Decision Scientist is looking to find insights as they relate
to the decision at hand. Example decisions might include: age groups to focus
on, the best way to spend a yearly budget, etc.
● For Decision Scientists, the business problem comes first. Analysis follows
and is dependent on the question or business decision that needs to be made.
● The Decision Scientist needs to consider the type of analysis, visualization
methods and behavioral understanding that can help a stakeholder make a
specific decision. Decision Scientists need to make insights usable.
● They need to be able to work with a variety of data sources and inputs, each
selected based on its ability to help answer the business question.
● Their north star goal: use data and statistics to support business decision
making, budgeting and marketing spend.

This means a Decision Scientist needs to have strong business acumen as well
as a robust analytical mind. You cannot have one without the other in a
Decision Science role.
Statistics
● Every minute of the working day, decisions are made by businesses around
the world that determine whether companies will be profitable and growing
or whether they will stagnate and die.
● Most of these decisions are made with the assistance of information gathered
about the marketplace, the economic and financial environment, the
workforce, the competition, and other factors.
● Such information usually comes in the form of data or is accompanied by
data.
● Business statistics provides the tool through which such data are collected,
analyzed, summarized, and presented to facilitate the decision-making
process, and business statistics plays an important role in the ongoing saga
of decision making within the dynamic world of business.

Virtually every area of business uses statistics in decision making. Here are some
recent examples:
A Deloitte Retail “Green” survey of 1080 adults revealed that 54% agreed that
plastic, non-compostable shopping bags should be banned.
A survey of 1007 adults by RBC Capital Markets showed that 37% of adults
would be willing to drive 5 to 10 miles to save 20 cents on a gallon of gas.

Statistics is the science concerned with developing and studying methods for
collecting, analyzing, interpreting and presenting empirical data. Statistics
is a highly interdisciplinary field; research in statistics finds applicability in
virtually all scientific fields, and research questions in the various scientific
fields motivate the development of new statistical methods and theory. In
developing methods and studying the theory that underlies the methods,
statisticians draw on a variety of mathematical and computational tools.

Two fundamental ideas in the field of statistics are uncertainty and variation.

There are many situations that we encounter in science (or more generally in life)
in which the outcome is uncertain.
In some cases the uncertainty arises because the outcome in question is not
determined yet (e.g., we may not know whether it will rain tomorrow).
In other cases the uncertainty arises because, although the outcome has already
been determined, we are not aware of it (e.g., we may not know whether we
passed a particular exam).

A real-life example of statistics:

Suppose you need to find how many people are employed in a city. Since the
city has a population of 15 lakh, we survey only 1000 people (the sample).
The figure computed from that sample data is the statistic, which we use to
estimate the value for the whole city.
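
The survey example above can be sketched in a few lines of Python. This is only an illustration: the function name `estimate_population_count` and the figure of 620 employed people in the sample are assumptions made for the sketch, not values from the notes.

```python
def estimate_population_count(sample_hits, sample_size, population_size):
    """Scale the proportion observed in a sample up to the whole population."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    # Multiply before dividing so whole-number inputs stay exact.
    return sample_hits * population_size / sample_size


POPULATION = 1_500_000      # 15 lakh people in the city (from the example)
SAMPLE_SIZE = 1000          # people surveyed (from the example)
EMPLOYED_IN_SAMPLE = 620    # hypothetical survey result, not from the notes

estimated_employed = estimate_population_count(
    EMPLOYED_IN_SAMPLE, SAMPLE_SIZE, POPULATION
)
print(estimated_employed)  # 930000.0
```

The sample proportion (620/1000) is the statistic; scaling it to the city is a simple point estimate and says nothing yet about how uncertain that estimate is.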

❖ Types of Statistics
There are two main types of statistics:

• Descriptive Statistics
• Inferential Statistics
Descriptive statistics describe a data set, or a collection of data, in summary
form.
Inferential statistics are used to interpret what that descriptive summary means
for the wider population.
Both types are widely used.

Descriptive Statistics
● The data is summarised and explained in descriptive statistics. The
summarisation is done from a population sample using several measures such as
the mean and standard deviation.
● Descriptive statistics is a way of organising, representing, and explaining
a set of data using charts, graphs, and summary measures.
● Histograms, pie charts, bar charts, and scatter plots are common ways to
summarise data and present it in tables or graphs.
● Descriptive statistics are just that: descriptive. They do not need to be
generalised beyond the data collected.

Inferential Statistics
● We attempt to interpret the meaning of descriptive statistics using
inferential statistics. We use inferential statistics to convey the meaning of
the data after it has been collected, evaluated, and summarised.
● Probability is used in inferential statistics to determine whether patterns
found in a study sample may be extrapolated to the wider population from which
the sample was drawn.
● Inferential statistics are used to test hypotheses and study correlations
between variables, and they can also be used to predict population sizes.
● Inferential statistics are used to derive conclusions and inferences from
samples, i.e. to create accurate generalisations.

Probability and Statistics form the basis of Data Science. Probability theory
is very helpful for making predictions. Estimates and predictions form an
important part of Data Science. With the help of statistical methods, we make
estimates for further analysis. Thus, statistical methods are largely dependent
on the theory of probability, and both probability and statistics depend on
data.

❖ Data
● Data is the collected information (observations) we have about something, or
facts and statistics collected together for reference or analysis.
● Data: a collection of facts (numbers, words, measurements, observations,
etc.) that has been translated into a form that computers can process.

Q. Why does Data Matter?


● Helps in understanding more about the data by identifying relationships that
may exist between 2 variables.
● Helps in predicting the future or forecast based on the previous trend of data.
● Helps in determining patterns that may exist between data.
● Helps in detecting fraud by uncovering anomalies in the data.

Data matters a lot nowadays as we can infer important information from it. Now
let’s delve into how data is categorized.
Data can be of 2 types: categorical and numerical.
For example, in a bank, region, occupation class and gender are categorical
data, as each takes one of a fixed set of values, while balance, credit score,
age and tenure in months are numerical (continuous) data, as they can take an
unlimited range of values.

Statistics is built on variables. A variable is a characteristic or attribute
of an item that can be measured or counted. For example, a car can have
variables such as make, model, year, mileage, color, or condition. By
combining the values of a variable across a set of data (e.g. the colors of all
cars in a given parking lot), statistics allows us to better understand trends
and outcomes.

❖ There are two main types of variables.


● First, qualitative variables are specific attributes that are often
non-numeric. Many of the variables given in the car example are qualitative.
Other examples of qualitative variables in statistics are gender, eye color, or
city of birth. Qualitative data is most often used to determine what percentage
of an outcome occurs for any given qualitative variable, and it often does not
rely on numbers. For example, trying to determine what percentage of women
own a business analyzes qualitative data.
● The second type of variable in statistics is the quantitative variable.
Quantitative variables are studied numerically and only carry meaning when
attached to a non-numerical descriptor. In the car example above, the mileage
driven is a quantitative variable. However, the number 60,000 holds no value
unless it is understood that it is the total number of miles driven.
● Quantitative variables can be further broken into two categories.
● First, discrete variables can take only certain values, so there are gaps
between potential discrete variable values. The number of points scored in a
football game is a discrete variable because (1) there can be no decimals
and (2) it is impossible for a team to score only 1 point.
● Second, statistics also makes use of continuous quantitative variables. These
values run along a scale: whereas discrete values are restricted,
continuous variables are often measured into decimals. When measuring the
height of the football players, any value (within possible limits) can be
obtained, and the heights can be measured down to 1/16ths of an inch if not
further.
Measures of Central Tendency
Q.1 What is meant by a measure of central tendency?
Answer: An average is frequently referred to as a measure of central tendency or
central value. This is a single value which is considered the most representative or
typical value for a given set of data. It is the value around which data in the set
tend to cluster. For example: the average starting salary for social workers is
$15,000 per year. On its own, an average does not show how much variety or
heterogeneity there is in the distribution.

Q.2 What are the objectives of averaging?


Answer: To get one single value that describes the characteristics of the
entire data. Measures of central value, by condensing the mass of data into one
single value, enable us to get an idea of the entire data. Thus one value can
represent thousands, lakhs and even millions of values. For example: it is
impossible to remember the individual incomes of millions of earning people of
Delhi, and even if one could do it, it would hardly be of any use. But if the
average income is obtained, we get one single value that represents the entire
population. Such a figure would throw light on the standard of living of an
average Delhiite.

Q.3 What are the objectives of averaging?

Answer: To facilitate comparison. Measures of central value, by reducing the
mass of data to one single figure, enable comparisons to be made. Comparison can
be made either at a point of time or over a period of time. For example: the
figure of average sales for December may be compared with the sales figures of
previous months or with the sales figure of a competing firm.

Q.4 What should be the properties of a good average?

Answer:
● It should be easy to understand: Since statistical methods are designed to
simplify complexity, it is desirable that an average be readily understood;
otherwise, its use is bound to be very limited.
● It should be simple to compute: Not only should an average be easy to
understand, but it should also be simple to compute so that it can be used
widely.
● It should be based on all the observations: The average should depend
upon each and every observation so that if any observation is dropped, the
average itself is altered.
● It should be rigidly defined: An average should be properly defined so
that it has one and only one interpretation. It should preferably be defined
by an algebraic formula so that if different people compute the average from
the same figures they all get the same answer (Barring arithmetical
mistakes).
● It should be capable of further algebraic treatment: We should prefer to
have an average that could be used for further statistical computations. For
example: If we are given separately the figures of average income and
number of employees of two or more companies we should be able to
compute the combined average.
● It should have sampling stability : We should prefer to get a value which
has what the statisticians call ‘Sampling stability’. This means that if we
pick 10 different groups of college students, and compute the average of
each group, we should expect to get approximately the same values.
● It should not be unduly affected by the presence of extreme values:
Although each and every observation should influence the value of the
average, none of the observations should influence it unduly. If one or two
very small or very large observations unduly affect the average, i.e., either
increase its value or reduce its value, the average cannot be really typical of
the entire set of data.

Q. How would you select a specific measure of central tendency?

Answer: Selection of a measure of central tendency largely depends on the
nature of data.

Q. What are various measures of central tendency?

Answer:
I. Mean
● Arithmetic mean
● Geometric mean
● Harmonic mean
II. Median
III. Mode

Q. What are various types of averages or means?

Answer:
● Arithmetic mean
● Geometric mean
● Harmonic mean

Q. What is arithmetic mean?

Answer: The arithmetic mean, often simply referred to as the mean, is the total
of the values of a set of observations divided by the total number of
observations.

Ungrouped Data
● The data that give information on each member of the population or sample
individually are called ungrouped data.
● Direct method: if X1, X2, ..., XN represent the values of N items or
observations, the arithmetic mean, denoted by x̄, is defined as:
$$ \bar{x} = \frac{X_1 + X_2 + \dots + X_N}{N} = \frac{\sum X}{N} $$
● Short-cut method: take any value A as an arbitrary point (assumed mean) and
let d = X - A be the deviation of each value from it. Then:
$$ \bar{x} = A + \frac{\sum d}{N} $$
It should be noted that any value can be taken as the arbitrary point and the
answer would be the same as obtained by the direct method.
● The short-cut method takes more time than the direct method. However, this
is true only for ungrouped data.

Grouped Data
● Grouped data are presented in the form of a frequency distribution table,
i.e. class intervals with their respective frequencies.
● Direct method: the formula for estimating the average from grouped data by
the direct method is:
$$ \bar{x} = \frac{\sum fX}{N} $$
where X = mid-point of the various classes, f = the frequency of each class,
and N = the total frequency.
● Short-cut method: when the short-cut method is used, the following formula
is applied, with d = X - A as before:
$$ \bar{x} = A + \frac{\sum fd}{N} $$
● In the case of grouped data, considerable saving in time is possible by
adopting the short-cut method.
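
The claim that the short-cut method gives the same answer as the direct method for any arbitrary point can be checked numerically. The sketch below assumes the usual textbook formulas (direct: ΣX/N or ΣfX/N; short-cut: A + Σd/N or A + Σfd/N with d = X - A); the function names and the small data sets are made up for illustration.

```python
def mean_direct(values):
    """Direct method for ungrouped data: sum of values / number of values."""
    return sum(values) / len(values)


def mean_shortcut(values, assumed_mean):
    """Short-cut method: assumed mean plus the average deviation from it."""
    deviations = [x - assumed_mean for x in values]
    return assumed_mean + sum(deviations) / len(values)


def grouped_mean_direct(midpoints, freqs):
    """Direct method for grouped data: sum(f * X) / N, with N = sum(f)."""
    total = sum(freqs)
    return sum(f * x for x, f in zip(midpoints, freqs)) / total


def grouped_mean_shortcut(midpoints, freqs, assumed_mean):
    """Short-cut method for grouped data: A + sum(f * d) / N, d = X - A."""
    total = sum(freqs)
    weighted_dev = sum(f * (x - assumed_mean) for x, f in zip(midpoints, freqs))
    return assumed_mean + weighted_dev / total


values = [10, 20, 30, 40, 50]          # hypothetical ungrouped data
print(mean_direct(values))             # 30.0
print(mean_shortcut(values, 25))       # 30.0, same answer with A = 25

midpoints = [5, 15, 25, 35]            # mid-points of classes 0-10, 10-20, ...
freqs = [2, 3, 4, 1]                   # hypothetical class frequencies
print(grouped_mean_direct(midpoints, freqs))        # 19.0
print(grouped_mean_shortcut(midpoints, freqs, 15))  # 19.0
```

Choosing A near the middle of the data keeps the deviations small, which is where the saving in hand calculation for grouped data comes from.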

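Combined Mean: If we have the arithmetic mean and number of observations of
two or more related groups, we can compute the combined average of these
groups by applying the following formula (shown for two groups with means
x̄1, x̄2 and sizes N1, N2):
$$ \bar{x}_{12} = \frac{N_1 \bar{x}_1 + N_2 \bar{x}_2}{N_1 + N_2} $$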
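
A minimal sketch of the combined mean, assuming the standard size-weighted formula (N1·x̄1 + N2·x̄2 + ...) / (N1 + N2 + ...). The function name `combined_mean` and the two company figures are hypothetical.

```python
def combined_mean(means, counts):
    """Combine group means, weighting each by its number of observations."""
    if len(means) != len(counts) or not means:
        raise ValueError("means and counts must be non-empty and equally long")
    total = sum(counts)
    return sum(m * n for m, n in zip(means, counts)) / total


# Hypothetical data: company A has 100 employees with average income 5000,
# company B has 300 employees with average income 7000.
overall = combined_mean([5000, 7000], [100, 300])
print(overall)  # 6500.0
```

Note that the result (6500) is not the simple average of the two means (6000), because the larger group pulls the combined mean towards its own average.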

❖ Merits
● All values are used
● It has unique value & easy to calculate
● The sum of the deviations from the mean is zero.

❖ Demerits
● The mean is affected by extreme values

❖ Median
The median is a point in a distribution of scores above and below which exactly
half of the cases fall. This is a value which appears in the middle of ordered
sequence of values. This is also known as positional average. The term ‘position’
refers to the place of a value in a series.
Example: If the incomes of five persons are $7000, $7200, $7500, $7600 and
$7800, then the median income would be $7500.
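
The income example can be reproduced with Python's standard `statistics` module. The second list, where the top income is replaced by an extreme value, is a hypothetical addition to show why the median is called a positional average.

```python
import statistics

incomes = [7000, 7200, 7500, 7600, 7800]        # the five incomes above
median_income = statistics.median(incomes)
print(median_income)  # 7500

# Hypothetical variation: replace the largest income with an extreme value.
incomes_with_outlier = [7000, 7200, 7500, 7600, 78000]
median_with_outlier = statistics.median(incomes_with_outlier)
mean_with_outlier = statistics.mean(incomes_with_outlier)
print(median_with_outlier)  # 7500, the middle position is unchanged
print(mean_with_outlier)    # 21460, the mean is pulled up by the outlier
```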

Merits
● Median is unique
● Median is less affected by extreme values as compared to mean
● It can be used for open–end distribution
● Graphical presentation of median is possible
● Median is used for studying qualitative attributes

Demerits
● For median, it is necessary to arrange the data
● It is not capable of further algebraic treatment
● It does not use each and every observation of the data set

Q. What are positional measures?

Answer: Positional measures are those that are estimated by dividing a series into
an equal number of parts. Important amongst these are quartiles, deciles and
percentiles.
Quartiles are those values of the variate which divide the total frequency into four
equal parts, deciles divide the total frequency into 10 equal parts, and
percentiles divide the total frequency into 100 equal parts.

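Q. How are quartiles, deciles and percentiles computed?

Answer: The procedure for computing quartiles, deciles, etc., is the same as for
the median. For grouped data, the following formulae are used for quartiles,
deciles and percentiles:

$$ Q_k = L + \frac{\frac{kN}{4} - c.f.}{f} \times i, \quad k = 1, 2, 3 $$
$$ D_k = L + \frac{\frac{kN}{10} - c.f.}{f} \times i, \quad k = 1, \dots, 9 $$
$$ P_k = L + \frac{\frac{kN}{100} - c.f.}{f} \times i, \quad k = 1, \dots, 99 $$

where the symbols have their usual meanings and interpretation: L is the lower
limit of the class containing the required value, N the total frequency, c.f.
the cumulative frequency of the preceding class, f the frequency of the class
containing the required value, and i the class width.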
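
As a sketch of how positional measures are computed for grouped data, the function below assumes the standard interpolation formula L + ((kN/m - c.f.) / f) × i, with m = 4, 10 or 100, and equal-width classes. The name `grouped_quantile`, its parameters and the frequency table are all hypothetical.

```python
def grouped_quantile(lower_limits, width, freqs, fraction):
    """Interpolated quantile for equal-width class intervals.

    fraction is k/4 for quartiles, k/10 for deciles, k/100 for percentiles.
    """
    total = sum(freqs)
    target = fraction * total       # position kN/4, kN/10 or kN/100
    cumulative = 0                  # c.f. of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cumulative + f >= target:
            # Interpolate inside the class that contains the target position.
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("fraction must lie between 0 and 1")


# Hypothetical frequency table: classes 0-10, 10-20, 20-30, 30-40.
lower_limits = [0, 10, 20, 30]
freqs = [5, 15, 20, 10]

q1 = grouped_quantile(lower_limits, 10, freqs, 0.25)
median = grouped_quantile(lower_limits, 10, freqs, 0.50)
q3 = grouped_quantile(lower_limits, 10, freqs, 0.75)
print(q1, median, q3)  # 15.0 22.5 28.75
```

The median is simply the case fraction = 1/2, which is why the procedure is described as the same as for the median.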
Question: What is meant by Mode?
Answer: Mode refers to the most common value in a distribution or the largest
category of a variable. It may also be defined as the value which occurs the
maximum number of times, i.e. the value having the maximum frequency.
A distribution containing more than one mode is called bimodal or multimodal.

Question: How is mode calculated?

Answer: Exact calculation involves mathematically fitting some appropriate type
of frequency curve to the grouped data and determining the value on the X-axis
below the peak of the curve. However, there are several elementary methods of
estimating the mode.
● Method for ungrouped data: tally method
● Method for grouped data:

$$ \text{Mode} = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i $$

where,
L = Lower limit of the modal class
f1 = Frequency of the modal class
f0 = Frequency of the class preceding the modal class
f2 = Frequency of the class succeeding the modal class
i = Width of the modal class
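
A short sketch of the grouped-data method, assuming the common textbook formula Mode = L + (f1 - f0) / (2·f1 - f0 - f2) × i, where i is the width of the modal class. The function name `grouped_mode` and the frequencies are illustrative assumptions.

```python
def grouped_mode(lower_limit, width, f1, f0, f2):
    """Estimate the mode from the modal class and its two neighbours.

    lower_limit: lower limit L of the modal class
    width:       class width i
    f1, f0, f2:  frequencies of the modal, preceding and succeeding classes
    """
    denominator = 2 * f1 - f0 - f2
    if denominator == 0:
        raise ValueError("mode is not defined: neighbouring classes tie")
    return lower_limit + (f1 - f0) / denominator * width


# Hypothetical table: classes 10-20, 20-30, 30-40 with frequencies 14, 20, 10.
# The modal class is 20-30 because it has the largest frequency.
mode_estimate = grouped_mode(lower_limit=20, width=10, f1=20, f0=14, f2=10)
print(mode_estimate)  # 23.75
```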

Question: What are the merits of mode?
Answer:
● Like median, the mode is not affected by extreme values and its value can be
obtained in open-end distributions without ascertaining the class limits.
● Mode can be easily used to describe qualitative phenomena. For example,
when we want to compare consumer preferences for different types of
products, say, soaps, toothpastes, etc., or for different media of advertising,
we should compare the modal preferences.
● In distributions where there is an outstandingly large frequency, the mode
happens to be meaningful as an average.
Question: What are the limitations of mode?
Answer: Mode is not a rigidly defined measure, as there are several formulae
for calculating the mode, all of which usually give somewhat different answers.
The value of the mode cannot always be determined uniquely, for example in the
case of bimodal distributions.

Question: What is the relationship among mean, median and mode?


Answer: A distribution in which the values of mean, median and mode coincide
(match) is known as symmetrical distribution.
Conversely stated, when the values of mean, median and mode are not equal, the
distribution is known as asymmetrical or skewed.
In moderately skewed or asymmetrical distributions, a very important relationship
exists among mean, median and mode. In such distributions, the distance between
the mean and the median is approximately one-third of the distance between the
mean and the mode.
[Diagram: positions of mean, median and mode in a moderately skewed distribution]

Karl Pearson has expressed this approximate relationship as follows:

$$ \text{Mean} - \text{Median} = \tfrac{1}{3}\,(\text{Mean} - \text{Mode}) $$
$$ \text{Mode} = 3\,\text{Median} - 2\,\text{Mean} $$

If we know any two of the three values, we can compute the third from
these relationships.
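
The "one-third" statement above is equivalent to Mode = 3·Median - 2·Mean, and the sketch below rearranges that approximate relationship to recover any one of the three values from the other two. The function names and the numbers are hypothetical, and the relation holds only approximately, for moderately skewed data.

```python
def mode_from(mean, median):
    """Empirical relation: Mode = 3 * Median - 2 * Mean."""
    return 3 * median - 2 * mean


def median_from(mean, mode):
    """Rearranged: Median = (2 * Mean + Mode) / 3."""
    return (2 * mean + mode) / 3


def mean_from(median, mode):
    """Rearranged: Mean = (3 * Median - Mode) / 2."""
    return (3 * median - mode) / 2


# Hypothetical moderately skewed distribution with mean 50 and median 48.
estimated_mode = mode_from(mean=50, median=48)
print(estimated_mode)                      # 44
print(median_from(mean=50, mode=44))       # 48.0
print(mean_from(median=48, mode=44))       # 50.0
```

Here mean - median = 2 and mean - mode = 6, so the first distance is one-third of the second, as the text describes.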
Measures of Dispersion
Q.1 Why are measures of dispersion important?
Answer:
● They give additional information that enables us to judge the reliability of
our measure of central tendency, i.e. the average.
● If data are widely dispersed, the central location is less representative of
the data as a whole than it would be for data more closely centered around the
mean.
● We may wish to compare the dispersion of various samples. If a wide spread of
values away from the center is undesirable or presents unacceptable risk, then
we should avoid choosing the distribution with greater variation.
● Measures of variation are of two types:
a. Absolute measures are expressed in the same statistical units as
the original data.
b. Relative measures, also known as “coefficients”, are pure
numbers, independent of the units of measurement.
● There are 4 types of measures of variation:
a. Range: the difference between the largest and the smallest values
in the data set.
Interquartile Range: the difference between the 75th percentile and the
25th percentile.
IQR = Q3 - Q1
b. Quartile deviation
c. Mean deviation
d. Standard deviation
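
The range and interquartile range defined above can be computed with Python's standard library. The data set is hypothetical, and the quartile deviation is taken here as half the interquartile range, which is the usual definition but is not spelled out in the notes. `statistics.quantiles` uses one particular quartile convention (its default "exclusive" method), so other textbooks or tools may give slightly different Q1 and Q3 for the same data.

```python
import statistics

data = [2, 4, 6, 8, 10, 12, 14]   # hypothetical observations

data_range = max(data) - min(data)

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median) and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1
quartile_deviation = iqr / 2      # assumed definition: half the IQR

print(data_range)           # 12
print(q1, q3)               # 4.0 12.0
print(iqr)                  # 8.0
print(quartile_deviation)   # 4.0
```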
● S.D. is the most used measure of dispersion. The value of S.D. tells how
closely the values of the data set are clustered around the mean.
● It is defined as the square root of the mean of the squared deviations from
the arithmetic mean.
● It is denoted by the Greek letter σ (sigma).
● In general, a lower value of the standard deviation for a data set indicates
that the values of that data set are spread over a relatively smaller range
around the mean. In contrast, a larger value of the standard deviation for a
data set indicates that the values of that data set are spread over a relatively
larger range around the mean.
● Variance is the average of the squared deviations of the data values from
the mean.
● The square of the standard deviation is called the variance: Variance = σ²

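● Sample Standard Deviation for a Frequency Distribution

$$ s = \sqrt{\frac{\sum f\,(x - \bar{x})^2}{n - 1}} $$

● Computation Formula for Standard Deviation for a Frequency Distribution

$$ s = \sqrt{\frac{\sum f x^2 - \frac{(\sum f x)^2}{n}}{n - 1}} $$

where x is the value (or class mid-point), f is its frequency, and n = Σf is
the number of observations.

Individual Series:
$$ \sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{N}} $$

Discrete Series:
$$ \sigma = \sqrt{\frac{\sum f\,(x - \bar{x})^2}{N}}, \quad N = \sum f $$

Continuous Series (x taken as the class mid-point):
$$ \sigma = \sqrt{\frac{\sum f\,(x - \bar{x})^2}{N}}, \quad N = \sum f $$

Population Variance:
$$ \sigma^2 = \frac{\sum (x - \mu)^2}{N} $$

Variance for sample data:
$$ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} $$

Standard Deviation (Population):
$$ \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} $$

Standard Deviation (Sample):
$$ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} $$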
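
The sketch below works through the population and sample versions of variance and standard deviation, assuming the standard definitions: divide the sum of squared deviations by N for a population and by n - 1 for a sample. It also checks the computation formula for a frequency distribution against the definition. Function names and data are hypothetical.

```python
import math


def population_variance(data):
    """Mean of the squared deviations from the mean (divide by N)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / n


def sample_variance(data):
    """Sum of squared deviations divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)


def sample_sd_from_frequencies(values, freqs):
    """Computation formula: sqrt((sum f*x^2 - (sum f*x)^2 / n) / (n - 1))."""
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    return math.sqrt((sum_fx2 - sum_fx ** 2 / n) / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]            # hypothetical observations
pop_var = population_variance(data)
pop_sd = math.sqrt(pop_var)
samp_var = sample_variance(data)
print(pop_var, pop_sd)                     # 4.0 2.0
print(samp_var)                            # 32/7, a little larger than 4

# The same observations written as a frequency distribution.
values = [2, 4, 5, 7, 9]
freqs = [1, 3, 2, 1, 1]
samp_sd = sample_sd_from_frequencies(values, freqs)
```

The sample variance is always at least as large as the population formula applied to the same numbers, because the divisor is n - 1 instead of N.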

Coefficient of Variation
● Standard deviation is an absolute measure of dispersion.
● C.V. is the relative measure of dispersion corresponding to the standard
deviation.
● C.V. is used to compare the variability of two or more data sets.
● There are occasions, however, when an absolute measure of dispersion is
inadequate and a relative form becomes preferable:
● for example, if a comparison between the variability of distributions with
different variables is required, or
● when we need to compare the dispersion of distributions with the same
variable but with very different arithmetic means.
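
A minimal sketch of the coefficient of variation, assuming the standard definition C.V. = (standard deviation / mean) × 100 and using the population standard deviation. The function name and the two data sets (the same variable measured on very different scales) are made up for illustration.

```python
import statistics


def coefficient_of_variation(data):
    """Relative dispersion: population SD as a percentage of the mean."""
    mean = statistics.fmean(data)
    if mean == 0:
        raise ValueError("C.V. is undefined when the mean is zero")
    return statistics.pstdev(data) / mean * 100


small_scale = [2, 4, 4, 4, 5, 5, 7, 9]       # hypothetical values, mean 5
large_scale = [x * 1000 for x in small_scale]  # same shape, mean 5000

cv_small = coefficient_of_variation(small_scale)
cv_large = coefficient_of_variation(large_scale)
print(round(cv_small, 6), round(cv_large, 6))  # 40.0 40.0
```

The standard deviations differ by a factor of 1000 (2 versus 2000), yet the C.V. is the same for both sets, which is why a relative measure is preferred when the means are very different.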
1. The values of the variance and the standard deviation are never
negative. That is, the numerator in the formula for the variance should
never produce a negative value. Usually the values of the variance and
standard deviation are positive, but if a data set has no variation, then the
variance and standard deviation are both zero. For example, if four persons
in a group are the same age, say 35 years, then the four values in the
data set are 35, 35, 35, 35. If we calculate the variance and standard
deviation for these data, their values are zero. This is because there is no
variation in the values of this data set.
2. The measurement units of variance are always the square of the
measurement units of the original data. This is so because the deviations of
the original values from the mean are squared to calculate the variance. For
example, if the measurement units of the original data are billions of
dollars, then the measurement units of the variance are squared billions of
dollars, which, of course, does not make any sense. But the measurement units
of the standard deviation are the same as the measurement units of the original
data because the standard deviation is obtained by taking the square root of the
variance.
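
The first property can be verified directly with the ages from the example, here using the population functions of Python's `statistics` module (the choice of population rather than sample formulas is an assumption; both give zero for constant data).

```python
import statistics

ages = [35, 35, 35, 35]                 # four persons of the same age

age_variance = statistics.pvariance(ages)
age_sd = statistics.pstdev(ages)
print(age_variance, age_sd)             # both are zero: no variation at all

# Any data set with some variation gives strictly positive values.
mixed_ages = [33, 35, 35, 37]           # hypothetical ages
print(statistics.pvariance(mixed_ages)) # 2
```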

❖ Skewness
● Skewness means lack of symmetry or departure from symmetry.
● A symmetric distribution has its mean, median, mode equal and the
frequency curve is symmetrically situated about these values.
● When distribution has longer tail on right side, it is positively skewed.
● When longer tail is on left side, it is negatively skewed.

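❖ Symmetric distribution
If there is no skewness, i.e. the distribution is symmetric like the bell-shaped
normal curve, then mean = median = mode.

❖ Shape of a Distribution
[Figure: shape of a distribution]

❖ Karl Pearson coefficient of skewness
It is given by the formula:

$$ Sk_p = \frac{\text{Mean} - \text{Mode}}{\sigma} $$

When the mode is ill-defined, the relation Mode = 3 Median - 2 Mean gives the
equivalent form

$$ Sk_p = \frac{3\,(\text{Mean} - \text{Median})}{\sigma} $$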
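
As an illustration, the sketch below uses the median-based form of Karl Pearson's coefficient, Sk = 3 × (Mean - Median) / σ, which is an assumption here: it is the variant commonly used when the mode is hard to pin down. The function name and data sets are hypothetical.

```python
import statistics


def pearson_skewness(data):
    """Median-based Pearson skewness: 3 * (mean - median) / SD."""
    sd = statistics.pstdev(data)
    if sd == 0:
        raise ValueError("skewness is undefined when there is no variation")
    return 3 * (statistics.fmean(data) - statistics.median(data)) / sd


symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 2, 2, 3, 3, 3, 4, 20]        # long tail on the right
left_tailed = [-x for x in right_tailed]        # mirror image

print(pearson_skewness(symmetric))              # 0.0
print(pearson_skewness(right_tailed) > 0)       # True: positively skewed
print(pearson_skewness(left_tailed) < 0)        # True: negatively skewed
```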

❖ Kurtosis
The measure of kurtosis describes the degree of concentration of frequencies in a
given distribution, that is, whether the observed values are concentrated more
around the mode (a peaked curve) or away from the mode towards both tails. The
degree of kurtosis of a distribution is measured relative to the peakedness of a
normal curve.
There are three possibilities:
(i) If a curve is more peaked than the normal curve, it is said to be Leptokurtic.
(ii) If a curve is less peaked than the normal curve, it is said to be Platykurtic.
(iii) If a curve is as peaked as the normal curve, it is said to be Mesokurtic.

Measure of kurtosis
Kurtosis is measured by β2. The formula for β2 is given by

$$ \beta_2 = \frac{\mu_4}{\mu_2^2} $$

where μ2 and μ4 are the second and fourth central moments of the distribution.
For a normal (mesokurtic) curve, β2 = 3. If β2 > 3, the curve is leptokurtic.
If β2 < 3, the curve is platykurtic.
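
A sketch of the β2 calculation, assuming the usual moment-based definition β2 = μ4 / μ2², where μ2 and μ4 are the second and fourth central moments. The function names, the tolerance used for "equal to 3" and the data are assumptions for illustration.

```python
def beta2(data):
    """Moment coefficient of kurtosis: fourth central moment / variance^2."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    if m2 == 0:
        raise ValueError("kurtosis is undefined when there is no variation")
    return m4 / m2 ** 2


def kurtosis_type(b2, tolerance=1e-9):
    """Label a beta2 value relative to the normal curve's value of 3."""
    if abs(b2 - 3) <= tolerance:
        return "mesokurtic"
    return "leptokurtic" if b2 > 3 else "platykurtic"


flat = [1, 2, 3, 4, 5]                       # evenly spread values
peaked = [0, 0, 0, 0, 0, 0, 0, 0, -10, 10]   # most values at the centre

print(kurtosis_type(beta2(flat)))     # platykurtic (beta2 is about 1.7)
print(kurtosis_type(beta2(peaked)))   # leptokurtic (beta2 is 5.0)
```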

❖ Weighted Average
An average calculated where some of the numbers are assigned more importance,
or weight, than others:

$$ \bar{x}_w = \frac{\sum w x}{\sum w} $$

where w is the weight of the data value x.
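
A minimal sketch of the weighted average, assuming the standard formula Σ(w·x) / Σw. The function name and the exam-score example are hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and equally long")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical scores: an assignment (weight 1) and a final exam (weight 3).
score = weighted_mean([80, 90], [1, 3])
print(score)  # 87.5, closer to 90 because the exam carries more weight
```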
Measure of Association between two variables
CORRELATION & REGRESSION ANALYSIS
A distribution in which each individual or unit of the set is made up of two
values is called a bivariate distribution.
‘Correlation’ is a statistical tool which studies the relationship between two
variables, and Correlation Analysis involves various methods and techniques
used for studying and measuring the extent of the relationship between the two
variables.
“Two variables are said to be in correlation if the change in one of the
variables results in a change in the other variable”.

❖ CORRELATION
● When the relationship is of a quantitative nature, the appropriate statistical
tool for discovering and measuring the relationship and expressing it in a
brief formula is known as correlation.
● The measure of correlation, called the coefficient of correlation, indicates
the strength and direction of the relationship between two variables.
● The coefficient between two variables x and y is denoted by r or rxy or ρ.
● It lies between -1 and +1.
● If r = 0, the variables are said to be uncorrelated, i.e. there is no linear
relationship between them. Independent variables always have r = 0, but r = 0
does not by itself prove independence.

❖ TYPES OF CORRELATION
1. Based on Direction:
● Positive Correlation: When an increase/decrease in the value of one variable
results in a corresponding increase/decrease in the value of the other variable.
● Negative Correlation: When an increase/decrease in the value of one variable
results in a corresponding decrease/increase in the value of the other variable.
2. Based on Degree:
● High
● Moderate
● Low
If we wish to label the strength of the association, for absolute values of r, 0-0.19
is regarded as very weak, 0.2-0.39 as weak, 0.40-0.59 as moderate, 0.6-0.79 as
strong and 0.8-1 as very strong correlation, but these are rather arbitrary limits,
and the context of the results should be considered.
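
The labelling scheme above translates directly into a small lookup. The function name is hypothetical, and the handling of values that fall between the printed bands (for example 0.195) is an assumption: each band is taken to start at its lower limit.

```python
def correlation_strength(r):
    """Label |r| using the bands quoted in the text (arbitrary limits)."""
    size = abs(r)
    if size > 1:
        raise ValueError("r must lie between -1 and +1")
    if size < 0.2:
        return "very weak"
    if size < 0.4:
        return "weak"
    if size < 0.6:
        return "moderate"
    if size < 0.8:
        return "strong"
    return "very strong"


print(correlation_strength(0.15))    # very weak
print(correlation_strength(-0.45))   # moderate (the sign only gives direction)
print(correlation_strength(0.85))    # very strong
```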

POSITIVE CORRELATION
● Heights and weights
● Household income and expenditure
● Price and supply of commodities

NEGATIVE CORRELATION
● Volume and pressure of a perfect gas
● Current and resistance [keeping the voltage constant]
● Price and demand of goods



The goal of a correlation analysis is to see whether two measurement variables
covary, and to quantify the strength of the relationship between the variables,
whereas regression expresses the relationship in the form of an equation.
For example, in students taking a Maths and English test, we could use
correlation to determine whether students who are good at Maths tend to be good
at English as well, and regression to determine whether the marks in English can
be predicted for given marks in Maths.
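
To make the "relationship in the form of an equation" concrete, the sketch below fits a straight line English = intercept + slope × Maths by ordinary least squares. The choice of least squares, the function name `fit_line` and the marks of the five students are all assumptions for illustration; the notes do not specify a fitting method.

```python
def fit_line(xs, ys):
    """Least-squares line y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


maths = [40, 50, 60, 70, 80]       # hypothetical Maths marks
english = [55, 45, 60, 75, 65]     # hypothetical English marks, same students

slope, intercept = fit_line(maths, english)
predicted_english = intercept + slope * 90   # prediction for Maths = 90
print(slope, intercept)        # 0.5 30.0
print(predicted_english)       # 75.0
```

The fitted equation is English = 30 + 0.5 × Maths for this made-up data; predicting for a Maths mark outside the observed range (such as 90) is an extrapolation and should be treated with care.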

❖ METHODS OF STUDYING CORRELATION

1. Scatter Diagram Method.
2. Karl Pearson’s Coefficient of Correlation.
3. Spearman’s Rank Correlation Coefficient.

1. SCATTER DIAGRAM
● The simplest method for studying correlation in two variables is a special
type of dot chart called a Scatter Diagram.
● In this method the given data are plotted in the form of dots, one for each
pair of X and Y.
● The more the plotted points scatter over the chart, the lesser is the degree of
relationship between the two variables.
● The more nearly the points come to a line, the higher the degree of
relationship.
● If the points are very close to each other, a fairly good amount of correlation
can be expected between the two variables. On the other hand, if they are
widely scattered, a poor correlation can be expected between them.
Advantages:
● It is readily comprehensible and enables us to form a rough idea of the nature
of the relationship between the two variables x and y.
● It is not affected by extreme observations.
Disadvantages:
● It is not a suitable method if the number of observations is fairly large.
● It is only a rough measure of correlation where the exact magnitude cannot
be known.

2. KARL PEARSON COEFFICIENT OF CORRELATION

It describes the degree and direction of the relationship between two variables
X and Y.
It is denoted by the symbol ‘r’.
The value of Pearson’s coefficient of correlation lies between -1 and +1.
If X and Y are independent variables, then the coefficient of correlation is zero.

❖ PEARSON FORMULA
The correlation coefficient is denoted by r and is given by the formula:

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}} $$
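
A sketch of Pearson's r computed from deviations about the means, assuming the standard product-moment formula r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²). The function name and the data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equally long lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))    # 1.0, perfect positive correlation
print(pearson_r(x, [10, 8, 6, 4, 2]))    # -1.0, perfect negative correlation

maths = [40, 50, 60, 70, 80]             # hypothetical marks
english = [55, 45, 60, 75, 65]
r_marks = pearson_r(maths, english)
print(round(r_marks, 4))                 # 0.7071
```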

❖ Correlation vs. Regression

● A scatter diagram can be used to show the relationship between two
variables
● Correlation analysis is used to measure the strength of the association
(linear relationship) between two variables
● Correlation is only concerned with the strength of the relationship
● No causal effect is implied with correlation