CHAPTER 3
BIVARIATE, MULTIVARIATE DATA &
DISTRIBUTIONS
Outline of the Subject
Chapter 1: Introduction to Applied Probability & Statistics
Chapter 2: Numerical Summary Measures
Chapter 3: Bivariate, Multivariate Data & Distributions
Chapter 4: Probability and Sampling Distributions
Chapter 5: Estimation and Statistical Intervals
Chapter 6: Testing Statistical Hypotheses
Outline of Chapter 3
3.1. Scatter Plots
3.2. Correlation
3.3. Fitting a Line to Bivariate Data
3.4. Nonlinear Relationships
3.5. Joint Distributions
3.1 Scatter Plots
A multivariate data set consists of measurements or observations on each of two or more variables. One important special case, bivariate data, involves only two variables, x and y.
The analysis of this type of data deals with the relationship between the two variables: the analysis is done to find out how the two variables are related.
An example of bivariate data is temperature and ice cream sales in the summer season.
The most important picture based on bivariate numerical data is a scatter plot. Each observation (pair of numbers) is represented by a point in a rectangular coordinate system.
Example 3.1:
Fig. 3.1 Constructing a scatterplot: (a) rectangular coordinate system for a scatterplot
of bivariate data; (b) the point corresponding to the observation (4.5, 15)
Example 3.2:
Given the arsenic removal data with the corresponding pH values
as follows:
Fig. 3.2: Arsenic data on x = pH and y = arsenic removed (%)
Example 3.2
Large values of arsenic removal tend to be associated with low pH, a negative or inverse relationship.
Fig. 3.3 The two scatterplots: (a) the best scale for both axes; (b) the full scale for both axes.
3.2 Correlation
Introduction
A scatter plot of bivariate numerical data gives a visual
impression of how strongly x values and y values are
related.
Observing the scatter plot alone, however, is not enough to make precise statements or draw reliable conclusions from the data!
=> Correlation:
A correlation coefficient is a quantitative assessment of the
strength of relationship between x and y values in a set of
(x, y) pairs.
Example 3.3:
Fig. 3.4. Scatterplots illustrating various types of relationships:
(a) positive relationship, linear pattern;
(b) negative relationship, linear pattern;
(c) no relationship or pattern;
(d) positive relationship, curved pattern
3.2 Correlation
Pearson’s sample correlation coefficient
Let (x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ) denote a sample of (x, y) pairs.
Consider the x deviations (x₁ − x̄), ..., (xₙ − x̄) and the y deviations (y₁ − ȳ), ..., (yₙ − ȳ).
Multiply each x deviation by the corresponding y deviation to obtain products of deviations of the form (x − x̄)(y − ȳ).
Example 3.4:
In region I, both x and y exceed their mean values, so (x − x̄) and (y − ȳ) are both positive numbers and (x − x̄)(y − ȳ) is positive. The same holds for region III.
In each of the other two regions (II and IV), one deviation is positive and the other is negative, so (x − x̄)(y − ȳ) is negative.
Fig. 3.5. Subdividing a scatterplot according to the signs of x − x̄ and y − ȳ: (a) a positive relation
Example 3.4 (continued):
Fig. 3.5. Subdividing a scatterplot according to the signs of x − x̄ and y − ȳ: (b) a negative relation; (c) no strong relation
3.2 Correlation
Definition
Pearson’s sample correlation r is given by

r = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / [ √Σᵢ (xᵢ − x̄)² · √Σᵢ (yᵢ − ȳ)² ] = S_xy / √(S_xx · S_yy)   (1)

where
S_xx = Σ xᵢ² − (Σ xᵢ)² / n
S_yy = Σ yᵢ² − (Σ yᵢ)² / n
S_xy = Σ xᵢyᵢ − (Σ xᵢ)(Σ yᵢ) / n
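As a quick numerical check of the definition above, the computation can be sketched in Python (the helper name and sample data are illustrative, not from the text):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's sample correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    syy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return sxy / sqrt(sxx * syy)

# A sample lying exactly on an upward-sloping line gives r = 1
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # → 1.0
```
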
Pearson’s sample correlation r:
Example 3.5: Given the following data on pollutant removal percentage (y, in %) and amount of filtered water (x, in 1000s of liters):
Q: How to compute Pearson’s sample correlation r?
Example 3.5 (continued): With the column totals Σxᵢ, Σyᵢ, Σxᵢ², Σyᵢ², and Σxᵢyᵢ computed from the data, evaluate

S_xx = Σ xᵢ² − (Σ xᵢ)² / n,  S_yy = Σ yᵢ² − (Σ yᵢ)² / n,  S_xy = Σ xᵢyᵢ − (Σ xᵢ)(Σ yᵢ) / n,

and then r = S_xy / √(S_xx · S_yy).
3.2 Correlation
Properties and interpretation of r
The value of r does not depend on the unit of measurement for
either variable.
The value of r is between −1 and +1.
A value near the upper limit, +1, indicates a positive relationship, whereas a value close to the lower limit, −1, suggests a negative relationship.
r = 1 only when all the points in a scatter plot of the data lie exactly on a straight line that slopes upward.
Similarly, r = −1 only when all the points lie exactly on a downward-sloping line.
The value of r is a measure of the extent to which x and y are linearly related. A value of r close to zero does not rule out a strong relationship between x and y; the relationship may simply not be linear.
Example 3.6: The article “Quantitative Estimation of Clay Mineralogy in Fine-Grained Soils” (J. Geotech. Geoenviron. Engr., 2011: 997–1008) reported on various chemical properties of natural and artificial soils. Consider the accompanying data on the cation exchange capacity (CEC, in meq/100 g) and specific surface area (SSA, in m²/g) of 20 natural soils.
Correlation of SSA and CEC r = 0.853
There is evidence of a moderate to strong positive relationship.
Example 3.6: Scatterplot of the data
3.3 Fitting a Line to Bivariate Data
Introduction
Given two numerical variables x and y, one often needs to use information about x to draw some type of conclusion concerning y.
Often an investigator wants to predict the y value that would result from making a single observation at a specified x value, for example, to predict product sales y for a sales region in which advertising expenditure x is one million dollars.
The different roles played by the two variables x and y are reflected in the following terminology:
𝑦 is called the dependent or response variable.
x is referred to as the independent, predictor, or explanatory variable.
If a scatter plot of 𝑦 versus 𝑥 exhibits a linear pattern, it is natural to
summarize the relationship between the variables by finding a line
that is as close as possible to the points in the plot.
Example 3.7
Suppose a car dealership advertises that a particular type of vehicle can be
rented on a one-day basis for a flat fee of $25 plus an additional $0.30 per
mile driven. If such a vehicle is rented and driven for 100 miles, the dealer’s
revenue y is :
y = 25 + 0.30(100) = 25 + 30 = 55
More generally, if x denotes the distance driven in miles, then
y= 25 + 0.3x
That is, x and y are linearly related.
The general form of a linear relationship between x and y is
y = a + bx,
where
b is the slope of the line
a is the vertical intercept, which is the height of the line above the value
x = 0.
3.3 Fitting a Line to Bivariate Data
Fitting a straight line
Definition
The most widely used criterion for assessing the goodness of fit of a line y = a + bx to bivariate data (x₁, y₁), ..., (xₙ, yₙ) is the sum of the squared deviations about the line:

Σᵢ₌₁ⁿ [yᵢ − (a + bxᵢ)]² = [y₁ − (a + bx₁)]² + ⋯ + [yₙ − (a + bxₙ)]²
According to the principle of least squares, the line that gives the
best fit to the data is the one that minimizes this sum; it is called the
least squares line or sample regression line.
Find a and b:
Slope b and vertical intercept a of the line
The slope of the least squares line is given by:

b = S_xy / S_xx = [Σ xᵢyᵢ − (Σ xᵢ)(Σ yᵢ)/n] / [Σ xᵢ² − (Σ xᵢ)²/n]

The vertical intercept a of the least squares line is

a = ȳ − b x̄
The equation of the least squares line is often written as ŷ = a + bx, where ŷ is a prediction of y that results from the substitution of any particular x value into the equation.
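The slope and intercept formulas above can be sketched directly in Python (the helper name and data are illustrative, not from the text):

```python
def least_squares_line(xs, ys):
    """Return (a, b): slope b = Sxy/Sxx, intercept a = ybar - b*xbar."""
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    b = sxy / sxx
    a = sum(ys) / n - b * sum(xs) / n
    return a, b

# Data lying exactly on y = 1 + 2x is recovered exactly
a, b = least_squares_line([1, 2, 3], [3, 5, 7])
print(a, b)  # → 1.0 2.0
```
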
Example 3.7
The cetane number is a critical property in specifying the ignition
quality of a fuel used in a diesel engine. Determining this number
for a biodiesel fuel is expensive and time consuming. The article
“Relating the Cetane Number of Biodiesel Fuels to Their Fatty Acid
Composition: A Critical Study” (J. of Automobile Engr., 2009: 565–
583) included the following data on x = iodine value (g) and y = cetane number for a sample of 14 biofuels. The iodine value is
the amount of iodine necessary to saturate a sample of 100 g of oil.
Example 3.7: Let us compute a and b for the linear regression y = a + bx
Example 3.7: The scatterplot with least squares line superimposed
Assessing the fit of the least squares line: Definitions
A quantitative assessment is based on the vertical deviations from the least squares line.
The height of the least squares line above x₁ is ŷ₁ = a + bx₁, and y₁ is the height of the corresponding point in the scatterplot, so the vertical deviation (residual) from this point to the line is y₁ − (a + bx₁) = y₁ − ŷ₁.
A residual (yᵢ − ŷᵢ) is positive if the corresponding point in the scatterplot lies above the least squares line and negative if the point lies below the line.
Ideally each point is close to the least squares line, i.e., each residual is close to 0.
A natural measure of variation about the least squares line is the sum
of the squared residuals
Assessing the fit of the least squares line: Definitions
The residual sum of squares (error sum of squares), denoted by SSE, is given by

SSE = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²,  where ŷᵢ = a + bxᵢ
SSE is a measure of “unexplained” variation; it is the amount of
variation in y that cannot be attributed to the linear relationship
between x and y.
The more points in the scatterplot deviate from the least squares line,
the larger the value of SSE and the greater the amount of y variation
that cannot be explained by a linear relation.
Assessing the fit of the least squares line: Definitions
The total sum of squares, denoted by SST, is defined as

SST = Σᵢ₌₁ⁿ (yᵢ − ȳ)²

SST is interpreted as a measure of total variation; the larger the value of SST, the greater the amount of variability in the observed yᵢ's.
The coefficient of determination, denoted by r², is given by

r² = 1 − SSE/SST

It is the proportion of variation in the observed y values that can be explained by a linear relationship between x and y in the sample.
Multiplying r² by 100 gives the percentage of y variation attributable to the approximate linear relationship. The closer this percentage is to 100%, the more successful the relationship is in explaining variation in y.
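A minimal sketch of SSE, SST, and r² in Python (the small data set and helper name are hypothetical, not from the text):

```python
def r_squared(xs, ys, a, b):
    """Coefficient of determination r^2 = 1 - SSE/SST for the line y = a + b*x."""
    ybar = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    sst = sum((y - ybar) ** 2 for y in ys)                     # total
    return 1 - sse / sst

# Hypothetical data whose least squares line is y = -0.5 + 2.2x
xs, ys = [1, 2, 3, 4], [2, 4, 5, 9]
print(round(r_squared(xs, ys, -0.5, 2.2), 3))  # → 0.931
```
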
Assessing the fit of the least squares line: Back to Example 3.7
One has:
3.4 Nonlinear Relationships
Power transformations
A scatter plot of bivariate data frequently shows curvature rather
than a linear pattern.
We need to find a way to fit such data.
If a scatter plot is curved and monotonic (either strictly increasing or strictly decreasing), it is possible to find a power transformation for x and y so that there is a linear pattern in a scatter plot of the transformed data.
By a power transformation, we mean the use of powers p and q such that x′ = xᵖ and y′ = y^q, transforming the nonlinear relationship between the (x, y) pairs into a linear one between the (x′, y′) pairs, i.e. y′ = a + bx′.
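As a sketch of the idea, assume hypothetical data that follow an exact exponential curve; going down the ladder with y′ = ln(y) produces an exactly linear pattern that ordinary least squares recovers (helper name and data are illustrative):

```python
from math import exp, log

def fit_line(xs, ys):
    """Ordinary least squares: returns (a, b) for y = a + b*x."""
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    b = sxy / sxx
    return sum(ys) / n - b * sum(xs) / n, b

# Hypothetical curved data y = e^(1 + 0.5x); the transform y' = ln(y)
# makes the pattern exactly linear: y' = 1 + 0.5x
xs = [1, 2, 3, 4, 5]
ys = [exp(1 + 0.5 * x) for x in xs]
a, b = fit_line(xs, [log(y) for y in ys])
print(round(a, 3), round(b, 3))  # → 1.0 0.5
```
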
The table shows a “ladder” of the most frequently used transformations.
To choose an appropriate power transformation, we can use Tukey’s bulging rule.
Tukey and Mosteller’s bulging rule:
The direction of the bulge
indicates the direction of the
power transformation of Y
and/or X to straighten the
relationship between them.
Example: For segment 2, to straighten the plot, we should use a transformation on x that is up the ladder from the no-transformation row, i.e. x′ = x² or x³, or one on y that is down the ladder, i.e. y′ = 1/y or ln(y).
Once a straightening transformation has been identified, a straight line can be fit to the (x′, y′) points using least squares.
For example, if x′ = x² and y′ = y, the least squares line gives ŷ′ = a + bx′, from which ŷ = a + bx².
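The x′ = x² case can be sketched as follows (hypothetical data and an illustrative helper, not from the text):

```python
def fit_line(xs, ys):
    """Ordinary least squares: returns (a, b) for y = a + b*x."""
    n = len(xs)
    sxy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    sxx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    b = sxy / sxx
    return sum(ys) / n - b * sum(xs) / n, b

# Hypothetical data following y = 3 + 2x^2 exactly; regressing y on the
# transformed predictor x' = x^2 recovers a = 3 and b = 2
xs = [1, 2, 3]
ys = [3 + 2 * x ** 2 for x in xs]
a, b = fit_line([x ** 2 for x in xs], ys)
print(round(a, 6), round(b, 6))  # → 3.0 2.0
```
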
Example 3.12: To make crispy chips, it is important to find characteristics of
the production process that produce chips with an appealing texture. Given the
following data on x = frying time (sec) and y = moisture content (%):
Check with the Tukey rule!
Example 3.12: The scatterplot has the pattern of segment 3, so we must
go down the ladder for x or y
3.4 Nonlinear Relationships
Fitting a polynomial function
If the general pattern of curvature in a scatter plot is not monotonic, a quadratic function a + b₁x + b₂x² can be used in fitting the data.
The least squares coefficients a, b₁, and b₂ are the values of â, b̂₁, and b̂₂ that minimize:

g(â, b̂₁, b̂₂) = Σᵢ₌₁ⁿ (yᵢ − (â + b̂₁xᵢ + b̂₂xᵢ²))²
There are popular statistical computer packages that can
numerically find the coefficient values.
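One such computation can be sketched in pure Python by solving the normal equations directly (function names and data are illustrative, not a package's actual API):

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            factor = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= factor * M[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def fit_quadratic(xs, ys):
    """Minimize g(a, b1, b2) = sum (y - (a + b1*x + b2*x^2))^2 by solving
    the 3x3 normal equations with predictor rows (1, x, x^2)."""
    rows = [[1.0, x, x * x] for x in xs]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    return solve(A, rhs)

# Hypothetical data lying exactly on y = 1 + 2x + 3x^2 is recovered exactly
xs = [0, 1, 2, 3, 4]
a, b1, b2 = fit_quadratic(xs, [1 + 2 * x + 3 * x * x for x in xs])
print(round(a, 6), round(b1, 6), round(b2, 6))  # → 1.0 2.0 3.0
```
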
3.4 Nonlinear Relationships
Smoothing a scatter plot
Sometimes the pattern in a scatter plot is too complex for a line or curve of a particular type (e.g., exponential or parabolic) to give a good fit.
Statisticians have recently developed some more flexible methods
that permit a wide variety of patterns.
One such method is LOWESS (or LOESS), short for locally weighted scatter plot smoother:
Let (x*, y*) denote a particular one of the n (x, y) pairs in the sample.
The fitted value corresponding to (x*, y*) is obtained by fitting a straight line using only a specified percentage of the data (e.g., 25%) whose x values are closest to x*.
x values closer to x* are more heavily weighted than those farther away.
This process is repeated for each of the n points, so n different lines are fit.
Finally, the fitted points are connected to produce a LOWESS curve.
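The local-fitting steps above can be sketched in pure Python; this is a minimal sketch, not Minitab's implementation, and the tricube weight function and `span` parameter are conventional choices assumed here:

```python
def lowess(xs, ys, span=0.5):
    """For each x*, fit a weighted least squares line to the span*n nearest
    points, weighting neighbours by the tricube function, and record the
    fitted value at x*."""
    n = len(xs)
    k = max(2, int(span * n))
    fitted = []
    for xstar in xs:
        # indices of the k sample points whose x values are closest to x*
        idx = sorted(range(n), key=lambda i: abs(xs[i] - xstar))[:k]
        d = max(abs(xs[i] - xstar) for i in idx) or 1.0
        w = [(1 - (abs(xs[i] - xstar) / d) ** 3) ** 3 for i in idx]
        sw = sum(w)
        xb = sum(wi * xs[i] for wi, i in zip(w, idx)) / sw
        yb = sum(wi * ys[i] for wi, i in zip(w, idx)) / sw
        sxx = sum(wi * (xs[i] - xb) ** 2 for wi, i in zip(w, idx))
        sxy = sum(wi * (xs[i] - xb) * (ys[i] - yb) for wi, i in zip(w, idx))
        b = sxy / sxx if sxx else 0.0
        fitted.append(yb + b * (xstar - xb))
    return fitted

# On data that are exactly linear, every local fit reproduces the line
print(lowess([1, 2, 3, 4, 5, 6], [2, 4, 6, 8, 10, 12], span=0.5))
```
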
Example 3.9:
Weighing large deceased animals found in wilderness areas is usually
not feasible, so it is desirable to have a method for estimating weight
from various characteristics of an animal that can be easily determined.
A data set consisting of various characteristics for a sample of n =143
wild bears is given.
Figure 3.9(a) shows a scatterplot of y=weight vs. x =distance around the
chest (chest girth). At first glance, it looks as though a single line
obtained from ordinary least squares would effectively summarize the
pattern.
Figure 3.9(b) shows the LOWESS curve produced by Minitab using a span
of 50% (the fit at (x , y ) is determined by the closest 50% of the sample).
The curve appears to consist of two straight-line segments joined together above approximately x = 38. The steeper line is to the right of 38,
indicating that weight tends to increase more rapidly as girth does for
girths exceeding 38 in.
Example 3.9:
A scatterplot (a) and LOWESS curve (b) for the bear weight data
of Example 3.9
3.4 Nonlinear Relationships
Fitting a linear function for k predictors
Consider fitting a relation of the form y = a + b₁x₁ + ⋯ + bₖxₖ, where xₖ is the k-th predictor.
The least squares coefficients a, b₁, ..., bₖ are the values of â, b̂₁, ..., b̂ₖ that minimize

g(â, b̂₁, ..., b̂ₖ) = Σⱼ₌₁ⁿ (yⱼ − (â + b̂₁x₁ⱼ + ⋯ + b̂ₖxₖⱼ))²
To find the coefficients:
Take the partial derivative of g(.) with respect to each unknown, equate
these to zero to obtain a system of k +1 linear equations in the k+1
unknowns (the normal equations), and solve the system.
To find the coefficients, minimize the least squares function L with respect to â, b̂₁, ..., b̂ₖ:

L(â, b̂₁, ..., b̂ₖ) = Σⱼ₌₁ⁿ (yⱼ − â − Σᵢ₌₁ᵏ b̂ᵢxᵢⱼ)²

That is, we solve the (k+1) normal equations:

∂L/∂â = −2 Σⱼ₌₁ⁿ (yⱼ − â − Σᵢ₌₁ᵏ b̂ᵢxᵢⱼ) = 0

∂L/∂b̂ᵢ = −2 Σⱼ₌₁ⁿ (yⱼ − â − Σᵢ₌₁ᵏ b̂ᵢxᵢⱼ) xᵢⱼ = 0,  i = 1, ..., k

The solution to the normal equations gives the least squares estimators of the regression coefficients â, b̂₁, ..., b̂ₖ.
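The normal-equations recipe above can be sketched in Python for k = 2 predictors (hypothetical data and illustrative helper names, not from the text):

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            factor = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= factor * M[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

def fit_linear(X, ys):
    """Solve the (k+1) normal equations dL/da = 0 and dL/dbi = 0 for
    y = a + b1*x1 + ... + bk*xk; each row of X holds one (x1, ..., xk)."""
    rows = [[1.0] + list(xrow) for xrow in X]
    m = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(m)]
    return solve(A, rhs)

# Hypothetical two-predictor data generated exactly from y = 2 + x1 + 3*x2
X = [(1, 2), (2, 1), (3, 4), (4, 3)]
a, b1, b2 = fit_linear(X, [2 + x1 + 3 * x2 for x1, x2 in X])
print(round(a, 6), round(b1, 6), round(b2, 6))  # → 2.0 1.0 3.0
```
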
Example 3.10: Soil and sediment adsorption is an important characteristic because it influences the effectiveness of pesticides and various agricultural chemicals.
Given the following data on y = phosphate adsorption index, x₁ = amount of extractable iron, and x₂ = amount of extractable aluminum:
Example 3.10:
The figure shows the software output from a request to fit a + b₁x₁ + b₂x₂ to the phosphate adsorption data using the principle of least squares.
The result is: ŷ = −7.35 + 0.113x₁ + 0.349x₂
The output also reports the coefficient of multiple determination.
3.5 Joint Distributions
Distributions for two discrete variables
If x and y are both discrete, their joint distribution is specified by a joint mass function f(x, y) satisfying:
1. f(x, y) ≥ 0
2. Σ over all (x, y) of f(x, y) = 1
Often, there is no nice formula for 𝑓(𝑥,𝑦).
When there are only a few possible values of 𝑥 and 𝑦, the mass
function is most conveniently displayed in a rectangular table.
Example 3.11: A certain market has both an express checkout
register and a super-express register.
Let x denote the number of customers queueing at the express
register at a particular weekday time, and let y denote the number of
customers in line at the super-express register at that same time.
The joint mass function is given as:
According to the table,
f(x, y) > 0 for only 17
(x, y) pairs
Example 3.11: The (x, y) pairs for which the number of customers at the
express register is equal to the number of customers at the other register are
(0, 0), (1, 1), (2, 2), and (3, 3), so
The total number of customers at these two
registers will be 2 if (x, y) = (2, 0), (1, 1), or (0, 2):
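Event probabilities of this kind can be sketched in Python; note that the table below is hypothetical and illustrative, NOT the actual table from Example 3.11:

```python
# Hypothetical joint mass function f(x, y) stored as {(x, y): probability};
# these numbers are illustrative, not the table from Example 3.11
f = {(0, 0): 0.10, (0, 1): 0.10, (0, 2): 0.05,
     (1, 0): 0.10, (1, 1): 0.20, (1, 2): 0.10,
     (2, 0): 0.05, (2, 1): 0.10, (2, 2): 0.10,
     (3, 3): 0.10}

# Condition 2: the probabilities over all (x, y) pairs must sum to 1
assert abs(sum(f.values()) - 1.0) < 1e-9

p_equal = sum(p for (x, y), p in f.items() if x == y)       # P(equal lines)
p_total2 = sum(p for (x, y), p in f.items() if x + y == 2)  # P(total of 2)
print(round(p_equal, 2), round(p_total2, 2))  # → 0.5 0.3
```
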
3.5 Joint Distributions
Distributions for two continuous variables
If x and y are both continuous, their joint distribution is specified by a joint density function f(x, y) satisfying:
1. f(x, y) ≥ 0
2. ∫₋∞^∞ ∫₋∞^∞ f(x, y) dx dy = 1
The graph of f (x, y) is a surface in three-dimensional space.
The second condition indicates that the total volume under this
density surface is 1.
Figure 3.12: Volume representing the proportion of (x, y) in the region A
3.5 Joint Distributions
Correlation and the Bivariate Normal Distribution
The covariance between x and y is defined by:

σ_xy = ∫₋∞^∞ ∫₋∞^∞ (x − μ_x)(y − μ_y) f(x, y) dx dy
where 𝜇𝑥 and 𝜇𝑦 denote the mean values of x and y, respectively.
The correlation coefficient ρ is defined by

ρ = σ_xy / (σ_x σ_y)
𝜌 does not depend on the 𝑥 or 𝑦 units of measurement.
−1 ≤ 𝜌 ≤ 1
The closer 𝜌 is to +1 or -1, the stronger the linear
relationship between the two variables.
3.5 Joint Distributions
The Bivariate Normal Distribution
The bivariate normal joint density function is given by

f(x, y) = 1 / (2π σ_x σ_y √(1 − ρ²)) · exp{ −[ ((x − μ_x)/σ_x)² − 2ρ ((x − μ_x)/σ_x)((y − μ_y)/σ_y) + ((y − μ_y)/σ_y)² ] / (2(1 − ρ²)) }   (1)

where −∞ < x < ∞ and −∞ < y < ∞.
When x and y are statistically independent, the joint density function f(x, y) must satisfy

f(x, y) = f₁(x) f₂(y)

where f₁(x) and f₂(y) denote the marginal distributions of x and y, respectively.
Note that once independence is assumed, one has only to select appropriate distributions for x and y separately and then use (1) to yield the joint distribution.
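The density and the independence factorization can be checked numerically in a short sketch (function names are illustrative; standard-normal marginals are assumed for the check):

```python
from math import exp, pi, sqrt

def bvn_pdf(x, y, mx, my, sx, sy, rho):
    """Bivariate normal joint density (illustrative helper name)."""
    zx = (x - mx) / sx
    zy = (y - my) / sy
    q = (zx * zx - 2 * rho * zx * zy + zy * zy) / (2 * (1 - rho * rho))
    return exp(-q) / (2 * pi * sx * sy * sqrt(1 - rho * rho))

def norm_pdf(z):
    """Standard normal marginal density."""
    return exp(-z * z / 2) / sqrt(2 * pi)

# With rho = 0 the joint density factorizes as f(x, y) = f1(x) * f2(y)
lhs = bvn_pdf(0.5, -1.0, 0, 0, 1, 1, 0.0)
rhs = norm_pdf(0.5) * norm_pdf(-1.0)
print(abs(lhs - rhs) < 1e-12)  # → True
```
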
Example 3.12: