Factor Analysis
Presentation prepared by
Mark Morrison
2001
What is factor analysis?
A class of procedures used for data reduction
In marketing research, there are usually many
variables that are correlated
Factor analysis seeks to reduce this large number
of inter-related variables to a few underlying
factors
Uses of factor analysis
Two main uses:
to identify a set of underlying factors that
describe respondents' attitudes towards a product
to identify a new smaller set of uncorrelated
variables for use in subsequent regression or
cluster analysis
Use 1: Identifying a set of
underlying factors
In questionnaires, many attitudinal questions are
often asked
Just because thirty questions are asked doesn't
mean there are thirty product attributes!
Using factor analysis, the thirty variables can be
reduced to an underlying set of factors
Example: Choice of GP surgery
It is important to me to attend a GP surgery (1-strongly
agree, 5-strongly disagree):
1. Which is within walking distance of my home
2. Where I can park near the surgery
3. Where the waiting times are short
4. Where they will bulk bill
5. Which has very modern facilities and equipment
6. Where I know that the staff won't talk about my health with others
7. Which has a comfortable atmosphere
8. Where the staff are friendly
9. Where a number of doctors work
10. Which is a small practice with just one doctor
Example: Choice of GP surgery
Some of these variables are quite related:
Q9 and Q10 appear to relate to surgery size
Q1 and Q3 seem to relate to convenience
Q7 and Q8 seem to relate to atmosphere
It is possible that we may be able to combine
these variables to produce a smaller set of
underlying factors
We can then use these factors to describe what is
important to people about attending a surgery
Use 2: Identifying uncorrelated
variables for regression analysis
Suppose we wanted to estimate a regression
equation predicting whether a person would visit
a GP surgery
We might start by producing a questionnaire that
asks people to rate the likelihood that they would
visit a number of GP surgeries.
These surgeries would be described using the
characteristics from the previous example
Identifying uncorrelated variables
for regression analysis
Would it then be possible to regress people's
ratings of the likelihood that they would visit
against the characteristics of the surgeries?
rating = b0 + b1.walkingdistance + b2.parking +
b3.shortwait + b4.bulkbill + b5.equip + b6.wonttalk +
b7.comfortableatmosphere + b8.friendly +
b9.numbdoctors + b10.smallpractice
Identifying uncorrelated variables
for regression analysis
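Not reliably: several of these characteristics are
correlated with one another, so the coefficient
estimates would suffer from multicollinearity
If a factor analysis is done first, the factors can be
included as explanatory variables instead, ie
rating = b0 + b1.convenience + b2.atmosphere +
b3.size + b4.cost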
Other examples of its use in
marketing research
Market segmentation - to identify the underlying
variables upon which to group consumers
Cluster analysis can be performed using the
factors
The Factor Analysis Model
In factor analysis, each of the factors is expressed
as a linear combination of the observed
variables, similar to regression analysis:
Fi = b1X1 + b2X2 + ... + bnXn
Fi is the estimate of the ith factor
bi is the factor weight or coefficient
Xi is a variable
The Factor Analysis Model
From our previous example, the factor score for
atmosphere might be calculated as follows:
Atmosphere = b1.wonttalk + b2.comfortableatmosphere
+ b3.friendly
The Factor Analysis Model
Atmosphere = b1.wonttalk + b2.comfortableatmosphere
+ b3.friendly
Suppose that the weights in the above factor
model are b1=0.3, b2=0.4, and b3=0.8
Factor scores for each respondent are calculated
as follows:
The Factor Analysis Model
Resp.  V1 (wont)  V2 (comf)  V3 (frien)  Atmosphere
1      2          4          5           (2*0.3)+(4*0.4)+(5*0.8)=6.2
2      3          5          5           (3*0.3)+(5*0.4)+(5*0.8)=6.9
3      3          4          2           etc
4      5          5          4
5      5          6          2
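The same weighted sum can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the weights are the ones supposed above, the ratings for respondents 3 to 5 are read from the same table, and the names WEIGHTS, RATINGS and factor_score are made up for this sketch.

```python
# Weights supposed on the slide: b1 (won't talk), b2 (comfortable), b3 (friendly)
WEIGHTS = (0.3, 0.4, 0.8)

# Ratings (V1, V2, V3) for respondents 1 to 5, read from the table
RATINGS = [
    (2, 4, 5),
    (3, 5, 5),
    (3, 4, 2),
    (5, 5, 4),
    (5, 6, 2),
]

def factor_score(ratings, weights=WEIGHTS):
    """Weighted sum b1*X1 + b2*X2 + b3*X3 for one respondent."""
    return sum(b * x for b, x in zip(weights, ratings))

# One Atmosphere score per respondent, rounded to one decimal place
scores = [round(factor_score(r), 1) for r in RATINGS]

for resp, score in enumerate(scores, start=1):
    print(f"Respondent {resp}: Atmosphere = {score}")
```

The first two values agree with the hand calculations for respondents 1 and 2 (6.2 and 6.9).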
Steps in a Factor Analysis
Step 1 Problem formulation
Step 2 Construct correlation matrix
Step 3 Select method of factor analysis
Step 4 Determine number of factors
Step 5 Rotate factors
Step 6 Interpret factors
Step 7 Calculate factor scores
Step 8 Determine model fit
Step 1 Problem formulation
The first step involves:
- Deciding on the purpose of the analysis
- Selecting a suitable scale on which to base your
  factor analysis eg Likert or Semantic Differential
  Scales
- In a typical questionnaire between 7 and 25
  questions are asked
- The sample size should be at least 4-5 times the
  number of variables
Step 2 Construct correlation matrix
The next step involves checking whether the variables
are sufficiently correlated to warrant the use of factor
analysis.
This involves examining the bivariate correlation matrix
for large and significant correlations
Formal statistics can also be used to test the
appropriateness of using factor analysis.
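As a rough sketch of this screening step, the correlation matrix can be computed with NumPy and scanned for large entries. The ratings below are made up, the variable names V1 to V4 are placeholders, and the 0.5 threshold is an assumption chosen for illustration rather than a rule from these slides.

```python
import numpy as np

# Made-up 1-5 ratings: rows are respondents, columns are questions
ratings = np.array([
    [1, 2, 4, 5],
    [2, 1, 5, 4],
    [4, 5, 2, 1],
    [5, 4, 1, 2],
    [3, 3, 3, 3],
    [2, 2, 4, 4],
])
names = ["V1", "V2", "V3", "V4"]

# Bivariate (Pearson) correlations between the columns
corr = np.corrcoef(ratings, rowvar=False)

def large_pairs(corr, names, threshold=0.5):
    """List variable pairs whose absolute correlation reaches the threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                pairs.append((names[i], names[j], round(float(corr[i, j]), 2)))
    return pairs

for first, second, r in large_pairs(corr, names):
    print(f"{first} and {second}: r = {r}")
```

This only flags large correlations; whether they are significant still has to be checked separately.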
Step 2 Construct correlation matrix
Bartlett's test of sphericity - this uses a chi-square test
of the null hypothesis that the variables are
uncorrelated. A large value of the test statistic implies
that the null hypothesis (ie no correlation) should be
rejected
The Kaiser-Meyer-Olkin measure of sampling
adequacy should be larger than 0.6 for a factor analysis
to be appropriate
Examination of communalities also indicates how well
a factor analysis is performing. These indicate the
proportion of each variable's variance explained by the
model.
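Both statistics can be computed directly from the correlation matrix. The sketch below uses the standard textbook formulas: Bartlett's statistic is -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom, and KMO compares squared correlations with squared partial correlations. The correlation matrix and sample size are hypothetical, and the function names are illustrative only.

```python
import numpy as np

def bartlett_sphericity(corr, n_obs):
    """Return Bartlett's chi-square statistic and its degrees of freedom."""
    corr = np.asarray(corr, dtype=float)
    p = corr.shape[0]
    _, log_det = np.linalg.slogdet(corr)        # log of the determinant of R
    statistic = -(n_obs - 1 - (2 * p + 5) / 6) * log_det
    dof = p * (p - 1) // 2
    return statistic, dof

def kmo(corr):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    corr = np.asarray(corr, dtype=float)
    inv = np.linalg.inv(corr)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)             # partial correlations
    off_diag = ~np.eye(corr.shape[0], dtype=bool)
    r2 = np.sum(corr[off_diag] ** 2)
    q2 = np.sum(partial[off_diag] ** 2)
    return r2 / (r2 + q2)

# Hypothetical correlation matrix for three variables, 100 respondents
corr = np.array([
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
])
statistic, dof = bartlett_sphericity(corr, n_obs=100)
print(f"Bartlett chi-square = {statistic:.1f} on {dof} df, KMO = {kmo(corr):.2f}")
```

The statistic is then compared with a chi-square distribution on the stated degrees of freedom to obtain a p-value.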
Example: GP Surgery Choice
Questions:
Is it appropriate to do a factor analysis on this
data set?
Why?
Step 3 Select method of factor
analysis
Several common forms of factor analysis are:
1. Principal Components Analysis - represents the
data by defining principal components, each a
linear combination of the variables, in a way
analogous to fitting a regression line.
This method is appropriate when you want to
find the minimum number of factors and you
want to use the factors in a subsequent analysis
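A minimal sketch of principal components extraction, assuming the analysis starts from the correlation matrix: the components come from its eigenvectors, and each loading is an eigenvector entry scaled by the square root of the eigenvalue. The matrix is hypothetical and principal_components is an illustrative name, not a library function.

```python
import numpy as np

def principal_components(corr, n_factors):
    """Eigenvalues (largest first) and loadings of the first n_factors components."""
    corr = np.asarray(corr, dtype=float)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)   # eigh handles symmetric matrices
    order = np.argsort(eigenvalues)[::-1]              # sort from largest to smallest
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]
    loadings = eigenvectors[:, :n_factors] * np.sqrt(eigenvalues[:n_factors])
    return eigenvalues, loadings

# Hypothetical correlation matrix for three variables
corr = np.array([
    [1.0, 0.6, 0.5],
    [0.6, 1.0, 0.4],
    [0.5, 0.4, 1.0],
])
eigenvalues, loadings = principal_components(corr, n_factors=1)
print("Eigenvalues:", np.round(eigenvalues, 2))
print("Loadings on component 1:", np.round(loadings[:, 0], 2))
```

The sign of a component is arbitrary, so all loadings in a column may come out negated.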
Step 3 Select method of factor
analysis
2. Common Factor Analysis (or Principal axis
factoring)
Similar to PCA, except that the analysis is
based on the communalities (the variance each
variable shares with the others) rather than the
full correlation matrix. Earliest form of factor
analysis.
This method is appropriate when the primary
concern is to identify the underlying factors
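One common way to carry out principal axis factoring is iterative: put communality estimates on the diagonal of the correlation matrix, extract factors, update the communalities from the loadings, and repeat. The sketch below assumes squared multiple correlations as starting values; the correlation matrix is hypothetical and principal_axis is an illustrative name.

```python
import numpy as np

def principal_axis(corr, n_factors, n_iter=200, tol=1e-6):
    """Iterated principal axis factoring; returns loadings and communalities."""
    corr = np.asarray(corr, dtype=float)
    # Starting communalities: squared multiple correlations (an assumption)
    communalities = 1 - 1 / np.diag(np.linalg.inv(corr))
    for _ in range(n_iter):
        reduced = corr.copy()
        np.fill_diagonal(reduced, communalities)       # replace the 1s on the diagonal
        values, vectors = np.linalg.eigh(reduced)
        top = np.argsort(values)[::-1][:n_factors]
        loadings = vectors[:, top] * np.sqrt(np.clip(values[top], 0, None))
        updated = np.sum(loadings ** 2, axis=1)
        change = np.max(np.abs(updated - communalities))
        communalities = updated
        if change < tol:
            break
    return loadings, communalities

# Hypothetical correlation matrix for four variables
corr = np.array([
    [1.00, 0.56, 0.48, 0.40],
    [0.56, 1.00, 0.42, 0.35],
    [0.48, 0.42, 1.00, 0.30],
    [0.40, 0.35, 0.30, 1.00],
])
loadings, communalities = principal_axis(corr, n_factors=1)
print("Loadings:", np.round(loadings[:, 0], 2))
print("Communalities:", np.round(communalities, 2))
```

Unlike principal components, the diagonal here holds shared variance only, so the loadings describe what the variables have in common.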
Step 3 Select method of factor
analysis
3. Alpha factoring - treats the number of
individuals as fixed, and the set of variables
chosen as a sample of the population of
variables. Emphasis on psychometric inference.
Factors are extracted so that they have maximum
correlation with those assumed to exist in the
universe
Useful method for scale development
Step 4 Determine number of
factors
There are several ways of determining the appropriate
number of factors:
1. A priori determination - based on theory or past
experience.
2. Use eigenvalues - an eigenvalue represents the amount of
variance associated with a factor. A larger eigenvalue
indicates that the factor accounts for more of the
variance in the variables. Using this approach,
factors with eigenvalues <1.0 are excluded, since they
explain less than a single standardised variable
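The eigenvalue rule is easy to apply once the eigenvalues are known. The values below are hypothetical, chosen to sum to 10 as they would for ten standardised variables; kaiser_count is an illustrative name.

```python
# Hypothetical eigenvalues for ten variables, largest first
EIGENVALUES = [3.1, 2.0, 1.4, 1.1, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1]

def kaiser_count(eigenvalues, cutoff=1.0):
    """Number of factors whose eigenvalue is at least the cutoff."""
    return sum(1 for ev in eigenvalues if ev >= cutoff)

print("Factors retained:", kaiser_count(EIGENVALUES))
```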
Step 4 Determine number of
factors
3. Use a scree plot - this is simply a plot of the
eigenvalues against the factors.
The plot is examined to see where there is a kink
in the curve ie where additional factors explain
little extra variance.
The tail of the plot is the scree - look for where
the scree begins.
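A scree plot can be drawn with any charting tool. As a dependency-free sketch, the code below prints one bar of asterisks per factor so that the kink can be judged by eye. The eigenvalues are hypothetical and scree_lines is an illustrative name.

```python
# Hypothetical eigenvalues for ten variables, largest first
EIGENVALUES = [3.1, 2.0, 1.4, 1.1, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1]

def scree_lines(eigenvalues, width=40):
    """One text bar per factor, scaled so the largest eigenvalue fills the width."""
    largest = max(eigenvalues)
    lines = []
    for k, ev in enumerate(eigenvalues, start=1):
        bar = "*" * round(width * ev / largest)
        lines.append(f"Factor {k:2d} {ev:5.2f} {bar}")
    return lines

lines = scree_lines(EIGENVALUES)
print("\n".join(lines))
```

Reading the kink remains a judgment call; the bars only make the flattening of the curve visible.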
Step 4 Determine number of
factors
4. Percentage of variance
Ideally a factor analysis should explain at least
60% of the cumulative variance
There is a case for selecting extra factors, if
necessary, to achieve this level of explanation
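The percentage-of-variance criterion amounts to a running total of eigenvalues divided by their sum. The sketch below uses hypothetical eigenvalues and the 60% target mentioned above; the function names are illustrative.

```python
# Hypothetical eigenvalues for ten variables, largest first
EIGENVALUES = [3.1, 2.0, 1.4, 1.1, 0.8, 0.6, 0.4, 0.3, 0.2, 0.1]

def cumulative_variance(eigenvalues):
    """Cumulative percentage of variance explained after each factor."""
    total = sum(eigenvalues)
    running = 0.0
    percentages = []
    for ev in eigenvalues:
        running += ev
        percentages.append(100 * running / total)
    return percentages

def factors_needed(eigenvalues, target=60.0):
    """Smallest number of factors whose cumulative variance reaches the target."""
    for k, pct in enumerate(cumulative_variance(eigenvalues), start=1):
        if pct >= target:
            return k
    return len(eigenvalues)

for k, pct in enumerate(cumulative_variance(EIGENVALUES), start=1):
    print(f"{k} factor(s): {pct:.0f}% of variance")
print("Factors needed for 60%:", factors_needed(EIGENVALUES))
```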
Example: GP surgery choice
What is the correct number of factors in this case:
Criterion 1: theory
Criterion 2: eigenvalues
Criterion 3: scree plot
Criterion 4: variance explained
Step 5 Rotate factors
An output from factor analysis is the factor
matrix. This matrix contains coefficients that
show the relationship between a factor and the
variables. If the coefficients are large, the factor
and the variable are closely related.
However it can be difficult to interpret factors,
as factors are typically correlated with many
variables.
Step 5 Rotate factors
Rotation is a method of transforming the factor
matrix to make it easier to interpret.
The most commonly used rotation is the varimax
procedure. This is an orthogonal procedure ie it
produces uncorrelated factors. It is therefore
very useful where subsequent analysis is
expected.
Step 5 Rotate factors
Other orthogonal rotations are Quartimax and
Equimax
They are both conceptually similar to Varimax
Varimax works by simplifying the columns in the
component matrix
Quartimax simplifies the rows
Equimax simplifies both columns and rows
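To make the idea of an orthogonal rotation concrete, here is a sketch of a widely used SVD-based varimax iteration. It is the raw version without Kaiser normalisation, which is an assumption of this sketch; statistical packages may normalise rows first. The unrotated loadings are hypothetical.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Raw varimax rotation; returns rotated loadings and the rotation matrix."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = L @ rotation
        # Gradient-like target that pushes squared loadings apart within each column
        target = rotated ** 3 - rotated * (np.sum(rotated ** 2, axis=0) / p)
        u, s, vt = np.linalg.svd(L.T @ target)
        rotation = u @ vt                      # nearest orthogonal matrix
        new_criterion = np.sum(s)
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return L @ rotation, rotation

# Hypothetical unrotated loadings: four variables on two factors
unrotated = np.array([
    [0.7, 0.5],
    [0.8, 0.4],
    [0.6, -0.5],
    [0.7, -0.6],
])
rotated, rotation = varimax(unrotated)
print(np.round(rotated, 2))
```

Because the rotation matrix is orthogonal, each variable's communality (its row sum of squared loadings) is unchanged; only the way it is split across factors differs.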
Step 5 Rotate factors
Where factors are likely to be correlated it may
be inappropriate to force them to be uncorrelated
The results can be affected spuriously, creating
simplified results that are naive
Oblique rotations are available that clarify the
factors, but factors are not required to be
orthogonal
Note that the rotated component matrix for an
oblique rotation is called the pattern matrix
Step 5 Rotate factors
SPSS only provides one method of oblique
rotation, although others are available. This is
known as Oblimin.
Step 6 Interpret factors
Question: how do you know what underlying
construct a factor represents?
Examine the variables that load highly on a
factor (ie absolute loading > 0.4)
Try to give the factor a name based on these
variables
If a factor cannot be easily defined, it should be
labelled a general factor
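The screening described above can be sketched as a simple filter over the rotated loadings. The loadings below are hypothetical, the variable names are borrowed from the GP surgery example, and the function names are illustrative.

```python
# Hypothetical rotated loadings: one row per variable, one column per factor
NAMES = ["wonttalk", "comfortableatmosphere", "friendly",
         "numbdoctors", "smallpractice"]
LOADINGS = [
    [0.55, 0.10],
    [0.80, 0.05],
    [0.85, -0.02],
    [0.08, 0.78],
    [0.45, -0.75],
]

def high_loading_variables(loadings, names, cutoff=0.4):
    """Map each factor number to the variables whose absolute loading exceeds the cutoff."""
    n_factors = len(loadings[0])
    return {
        f + 1: [name for name, row in zip(names, loadings) if abs(row[f]) > cutoff]
        for f in range(n_factors)
    }

def cross_loading_variables(loadings, names, cutoff=0.4):
    """Variables that load highly on more than one factor."""
    return [
        name for name, row in zip(names, loadings)
        if sum(abs(value) > cutoff for value in row) > 1
    ]

for factor, members in high_loading_variables(LOADINGS, NAMES).items():
    print(f"Factor {factor}: {', '.join(members)}")
print("Cross-loading:", cross_loading_variables(LOADINGS, NAMES))
```

Naming the factors from these lists is still a matter of judgment; the code only gathers the candidates.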
Example: GP surgery choice
What variables have a high loading on each factor?
Do variables have a high loading on more than one
factor?
Define factor names using high loading variables
Step 7 Calculate factor scores
If the objective of the factor analysis is to reduce the
data for subsequent analysis, it is necessary to calculate
factor scores.
The factor score is simply the value of the factor for
each respondent. It is calculated using the previously
mentioned formula
Fi = b1X1 + b2X2 + ... + bnXn
The b coefficients are obtained from the factor score
coefficient matrix
Factor scores can be requested from SPSS
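One common way to obtain the b coefficients is a regression-style estimate: solve R B = L, where R is the correlation matrix and L holds the loadings, then apply B to the standardised variables. The sketch below assumes a principal components extraction and made-up ratings; it is not necessarily what any particular package does by default, and the function names are illustrative.

```python
import numpy as np

# Made-up ratings: rows are respondents, columns are three variables
data = np.array([
    [1, 2, 4],
    [2, 1, 5],
    [4, 5, 1],
    [5, 4, 3],
    [3, 3, 2],
    [2, 4, 4],
], dtype=float)

# Loadings for two principal components of the correlation matrix
corr = np.corrcoef(data, rowvar=False)
values, vectors = np.linalg.eigh(corr)
top = np.argsort(values)[::-1][:2]
loadings = vectors[:, top] * np.sqrt(values[top])

def factor_score_coefficients(corr, loadings):
    """Solve R B = L for the factor score coefficient matrix B."""
    return np.linalg.solve(corr, loadings)

def factor_scores(data, coefficients):
    """Standardise each variable, then form the weighted sums for every respondent."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    return z @ coefficients

coefficients = factor_score_coefficients(corr, loadings)
scores = factor_scores(data, coefficients)
print(np.round(scores, 2))
```

Each column of the result is one factor and each row one respondent; these columns are what would be carried into a later regression or cluster analysis.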
Step 8 Determine model fit
It is possible to estimate a correlation matrix using the
estimated correlations between the factors and variables
The model fit can then be tested by comparing (1) the
original correlation matrix with (2) the estimated
correlation matrix
The differences between observed and estimated
correlations are known as residuals
The existence of many large and significant residuals
provides evidence that the factor model is poor
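For orthogonal factors the estimated correlation matrix is the loadings matrix multiplied by its own transpose, so the residuals follow directly. The sketch below uses hypothetical loadings and observed correlations; the 0.05 cutoff for a "large" residual is an assumption of this sketch, and the function names are illustrative.

```python
import numpy as np

def residual_matrix(corr, loadings):
    """Observed minus reproduced correlations, with the diagonal set to zero."""
    corr = np.asarray(corr, dtype=float)
    loadings = np.asarray(loadings, dtype=float)
    residuals = corr - loadings @ loadings.T
    np.fill_diagonal(residuals, 0.0)
    return residuals

def share_of_large_residuals(residuals, cutoff=0.05):
    """Fraction of distinct variable pairs whose absolute residual exceeds the cutoff."""
    upper = residuals[np.triu_indices_from(residuals, k=1)]
    return float(np.mean(np.abs(upper) > cutoff))

# Hypothetical one-factor loadings and observed correlations
loadings = np.array([[0.8], [0.7], [0.6]])
observed = np.array([
    [1.00, 0.56, 0.30],
    [0.56, 1.00, 0.42],
    [0.30, 0.42, 1.00],
])
residuals = residual_matrix(observed, loadings)
print(np.round(residuals, 2))
print("Share of large residuals:", round(share_of_large_residuals(residuals), 2))
```

Here the factor reproduces two of the three correlations almost exactly but misses the correlation between the first and third variables.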
Summary
Factor analysis is used to:
identify a set of underlying factors or constructs that
describe respondents' attitudes towards a product
develop a set of uncorrelated variables for use in
subsequent statistical analysis
Factor analysis is used in marketing research to
determine product attributes, for regression
analysis and for market segmentation
Summary
The appropriateness of using factor analysis can
be checked by looking at the correlation matrix
or using Bartlett's test of sphericity
Principal components analysis is used if the
objective is to use the factors in a subsequent
analysis
There are four ways of selecting the number of
factors: theory, eigenvalues, scree plot and
explained variance
Summary
Factors are rotated to make them easier to
interpret
Varimax produces uncorrelated factors, Oblimin
produces correlated factors
Interpret factors by looking at variables with
high loadings
Factor scores are calculated using the factor
score coefficient matrix
Residuals are used to test model fit