Part 3 - Exploratory Factor Analysis
Research Methodology
Exploratory Factor Analysis
• As more variables are added, overlap (correlation) among the variables becomes more likely, so the researcher needs to:
o Reduce the number of variables to be analyzed
o Select the appropriate variables to be included in the multivariate analyses
• Factor analysis provides the tools for analyzing the structure of the interrelationships
(correlations) among a large number of variables
• Factors or components: sets of variables that are highly interrelated
• Factor analysis differs from Principal Component Analysis (PCA) in its statistical
interpretation:
✓ PCA forms each component as a linear combination of the observed variables;
✓ Factor analysis is a measurement model of a latent variable.
• A retail firm identified 80 different characteristics of retail stores and their service that consumers
mentioned as influencing their patronage choice among stores
• The firm conducts a survey to collect consumer evaluations of each of the 80 specific items.
A Hypothetical Example: Conducting Survey
[Figure: the survey items group into three broader evaluative dimensions]
• In-store experience of shoppers
• Product assortment and availability
• Product quality and price levels
Factor Analysis Decision Process
Stage 1: Objectives of Factor Analysis
R Factor Analysis
• Research objective: summarize the characteristics (i.e., variables)
• Factor analysis is applied to a correlation matrix of the variables
• Analyzes a set of variables to identify underlying dimensions that are latent (not easily
observed).
Q Factor Analysis
• Factor analysis would be applied to a correlation matrix of the individual cases
• Combines or condenses large numbers of cases into distinctly different groups within a larger population
• Not utilized frequently because of computational difficulties
• Instead, most researchers utilize some type of cluster analysis to group individual cases.
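The difference between the two input matrices can be sketched in a few lines of Python. This is only an illustration: the ratings are hypothetical, and NumPy's `corrcoef` stands in for whatever statistical package is actually used.

```python
import numpy as np

# Hypothetical data: 6 respondents (rows) rated 4 variables (columns).
X = np.array([
    [7, 6, 2, 3],
    [8, 7, 1, 2],
    [6, 6, 3, 3],
    [2, 3, 8, 7],
    [1, 2, 9, 8],
    [3, 3, 7, 7],
], dtype=float)

# R-type input: correlations among the variables (columns) -> 4 x 4 matrix.
r_type = np.corrcoef(X, rowvar=False)

# Q-type input: correlations among the cases (rows) -> 6 x 6 matrix.
q_type = np.corrcoef(X, rowvar=True)

print(r_type.shape)  # one row/column per variable
print(q_type.shape)  # one row/column per respondent
```

The Q-type matrix grows with the number of cases rather than the number of variables, which gives a sense of why it becomes computationally heavy for large samples.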
Data Summarization
• Derives underlying dimensions that describe the data in a much smaller number of concepts than the
original individual variables
• The fundamental concept involved is the definition of structure, through which researchers can view the
set of variables at various levels of generalization
• Variate: the linear composite of variables
• In EFA, the variates (factors) are formed to maximize their explanation of the entire variable set, not to
predict a dependent variable(s)
• The goal of data summarization is achieved by defining a small number of factors that adequately
represent the original set of variables.
• Without interpretation:
o Based solely on the intercorrelations between variables
o Principal components regression employs this procedure strictly to reduce the number of variables in the
analysis with no specific regard to interpretability
o Creates a parsimonious set of variables that aids model estimation and development
• With interpretation:
o Used for managerial purposes or to assist in scale development
o Scale development: A specific process focused on the identification of a set of items that represent a
construct (e.g., store image) in a quantifiable and objective manner
o Common factor analysis is most applicable to interpretability since it analyzes only the shared variation
among the variables
Data Reduction
Purpose:
• To retain the nature and character of the original variables, but reduce the number of actual values
included in the analysis
Techniques:
(1) Identify representative variables from the much larger set of variables represented by each factor for use
in subsequent multivariate analyses, or
(2) Create an entirely new set of variables, representing composites of the variables represented by each
factor, to replace the original set of variables.
When combined with the results of data summarization, factor loadings and factor scores help to identify the final
set of variables.
Variable Selection
• Variable Specification
• Factors are always produced
• Factors require multiple variables
• Highly correlated variables affect the incremental predictive power of sequential variables
entered in multiple regression and discriminant analysis
• EFA addresses these intercorrelations by deriving a new, smaller set of variables (factors), reducing the
dimensions and increasing interpretability
Stage 2: Designing an Exploratory Factor Analysis
1. Design of the study in terms of the number of variables, measurement properties of variables, and the
types of allowable variables
2. The sample size necessary, both in absolute terms and as a function of the number of variables in the
analysis
3. Calculation of the input data (a correlation matrix) to meet the specified objectives of grouping variables
or respondents.
• Include a reasonable number of variables (several) to represent each proposed factor
• Identify key variables that closely reflect the hypothesized underlying factors.
Sample Size
• Communality - the amount of a variable’s variance explained by its loadings on the factors, calculated as the
sum of the squared loadings across the factors.
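As a small worked example of this definition, the sketch below computes communalities from a loading matrix. The loadings are hypothetical, and the uniqueness line assumes standardized variables (total variance of 1).

```python
import numpy as np

# Hypothetical loadings: 4 variables (rows) on 2 factors (columns).
loadings = np.array([
    [0.80, 0.10],
    [0.70, 0.20],
    [0.15, 0.75],
    [0.30, 0.30],
])

# Communality = sum of squared loadings across the factors, per variable.
communality = (loadings ** 2).sum(axis=1)

# For a standardized variable, what the factors do not explain is unique variance.
uniqueness = 1.0 - communality

for i, (h2, u2) in enumerate(zip(communality, uniqueness), start=1):
    print(f"V{i}: communality={h2:.3f}, uniqueness={u2:.3f}")
```

In this made-up matrix the last variable has a low communality, meaning the two factors explain little of its variance.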
• Identify groups or clusters of individuals that exhibit a similar pattern on the variables included in
the analysis
Stage 3: Assumptions in Exploratory Factor Analysis
Assumption: Some underlying structure does exist in the set of selected variables
• Just because the variables are statistically correlated doesn't mean they are conceptually relevant
• Ensure that the observed patterns are conceptually valid
• Ensure that the sample is homogeneous with respect to the underlying factor structure
• Example:
o Mixing independent and dependent variables in a single factor analysis, then using that factor to
explain the dependence relationship is not appropriate
o If males and females respond differently to a set of items, analyzing them together may create
misleading results. Instead, separate analyses should be done for each group to understand their
unique patterns, then compare the results.
• The process starts by deleting a variable whose measure of sampling adequacy (MSA) is below .50 and then
re-running the factor analysis. Repeat until all variables have acceptable MSA values.
• This is done because variables with low MSA values often end up as single-variable factors
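The deletion loop can be sketched in Python as follows. This is a minimal illustration rather than any package's implementation: the per-variable MSA uses the usual ratio of squared correlations to squared correlations plus squared partial correlations, the helper names are hypothetical, and dropping the lowest-MSA variable first is an assumption about the order of deletion.

```python
import numpy as np

def msa_per_variable(R):
    """Per-variable measure of sampling adequacy from a correlation matrix R."""
    P = np.linalg.inv(R)
    d = np.sqrt(np.diag(P))
    partial = -P / np.outer(d, d)            # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)    # mask for off-diagonal entries
    r2 = np.where(off, R ** 2, 0.0).sum(axis=1)
    q2 = np.where(off, partial ** 2, 0.0).sum(axis=1)
    return r2 / (r2 + q2)

def drop_low_msa(X, names, cutoff=0.50):
    """Drop the lowest-MSA variable until every remaining MSA >= cutoff."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 2:
        msa = msa_per_variable(np.corrcoef(X, rowvar=False))
        worst = int(np.argmin(msa))
        if msa[worst] >= cutoff:
            break
        X = np.delete(X, worst, axis=1)      # remove the variable, then recompute
        del names[worst]
    return names

# Hypothetical data: four variables share one factor, the fifth is pure noise.
rng = np.random.default_rng(0)
factor = rng.normal(size=(200, 1))
X = np.hstack([factor + rng.normal(size=(200, 4)), rng.normal(size=(200, 1))])
print(drop_low_msa(X, ["V1", "V2", "V3", "V4", "V5"]))
```

The MSA values are recomputed after every deletion because removing one variable changes the partial correlations of all the others.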
Stage 4: Deriving Factors and Assessing Overall Fit
• Common variance - Variance in a variable that is shared with all other variables in the analysis
• Unique variance - Variance associated with only a specific variable and is not represented in the
correlations among variables
• Variables with high common variance are more amenable to exploratory factor analysis
• Specific variance - Reflects the unique characteristics of that variable apart from the other variables in
the analysis
• Error variance - Variance that is due to unreliability in the data-gathering process, measurement error,
or a random component in the measured phenomenon.
• Total variance – has two basic sources, common and unique variance (with 2 subparts - specific and
error variance)
Principal Component Analysis - Considers the total variance and derives factors that contain small proportions of
unique variance and, in some instances, error variance
Common Factor Analysis - Considers only the common (shared) variance, assuming that unique and error variance
are not of interest in defining the structure of the variables
Selection criteria:
1. The objectives of the factor analysis
2. The amount of prior knowledge about the variance in the variables.
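One way to see the difference between the two methods is in the diagonal of the correlation matrix that gets factored. The sketch below uses simulated (hypothetical) data, and it uses squared multiple correlations as the communality estimates, which is one common choice and an assumption here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 cases, 5 variables driven by one shared factor plus noise.
factor = rng.normal(size=(200, 1))
weights = np.array([[0.8, 0.7, 0.6, 0.5, 0.4]])
X = factor @ weights + 0.6 * rng.normal(size=(200, 5))

R = np.corrcoef(X, rowvar=False)

# Component analysis: unities stay on the diagonal, so total variance is analyzed.
pca_eigvals = np.linalg.eigvalsh(R)[::-1]

# Common factor analysis: the diagonal is replaced by communality estimates
# (squared multiple correlations), so only shared variance is analyzed.
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
R_reduced = R.copy()
np.fill_diagonal(R_reduced, smc)
common_eigvals = np.linalg.eigvalsh(R_reduced)[::-1]

print("variance analyzed (PCA):          ", round(pca_eigvals.sum(), 3))
print("variance analyzed (common factor):", round(common_eigvals.sum(), 3))
```

The first total equals the number of variables, while the second is smaller because the unique and error variance have been taken off the diagonal.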
Common factor analysis suffers from factor indeterminacy, and its communalities may be invalid (less than 0 or
above 1), which requires deletion of the variable.
Empirical research shows that the two methods give essentially identical results in most applications.
• Percentage of Variance criterion – Retains factors until a specific cumulative percentage of total
variance is explained.
o In natural sciences: factoring should not stop until the extracted factors account for at least 95% of the
variance
o In social sciences: a solution that accounts for 60% of the total variance is often considered satisfactory
o Variant approach: selecting enough factors to achieve a prespecified communality for each of the
variables
• Scree Test criterion
o Identify the optimum number of factors that can be extracted before the amount of unique variance
begins to dominate the common variance structure
o “Elbow”: the point where the unique variance begins to dominate the common variance structure.
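Both criteria can be read off the eigenvalues. The following sketch uses hypothetical eigenvalues for eight variables and the 60% social-science threshold mentioned above; the successive drops are printed only as an aid for locating the elbow by eye.

```python
import numpy as np

# Hypothetical eigenvalues for 8 variables (they sum to 8).
eigenvalues = np.array([3.2, 1.8, 1.1, 0.7, 0.5, 0.3, 0.25, 0.15])

# Percentage of variance criterion: keep factors until 60% is explained.
pct = eigenvalues / eigenvalues.sum() * 100
cum_pct = np.cumsum(pct)
n_factors = int(np.argmax(cum_pct >= 60.0)) + 1

# Scree test aid: the size of each drop between successive eigenvalues.
drops = -np.diff(eigenvalues)

for k, (ev, cp) in enumerate(zip(eigenvalues, cum_pct), start=1):
    print(f"factor {k}: eigenvalue={ev:.2f}, cumulative %={cp:.1f}")
print("factors needed for 60%:", n_factors)
print("successive drops:", np.round(drops, 2))
```

The scree judgment itself stays visual: the code only lists how quickly the eigenvalues fall, and the analyst decides where the curve flattens.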
• Parallel Analysis
o Forms a stopping rule based on the specific characteristics of the dataset being analyzed (number of variables and sample size)
o Generate a large number of datasets of random values and then factor analyze each
dataset
o Result: average eigenvalue for the first factor, the second factor and so on across the simulated
datasets.
o Compare the eigenvalues of the original data with the simulated datasets → retain factors that have
eigenvalues above the eigenvalues of simulated datasets.
o Variant: taking the upper bound (95th percentile) of the simulated eigenvalues as a more
conservative threshold
o With PCA, parallel analysis tends to be “stricter” than the latent root criterion as the number of factors
increases; with common factor analysis, the opposite has been observed.
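The procedure can be sketched as below. This is a simplified, PCA-based version under stated assumptions: random datasets are drawn from a standard normal distribution, the function name and its parameters are hypothetical, and the example data are simulated with two underlying factors.

```python
import numpy as np

def parallel_analysis(X, n_sims=200, percentile=None, seed=0):
    """Compare observed eigenvalues with those of random data of the same size."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    observed = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
    sims = np.empty((n_sims, p))
    for i in range(n_sims):
        Z = rng.normal(size=(n, p))          # random data, same n and p
        sims[i] = np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False))[::-1]
    if percentile is None:
        threshold = sims.mean(axis=0)        # average simulated eigenvalues
    else:
        threshold = np.percentile(sims, percentile, axis=0)
    n_retain = 0
    for obs, thr in zip(observed, threshold):
        if obs <= thr:                       # stop at the first factor that fails
            break
        n_retain += 1
    return n_retain, observed, threshold

# Hypothetical data: 300 cases, 6 variables, two underlying factors.
rng = np.random.default_rng(1)
F = rng.normal(size=(300, 2))
pattern = np.array([[0.8, 0], [0.8, 0], [0.8, 0], [0, 0.8], [0, 0.8], [0, 0.8]])
X = F @ pattern.T + 0.6 * rng.normal(size=(300, 6))

n_mean, observed, thr_mean = parallel_analysis(X)
n_p95, _, thr_p95 = parallel_analysis(X, percentile=95)
print("retain (mean threshold):", n_mean)
print("retain (95th percentile):", n_p95)
```

Passing `percentile=95` switches from the average simulated eigenvalue to the more conservative upper-bound variant.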
Stage 5: Interpreting the Factors
2. Factor Rotation
• Rotational methods are employed to achieve simpler and theoretically more meaningful factor solutions
• Rotation of the factors improves the interpretation by reducing some of the ambiguities that often
accompany initial unrotated factor solutions
Factor Rotation
Rotation redistributes the variance from earlier factors to later ones to achieve a simpler, theoretically more meaningful
factor pattern
After rotation, the clustering of the variables into two groups is more obvious: V1 and V2 versus V3, V4 and V5.
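To make the effect of rotation concrete, here is a minimal sketch of an orthogonal varimax rotation in Python. It is a common SVD-based formulation without Kaiser normalization, not the routine of any particular package, and the five-variable loading matrix is hypothetical (a clean two-group pattern deliberately turned by 30 degrees to mimic an unrotated solution).

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonal varimax rotation; returns rotated loadings and the rotation matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = (rotated ** 2).sum(axis=0)
        grad = loadings.T @ (rotated ** 3 - rotated * col_ss / p)
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt                    # closest orthogonal matrix
        new_criterion = s.sum()
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation, rotation

# Hypothetical simple structure: V1-V2 on one factor, V3-V5 on the other.
simple = np.array([[0.8, 0], [0.7, 0], [0, 0.8], [0, 0.7], [0, 0.6]])
angle = np.deg2rad(30)
turn = np.array([[np.cos(angle), -np.sin(angle)],
                 [np.sin(angle),  np.cos(angle)]])
unrotated = simple @ turn                    # every variable loads on both factors

rotated, rotation = varimax(unrotated)
print(np.round(unrotated, 2))
print(np.round(rotated, 2))
```

Because the rotation is orthogonal, each variable's communality is unchanged; only the way its variance is split between the two factors differs.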
Example
• Nine measures were obtained in a pilot test based on a sample of 202 respondents
• Based on prior research, a three-factor solution was deemed appropriate
• Task: interpreting the factors
Example – Step 3
Step 4:
• Factor 1: V7, V8, V9
• Factor 2: V2, V3, V5
• Factor 3: V4, V6
• Due to cross-loading, V1 is eliminated and the loadings are recalculated
Step 5: The result is three well-defined, distinct groups of variables
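The assignment and cross-loading check in Step 4 can be sketched as follows. The loading values are hypothetical (the slide's own loading table is not reproduced here), and the .40 cutoff for a significant loading is an assumption chosen for illustration.

```python
import numpy as np

names = ["V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9"]

# Hypothetical rotated loadings (rows = variables, columns = factors 1-3).
loadings = np.array([
    [0.46, 0.10, 0.52],   # V1 loads on two factors
    [0.07, 0.80, 0.05],
    [0.02, 0.70, 0.12],
    [0.10, 0.05, 0.78],
    [0.15, 0.75, 0.03],
    [0.08, 0.11, 0.74],
    [0.85, 0.06, 0.09],
    [0.81, 0.12, 0.04],
    [0.77, 0.02, 0.14],
])
threshold = 0.40  # assumed cutoff for a significant loading

significant = np.abs(loadings) >= threshold
n_significant = significant.sum(axis=1)

# A variable with more than one significant loading is a cross-loading.
cross_loaded = [n for n, c in zip(names, n_significant) if c > 1]

# Assign each remaining variable to the single factor it loads on.
assignment = {
    f"Factor {j + 1}": [n for n, row in zip(names, significant)
                        if row[j] and row.sum() == 1]
    for j in range(loadings.shape[1])
}

print("cross-loaded:", cross_loaded)
print(assignment)
```

The sketch only flags the cross-loaded variable; re-estimating the loadings after dropping it would mean re-running the factor analysis on the remaining variables.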
Stage 6: Validation of Exploratory Factor Analysis
Stage 7: Data Reduction—Additional Uses of Exploratory
Factor Analysis Results
• Surrogate variable: variable with the highest factor loading on each factor
• Disadvantages of choosing a surrogate representative for a factor or component:
o Does not address the issue of measurement error encountered when using single measures
o Runs the risk of potentially misleading results by selecting only a single variable to represent a perhaps
more complex result
• Benefits of a summated scale:
o Reduces measurement error – the extent to which observed values deviate from actual values
due to errors (e.g., data entry) or individuals' inability to provide accurate information
- Using multiple indicators reduces the reliance on a single response
o Represents multiple aspects of a concept in a single measure - combines the multiple
indicators into a single measure representing what is held in common across the set of measures
• Conceptual definition - defining the concept being represented in terms applicable to the
research context
o Content validity – assessment of the correspondence of the variables to be included in a
summated scale and its conceptual definition
• Dimensionality – items are unidimensional, strongly associated with each other and represent a
single concept
• Construct validity – extent to which a scale or set of measures accurately represents the concept
of interest
o Convergent validity – degree to which two measures of the same concept are correlated
o Discriminant validity – degree to which two conceptually similar concepts are distinct
o Nomological validity – degree to which the summated scale makes accurate predictions of other
concepts in a theoretically based model
Calculation
• The items with high loadings are summed or averaged, with averaging being the most common
approach
• Items with negative loadings are reverse scored
Example: V1 has a positive loading and V2 a negative loading, both on a 0–10 scale
• First case: V1 = 10, V2 = 0
• Second case: V1 = 0, V2 = 10
• Scale score (V2 not reversed): first case = 10, second case = 10, so the two cases cannot be told apart
• Scale score (V2 reversed): first case = 20, second case = 0
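The two-case example can be reproduced in a few lines. The sketch assumes both items are measured on a 0 to 10 scale, so reverse scoring maps a value x to 10 - x.

```python
import numpy as np

# Columns: V1 (positive loading), V2 (negative loading); rows: case 1, case 2.
scores = np.array([
    [10.0, 0.0],
    [0.0, 10.0],
])
scale_min, scale_max = 0.0, 10.0  # assumed response scale

# Summing without reverse scoring hides the difference between the cases.
unreversed_total = scores.sum(axis=1)

# Reverse score V2 so that high values mean the same thing on both items.
adjusted = scores.copy()
adjusted[:, 1] = scale_max + scale_min - scores[:, 1]
reversed_total = adjusted.sum(axis=1)

print("not reversed:", unreversed_total)
print("reversed:    ", reversed_total)
```

Dividing the totals by the number of items would give the averaged version of the same scale.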
• Factor score: the degree to which each individual scores high on the group of items with high
loadings on a factor
• Computed based on the factor loadings of all variables on the factor, whereas the summated scale
is calculated by combining only selected variables
• Disadvantage: not easily replicated across studies
• Scoring procedure saves the scoring coefficients from the factor matrix and then allows them to
be applied to new datasets.
• SAS: PROC SCORE
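The idea of saving scoring coefficients and applying them to new data can be sketched in Python. The assumptions are significant: principal-component loadings are used for the factor matrix, the coefficients come from the regression method (inverse correlation matrix times loadings), and both datasets are simulated. It is not a reproduction of what PROC SCORE does internally.

```python
import numpy as np

rng = np.random.default_rng(0)
pattern = np.array([[0.8, 0], [0.7, 0], [0.6, 0], [0, 0.8], [0, 0.7], [0, 0.6]])

def simulate(n):
    """Hypothetical data: 6 variables driven by 2 factors plus noise."""
    return rng.normal(size=(n, 2)) @ pattern.T + 0.6 * rng.normal(size=(n, 6))

original = simulate(200)   # sample used to derive the factors
new = simulate(50)         # later sample to be scored

# Standardize with the ORIGINAL sample's means and standard deviations.
mean = original.mean(axis=0)
std = original.std(axis=0, ddof=1)
Z = (original - mean) / std
R = np.corrcoef(original, rowvar=False)

# Loadings of the first two principal components.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1][:2]
loadings = eigvecs[:, order] * np.sqrt(eigvals[order])

# Scoring coefficients: every variable contributes to every factor score.
coefficients = np.linalg.solve(R, loadings)

scores = Z @ coefficients
new_scores = ((new - mean) / std) @ coefficients   # reuse the saved coefficients

print(np.round(np.cov(scores, rowvar=False), 3))
```

In the original sample these scores are uncorrelated with unit variance, which is the orthogonality property that factor scores preserve and summated scales do not.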
Method Selection
• If the data are used only in the original sample, interpretation is less important, or orthogonality must be
maintained, factor scores are suitable
• If generalizability or transferability is desired, then summated scales or surrogate variables are more
appropriate.
• If the summated scale is untested and exploratory, with little or no evidence of reliability or validity,
surrogate variables should be considered if additional analysis is not possible to improve the summated
scale
Practice with Stata
factortest item13-item24
estat common
screeplot
Practice with JASP
DATASET DESCRIPTION
• HBAT sells paper products to two market segments: the newsprint industry and the magazine industry.
Also, paper products are sold to these market segments either directly to the customer or indirectly
through a broker. Two types of information were collected in the surveys. The first type of information was
perceptions of HBAT’s performance on 13 attributes. These attributes, developed through focus groups, a
pretest, and use in previous studies, are considered to be the most influential in the selection of suppliers
in the paper industry.
• Respondents included purchasing managers of firms buying from HBAT, and they rated HBAT on each of
the 13 attributes using a 0–10 scale, with 10 being “Excellent” and 0 being “Poor.” The second type of
information relates to purchase outcomes and business relationships (e.g., satisfaction with HBAT and
whether the firm would consider a strategic alliance/partnership with HBAT). A third type of information is
available from HBAT’s data warehouse and includes information such as size of customer and length of
purchase relationship.
• By analyzing the data, HBAT can develop a better understanding of both the characteristics of its
customers and the relationships between their perceptions of HBAT, and their actions toward HBAT (e.g.,
satisfaction and likelihood to recommend). From this understanding of its customers, HBAT will be in a
good position to develop its marketing plan for next year.
Variable: Description
X1 Customer Type: Length of time a particular customer has been buying from HBAT: 1 = less than 1 year, 2 = between 1 and 5 years, 3 = longer than 5 years
X2 Industry Type: Type of industry that purchases HBAT’s paper products: 0 = magazine industry, 1 = newsprint industry
X3 Firm Size: Employee size: 0 = small firm (fewer than 500 employees), 1 = large firm (500 or more employees)
X4 Region: Customer location: 0 = USA/North America, 1 = outside North America
X5 Distribution System: How paper products are sold to customers: 0 = sold indirectly through a broker, 1 = sold directly
X6 Product Quality: Perceived level of quality of HBAT's paper products
X7 E-Commerce Activities/Website: Overall image of HBAT’s website, especially user-friendliness
X8 Technical Support: Extent to which technical support is offered to help solve product/service issues
X9 Complaint Resolution: Extent to which complaints are resolved in a timely and complete manner
X10 Advertising: Perceptions of HBAT's advertising campaigns in all types of media
X11 Product Line: Depth and breadth of HBAT's product line to meet customer needs
X12 Salesforce Image: Overall image of HBAT's salesforce
X13 Competitive Pricing: Extent to which HBAT offers competitive prices
X14 Warranty and Claims: Extent to which HBAT stands behind its product/service warranties and claims
X15 New Products: Extent to which HBAT develops and sells new products
X16 Ordering and Billing: Perception that ordering and billing are handled efficiently and correctly
X17 Price Flexibility: Perceived willingness of HBAT sales reps to negotiate prices on paper product purchases
X18 Delivery Speed: Amount of time it takes to deliver the paper products once an order has been confirmed
X19 Customer Satisfaction: Customer satisfaction with past purchases from HBAT, measured on a 10-point graphic rating scale
X20 Likelihood of Recommending HBAT: Likelihood of recommending HBAT as a supplier of paper products, measured on a 10-point graphic rating scale
X21 Likelihood of Future Purchases from HBAT: Likelihood of purchasing paper products from HBAT in the future, measured on a 10-point graphic rating scale
X22 Percentage of Purchases from HBAT: Percentage of the responding firm’s paper needs purchased from HBAT, measured on a 100-point percentage scale
X23 Perception of Future Relationship with HBAT: Extent to which the customer/respondent perceives a future strategic alliance/partnership with HBAT (0 = Would not consider, 1 = Would consider)
Vietnam
https://future.ueh.edu.