The document discusses Structural Equation Modelling (SEM), a statistical technique used to analyze relationships among multiple variables, including Confirmatory Factor Analysis (CFA) and Partial Least Squares (PLS). It outlines the importance of latent variables, model definition, and the role of theory in establishing causation, emphasizing the need for a theoretical basis in SEM. Additionally, it describes the stages of SEM, including defining constructs, developing measurement models, and assessing model validity.

Research Methodology

Part 3: Structural Equation Modelling

Pham Thai Binh, Ph.D


UEH, Sep 2022
DEFINITION

• Structural Equation Modelling (SEM) is a family of statistical models that seeks to explain the
relationships among multiple variables.
• Confirmatory Factor Analysis (CFA) is a way of testing how well a prespecified measurement theory
composed of measured variables and factors fits reality as captured by data.
• Partial least squares structural equation modeling (PLS-SEM) is a combination of interdependence and dependence techniques.

Dr. Binh Pham


Structural Equation Modelling

• SEM involves:
1. Simultaneous estimation of multiple and interrelated dependence relationships
2. An ability to represent unobserved concepts in these relationships and account for measurement
error in the estimation process
3. Defining a theoretical model to explain the entire set of relationships
4. Over-identifying assumptions (meaning variables are explained by a unique set of variables that
does not include all possible relationships)


• Use separate relationships for each of a set of dependent variables, which is different from other
multivariate techniques
• Dependent variables in one relationship can become independent variables in subsequent relationships
• The structural model expresses the dependence relationships between these independent and
dependent variables.
• Reduced form equation: solves for a single endogenous construct (dependent variable) in a single
equation with all and only exogenous constructs (independent variables)
• Structural equation: parsimonious and include only the specific predictors, endogenous or exogenous,
that are theoretically linked to the outcome construct
• SEM uses structural equations


Latent variables
• Latent construct: hypothetical, unobserved concept that can be represented by observed or
measured variables
• Measured indirectly by examining multiple measured variables (manifest variables or indicators)

Benefits of latent construct – Representing theoretical concepts


• Through latent construct, theoretical concepts can be represented by multiple indicators, which
reduces the measurement error over relying on a single indicator.
• Measurement model: specifies the relationships of the indicators and the latent construct.
• Test the fit of the theoretical measurement model to reality to identify the degree of measurement error present.

Example: HBAT would like to identify job satisfaction in employees
• Outcome: Job satisfaction level
• Independent variables: How they feel about their supervisor & how they like their work
environment
• Indicators of “how they feel about their supervisor”:
(1) My supervisor recognizes my potential
(2) My supervisor helps me resolve problems at work
(3) My supervisor understands that I have a life away from work
• With SEM, the contribution of each indicator to its construct and the combined indicators'
representation of the construct (reliability and validity) is assessed.

Benefits of latent construct – Improving Statistical Estimation
• Reliability: a measure of the degree to which a set of indicators of a latent construct is internally
consistent based on how highly interrelated the indicators are with each other.
o Inversely related to measurement error

βy.x = βs × ρx

βy.x : the observed regression coefficient
βs : the true structural coefficient
ρx : the reliability of the predictor variable

• Unless the reliability is 100 percent (i.e., no measurement error), the observed correlation will
always understate the true relationship
• SEM makes an estimate of the true structural coefficient 𝛽𝑠 based on the estimated regression
coefficient, or accounts for the measurement error in the estimated relationship
• High reliability does not guarantee that a construct is measured accurately
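The attenuation effect can be sketched numerically. A minimal example, assuming a hypothetical true coefficient of .50 and varying predictor reliability:

```python
# Attenuation of a regression coefficient by measurement error:
# observed coefficient = true structural coefficient x predictor reliability.
# The coefficient and reliability values below are hypothetical.

def observed_coefficient(beta_s: float, rho_x: float) -> float:
    """Observed coefficient implied by the true coefficient and reliability."""
    return beta_s * rho_x

beta_s = 0.50  # assumed true structural coefficient
for rho_x in (1.0, 0.9, 0.7, 0.5):
    print(f"reliability = {rho_x:.1f} -> observed beta = {observed_coefficient(beta_s, rho_x):.2f}")
```

Only with perfect reliability (ρx = 1) does the observed coefficient equal the true one; lower reliability shrinks it toward zero.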


Exogenous vs. Endogenous Latent Construct


• Exogenous constructs: latent, multi-item, equivalent of independent variables
o Does not have any paths (single-headed arrows) from any other construct or variable going
into it
• Endogenous constructs: latent, multi-item equivalent to dependent variables
o The dependence is represented visually by a path to an endogenous construct from an
exogenous construct (or from another endogenous construct).


Defining a Model
• Model: Representation of a theory
• Theory: A systematic set of relationships providing a consistent and comprehensive explanation
of phenomena
• A model should not be developed without some underlying theory
• A path diagram is often used to portray a model

• Measurement relationship: type of dependence relationship (depicted by a straight arrow)
between measured variables and constructs
• Structural relationship: 2 types of relationship among constructs - dependence relationships
and correlational (covariance) relationships.
• Dependence relationships can determine whether the construct is an exogenous or endogenous
construct
• Only a dependence relationship can exist between exogenous and endogenous constructs, not
a correlational relationship.
• The model fit is determined by the similarity between the observed covariance matrix and an
estimated covariance matrix

• Constructs: ovals or circles
• Measured variables: squares or rectangles

SEM and Other Multivariate Techniques

Similarity to Dependence Techniques


• Relationships for each individual endogenous construct can be expressed in a regression
equation
• The endogenous construct is the dependent variable, and the independent variables are the
constructs, with arrows pointing to the endogenous construct.
• Difference:
o In SEM, a construct that acts as an independent variable in one relationship can be the dependent variable in another relationship
o Allows for simultaneous estimation of all the relationships/equations


Similarity to Interdependence Techniques


• Similar to EFA where variables load on factors
• Difference:
o SEM is essentially the opposite of EFA
o The researcher specifies a priori which variables are associated with each construct, whereas EFA searches for structure among variables by defining factors.


The Emergence of SEM


• SEM traces its roots back to the early 20th century, with early interests from genetics and
economics researchers
• During the late 1960s and 1970s, Jöreskog and Sörbom's work led to simultaneous maximum likelihood estimation of a theory represented by relationships between latent constructs and measured indicator variables and among latent constructs (and the corresponding lack of relationships), culminating in LISREL
• Today, SEM is “the dominant multivariate technique,” followed by multiple regression, cluster
analysis and MANOVA
• In the 1970s, PLS-SEM was introduced and referred to as “soft modeling” because it does not require normally distributed data and performs well even when the data are highly skewed
• The first user-friendly PLS-SEM software was PLSGraph; a more recently developed package is SmartPLS 2.



The Role of Theory in SEM

Specifying relationships
• SEM is considered a confirmatory approach
• A theoretical basis is critical in SEM:
o To specify what and how things are related and are not related to each other in both measurement
and structural models
o The relationships in a path diagram typically involve a combination of dependence and correlational
relationships among exogenous and endogenous constructs. Any concepts not connected are
theorized to be independent.

Path diagrams may also depict intercorrelations between constructs and reciprocal relationships (constructs that influence each other).

Establishing Causation
• Causal inference: involves a hypothesized cause-and-effect relationship
• SEM alone cannot establish causality
• SEM can treat dependence relationships as causal if four types of evidence are reflected in the
model:
o Covariation
o Sequence
o Nonspurious covariation
o Theoretical support

Covariation
• Systematic covariance (correlation) between the cause and effect is necessary, but not sufficient
Sequence
• SEM cannot provide this type of evidence without a research design that involves either an
experiment or longitudinal data
Nonspurious covariation
• The size and nature of the relationship between a true cause and the relevant effect should not
be affected by including other constructs (or variables) in a model
• Multicollinearity in multiple predictor constructs makes causal inference less certain
Theoretical support
• A compelling rationale to support a cause-and-effect relationship

Developing a Modeling Strategy


• Confirmatory Modeling Strategy: Specify a specific theoretical model composed of a pattern
of relationships and nonrelationships → SEM assesses how well the model fits reality
• Competing Models Strategy: Compare one plausible theoretical estimated model with
alternative theories by assessing relative fit
• Model Development Strategy: Improve a proposed basic model framework through
modifications of the structural or measurement models



A Simple Example

2 key research questions:


(1) What factors influence job satisfaction?
(2) Is job satisfaction related to employees’ likelihood of looking for another job (i.e., quitting their
present job)?

Four relationships describe how HBAT management believes it can reduce the likelihood of employees leaving the company:
• Improved supervision leads to higher job satisfaction
• Better work environment leads to higher job satisfaction
• More favorable perceptions of coworkers lead to higher job satisfaction
• Higher job satisfaction consequently leads to lower likelihood of job search.


1. Constructs are identified as exogenous or endogenous


2. The theoretical process can be portrayed visually in a path diagram

Job Satisfaction is both a dependent and an independent variable


Basics of SEM Estimation and Assessment


Observed Covariance Matrix



Estimating and Interpreting Relationships

• Path analysis uses bivariate correlations to estimate the relationships in a system of structural
equations
• This process estimates the strength of each structural relationship (a straight or curved arrow) in a
path diagram
• SEM computes all estimated relationships simultaneously, also providing the estimates of
intercorrelations between exogenous constructs, which is useful for result interpretation.
• The estimates are statistically tested for significance, and the estimates can be used as regression
coefficients to make estimates of the values of any construct in the model


Direct path: Work Environment → Job Satisfaction = .219

Indirect paths:
• Work Environment → Supervision → Job Satisfaction = .200 × .065 = .013
• Work Environment → Coworkers → Job Satisfaction = .150 × .454 = .068
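The direct and indirect effects above follow from simple products and sums of the standardized path coefficients. A short sketch, reusing the slide's coefficients for the Supervision route (illustrative values only):

```python
# Effects in a path model: an indirect effect is the product of the
# coefficients along the route; the total effect adds the direct path.
from math import prod

def indirect_effect(path_coefficients):
    """Product of the standardized coefficients along one indirect path."""
    return prod(path_coefficients)

direct = 0.219                                     # Work Environment -> Job Satisfaction
via_supervision = indirect_effect([0.200, 0.065])  # Work Env -> Supervision -> Job Sat

print(f"indirect via Supervision: {via_supervision:.3f}")
print(f"direct + this indirect:   {direct + via_supervision:.3f}")
```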

Residuals reflect the errors in predicting individual observations (y − ŷ)



Six Stages in SEM

Stage 1 Defining individual constructs


Stage 2 Developing the overall measurement model
Stage 3 Designing a study to produce empirical results
Stage 4 Assessing the measurement model validity
Stage 5 Specifying the structural model
Stage 6 Assessing the structural model validity



Stage 1: Defining Individual Constructs

Operationalizing a Construct

What items are to be used as measured variables?


• A researcher operationalizes a latent construct by selecting its measurement scale items and scale type
• In survey research, operationalizing a latent construct results in a series of scaled indicator items in a
common format such as a Likert scale or a semantic differential scale

Scale from prior research


• Search the literature on the individual constructs to identify scales that previously performed well
New Scale Development
• Appropriate when studying something that does not have a rich history of previous research or
• When existing scales are inappropriate for a given context.


Pretesting

• Should use respondents similar to those from the population to be studied so as to screen items
for appropriateness
• Important when scales are applied in specific contexts (e.g., purchase situations, industries, or
other instances where specificity is paramount) or in contexts outside their normal use
• Empirical testing of the pretest results is done in a manner identical to the final model
• Items that do not behave statistically as expected may need to be refined or deleted.



Stage 2: Developing and Specifying the Measurement Model

In this stage, each latent construct to be included in the model is defined and the measured indicator variables
(items) are assigned to the corresponding latent constructs.

• Unspecified loadings are set to zero, as they will not be estimated
• For example, the absence of paths means no correlations among the indicator variables' error variances and no loadings of indicators on more than one construct (cross-loadings).


SEM Notation


Creating the Measurement Model – Key Issues


1. Unidimensionality
• Unidimensional measures: a set of measured variables (indicators) can be explained by only one
underlying construct
• Each measured variable is hypothesized to relate to only a single construct when 2 or more constructs are
involved
• The existence of significant cross-loadings is evidence of a lack of construct validity, even if it leads to
significantly better fit
• Between-construct error covariances and within-construct error covariances also threaten construct validity


• e1 & e2: within-construct error covariance


• e4 & e7: between-construct error covariance
• Freely estimating cross-loadings and error covariances violates the assumptions of a good measurement model
• Measurement models that have hypothesized cross-loadings set to zero and no correlated error variances are called congeneric measurement models.



2. Items per Construct
• Good practice dictates a minimum of three items per factor, preferably four, to provide minimum coverage
of a construct’s theoretical domain and to provide adequate identification for the construct
• The covariance matrix provides the degrees of freedom used to estimate parameters in a CFA or SEM
model
• The number of unique variances/covariances is p(p + 1)/2, where p is the number of measured variables



• Three levels of identification:
1. Underidentified
• Has more parameters to be estimated than unique indicator
variable variances and covariances in the observed
variance/covariance matrix
2. Just Identified
• There are just enough degrees of freedom to estimate all free
parameters
• A model with zero degrees of freedom is referred to as
saturated


3. Overidentified
• Has more unique covariance and variance
terms than parameters to be estimated
• A solution can be found with positive degrees of
freedom and a useful chi-square goodness-of-fit
value
• The ideal objective when applying CFA and
SEM is to model overidentified constructs within
an overidentified measurement model.
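The identification bookkeeping above reduces to comparing the p(p + 1)/2 unique (co)variances against the number of free parameters. A sketch, using a hypothetical one-factor CFA with four indicators (one loading fixed to 1 to set the scale):

```python
# Degrees of freedom for a CFA/SEM model:
# df = unique variances/covariances - free parameters = p(p+1)/2 - k.

def degrees_of_freedom(p: int, free_parameters: int) -> int:
    unique_moments = p * (p + 1) // 2  # p observed variables
    return unique_moments - free_parameters

def identification(df: int) -> str:
    if df < 0:
        return "underidentified"
    if df == 0:
        return "just identified (saturated)"
    return "overidentified"

# One factor, four indicators: 3 free loadings (one fixed to 1),
# 4 error variances, 1 factor variance -> 8 free parameters.
df = degrees_of_freedom(p=4, free_parameters=3 + 4 + 1)
print(df, identification(df))
```

With four indicators the model is overidentified (df = 2); with three it would be just identified, which is consistent with the preference for at least four items per construct.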



3. Reflective vs. Formative Measurement
• Reflective measurement theory: is based on the idea that latent constructs cause the measured variables and that
the error results in an inability to fully explain these measured variables
• Formative measurement theory: assumes that measured variables cause, or actually form, a scale
• Key assumption: formative factors are not latent; instead they are viewed as indices
• Each formative indicator is a cause of the index (e.g., educational level, occupational prestige, and income combine to form social class)
• Correlation among the indicators is not desirable since each indicator should have an independent effect
• High reliability or high AVE would provide evidence that a formative scale lacks validity
• A complete set of variables is needed to form a factor.



Stage 3: Designing a Study to Produce Empirical Results

Issues in Research Design


1. Type of data to be analyzed, either covariances or correlations
• Metric vs. Nonmetric data
o Metric data: directly amenable to the calculation of covariances among items
o Advances in the software programs now allow for the use of many nonmetric data types (censored,
binary, ordinal, or nominal)
• Covariance vs. Correlation
o SEM was developed using covariance matrices. Many researchers also advocate the use of the correlation matrix because it is easier to interpret
o However, it is simple to request a standardized solution from covariance input
o Using covariances is recommended for greater flexibility and deeper insights, such as scaling, value magnitude, and sample comparisons



2. The impact and remedies for missing data
2 questions to answer:
Are the missing data sufficiently extensive and nonrandom as to cause problems in estimation or interpretation?
If missing data must be remedied, what is the best approach?

• Missing data must always be addressed if they follow a nonrandom pattern or exceed 10% of the data items
• Missing completely at random (MCAR): the pattern of missing data for a variable does not depend on any other variable in the dataset or on the variable itself
• Missing at random (MAR): the pattern of missing data for a variable is related to other variables, but not to its own values

Remedies:
• Complete case approach – an observation is eliminated if it has missing data on any variable
• All-available approach – all non-missing data are used
• Imputation
• Model-based approach
o Maximum likelihood estimation of the missing values (ML)
o The EM approach
o Multiple Imputation
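The first two remedies can be contrasted on simulated data. A minimal pandas sketch (variable names and the 10% MCAR rate are illustrative); note that pandas' `DataFrame.cov` uses all available pairwise observations by default:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(100, 3)), columns=["x1", "x2", "x3"])

# Inject 10% missing values on x2 at random row positions (MCAR).
data.loc[data.sample(frac=0.1, random_state=1).index, "x2"] = np.nan

# Complete case approach (listwise deletion): drop any row with a missing value.
listwise = data.dropna()
cov_listwise = listwise.cov()

# All-available approach (pairwise deletion): each covariance is computed
# from the rows where both variables are present.
cov_pairwise = data.cov()

print(len(data), len(listwise))  # rows before and after listwise deletion
```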

If the missing data are random, affect fewer than 10% of observations, and the factor loadings are relatively high (≥ .70), any approach is appropriate.
• The all-available approach (pairwise deletion) is used most often when sample sizes are > 250 and the total amount of missing data is < 10%
• FIML uses all the information in a dataset to substitute for missing data points; this approach has been demonstrated to perform well compared to others
• When the amount of missing data becomes very high (15% or more), SEM may not be appropriate

3. The impact of sample size
Larger sample sizes generally produce more stable solutions, particularly when problems exist
5 considerations affecting the required sample size:
(1) Multivariate normality of the data
(2) Estimation technique
(3) Model complexity
(4) The amount of missing data
(5) The average error variance among the reflective indicators

Sample size should increase in the following conditions:
(1) Data deviate substantially from multivariate normality
(2) Sample-intensive estimation techniques (e.g., ADF) are used
(3) Missing data exceed 10%

Minimum Sample Size — Model characteristics
• 100 — Models containing five or fewer constructs, each with more than three items (observed variables), and with high item communalities (.6 or higher)
• 150 — Models with seven constructs or fewer, at least modest communalities (.5), and no underidentified constructs
• 300 — Models with seven or fewer constructs, lower communalities (below .45), and/or multiple underidentified (fewer than three items) constructs
• 500 — Models with large numbers of constructs, some with lower communalities, and/or having fewer than three measured items
Specifying the model


“Setting the scale” of a latent factor:
• Fix one of the factor loadings on each construct to a specific value, typically 1
• Fix the value of the variance of the construct, typically 1
• Fixing a loading to “1” does not imply perfect association
• AMOS software automatically fixes, or constrains, one of the factor loading estimates to 1


Issues in Identification
• Order condition: requires that the degrees of freedom for a model be greater than zero
• Rank condition: requires that each parameter be estimated by a unique relationship; a violation is difficult to diagnose
Sources and Remedies
• Incorrect Indicator Specification – Carefully examine the model specification
• “Setting the scale”
• Too few degrees of freedom
o Include enough measures
o Fix the error variances to a known or specified value
o Fix the correlation between constructs to a theoretical value


Problems in Estimation
• Illogical Standardized Parameters: Correlation estimates (i.e., standardized estimates) between
constructs exceed |1.0| or standardized path coefficients exceed |1.0|
• Heywood Case: an SEM solution that produces an error variance estimate of less than zero
o The first solution should be to ensure construct validity
o Try to add more items if possible, or assume tau equivalence (all loadings in that construct are equal)
o As a last resort, fix the offending estimate to a very small value, such as .005



Stage 4: Assessing Measurement Model Validity

Is the measurement model valid?


Depends on:
(1) Establishing acceptable levels of goodness-of-fit for the measurement model (fit validity)
(2) Finding other specific evidence of construct validity.
• Goodness-of-Fit (GOF): indicates how well the user-specified model mathematically reproduces the
observed covariance matrix among the indicator items (i.e., the similarity of the observed and estimated
covariance matrices)
• Construct validity: the extent to which a set of measured items accurately reflects the theoretical latent constructs it is designed to measure


Goodness-of-Fit (GOF)
Chi-square (χ²) GOF
• χ² = (N − 1) × Fmin, where N is the sample size and Fmin is the minimized fit function measuring the difference between the observed and estimated covariance matrices
• Degrees of freedom: df = ½[p(p + 1)] − k, where ½[p(p + 1)] is the number of unique variances/covariances among the p observed variables and k is the number of estimated parameters
• A small χ² p-value (< 0.05), i.e., a larger χ² value, indicates lack of fit


Absolute Fit Indices


Goodness-of-Fit Index (GFI)
• No statistical test is associated with the GFI, only guidelines for acceptable fit
• GFI values greater than .90 were typically considered good, preferably above .95
Root Mean Square Error of Approximation (RMSEA)
• Corrects for the tendency of the χ² GOF test statistic to reject models with large samples or a large number of observed variables
• Lower RMSEA values indicate better fit
Root Mean Square Residual (RMR) and Standardized Root Mean Residual (SRMR)
• RMR is computed by the square root of the mean of squared residuals
• SRMR is calculated using standardized residuals, thereby neutralizing the influence of differing scales across
various indicators within the model
• SRMR > .1 suggests a problem with fit
Normed chi-square - simple ratio of 𝑥 2 to the degrees of freedom for a model
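A small sketch of two of these diagnostics computed from a model's χ² value, its degrees of freedom, and the sample size. The χ², df, and N values are hypothetical, and RMSEA is computed with the common √(max(χ² − df, 0) / (df(N − 1))) form:

```python
import math

def normed_chi_square(chi2: float, df: int) -> float:
    """Simple ratio of chi-square to the model's degrees of freedom."""
    return chi2 / df

def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation (population-based form)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

chi2, df, n = 85.2, 48, 250  # hypothetical CFA result
print(f"normed chi-square: {normed_chi_square(chi2, df):.2f}")
print(f"RMSEA:             {rmsea(chi2, df, n):.3f}")
```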


Incremental Fit Indices


Normed Fit Index (NFI)
• A ratio of the difference between the χ² value for the fitted model and that of a null model, divided by the χ² value for the null model
• Ranges between 0 and 1; a model with perfect fit would produce an NFI of 1
• Can give artificially inflated estimates of model fit for more complex models
Tucker Lewis Index (TLI)
• It is a comparison of the normed chi-square values for the null and specified model
• TLI is not normed, and thus its values can fall below 0 or above 1
Comparative Fit Index (CFI)
• Improved version of the NFI
• Ranges between 0 and 1, with higher values indicating better fit
Relative Non-centrality Index (RNI)
• Compares the observed fit resulting from testing a specified model to that of a null model
• Ranges between 0 and 1, with higher values indicating better fit
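These incremental indices can all be computed from the χ² statistics of the specified model and the independence (null) model. A sketch using the textbook formulas, with hypothetical χ² values:

```python
def nfi(chi2_m: float, chi2_0: float) -> float:
    """Normed fit index: improvement over the null model, relative to the null."""
    return (chi2_0 - chi2_m) / chi2_0

def tli(chi2_m: float, df_m: int, chi2_0: float, df_0: int) -> float:
    """Tucker-Lewis index: compares normed chi-squares; not bounded by [0, 1]."""
    return (chi2_0 / df_0 - chi2_m / df_m) / (chi2_0 / df_0 - 1.0)

def cfi(chi2_m: float, df_m: int, chi2_0: float, df_0: int) -> float:
    """Comparative fit index: noncentrality-based improvement over the null."""
    num = max(chi2_m - df_m, 0.0)
    den = max(chi2_m - df_m, chi2_0 - df_0, 0.0)
    return 1.0 - num / den

chi2_m, df_m = 85.2, 48    # fitted model (hypothetical)
chi2_0, df_0 = 920.0, 66   # null model (hypothetical)
print(f"NFI = {nfi(chi2_m, chi2_0):.3f}")
print(f"TLI = {tli(chi2_m, df_m, chi2_0, df_0):.3f}")
print(f"CFI = {cfi(chi2_m, df_m, chi2_0, df_0):.3f}")
```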

Parsimonious Fit Indices


Adjusted Goodness of Fit Indices (AGFI)
• Tries to take into account differing degrees of model complexity by adjusting GFI by a ratio of the degrees of
freedom
• Penalizes more complex models and favors those with a minimum number of free paths
Parsimony Normed Fit Index (PNFI)
• Adjusts the normed fit index (NFI) by multiplying it by the parsimony ratio
• PNFI values are meant to be used in comparing one model to another


Construct Validity
Convergent Validity: indicators of a specific construct should converge, or share a high proportion of variance in common
Factor loadings:
• All factor loadings should be statistically significant
• With large samples, standardized loading estimates should be ≥ .5, and ideally ≥ .7

Average Variance Extracted: AVE = (Σ Li²) / n, summing over i = 1 to n
• Li : the completely standardized factor loading for the ith measured variable
• n : the number of item indicators
• AVE ≥ .5 suggests adequate convergence
• AVE < .5 indicates that more error remains in the items than variance held in common with the latent factor upon which they load

Construct Reliability: CR = (Σ Li)² / [(Σ Li)² + Σ ei]
• (Σ Li)² : the squared sum of the factor loadings; Σ ei : the sum of the error variance terms
• CR is an indicator of convergent validity
• CR ≥ .70 indicates that internal consistency exists
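The AVE and CR formulas translate directly into code. A sketch with four hypothetical standardized loadings for one construct (for standardized indicators, each error variance is 1 − Li²):

```python
def ave(loadings):
    """Average variance extracted: mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)

def composite_reliability(loadings, error_variances):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    squared_sum = sum(loadings) ** 2
    return squared_sum / (squared_sum + sum(error_variances))

loadings = [0.72, 0.80, 0.76, 0.68]      # hypothetical standardized loadings
errors = [1 - l ** 2 for l in loadings]  # error variance of each indicator

print(f"AVE = {ave(loadings):.3f}")                            # adequate if >= .5
print(f"CR  = {composite_reliability(loadings, errors):.3f}")  # adequate if >= .70
```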

Discriminant Validity
Discriminant validity: the extent to which a construct or variable is truly distinct from other constructs or variables
• First, the correlation between any two constructs can be specified (fixed) as 1
• Then, test a model with all items for both constructs specified to load on a single factor and compare its fit to the fit of the original two-factor model
• A significant difference in model fit would suggest that the items better represent two separate constructs
• A more rigorous test is to compare each AVE value with the square of the correlation estimate between the two constructs
o AVE > squared correlation estimate: good evidence of discriminant validity
• The presence of cross-loadings indicates a discriminant validity problem
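The AVE-versus-squared-correlation comparison (the Fornell-Larcker logic) is easy to sketch; the AVE and correlation values below are hypothetical:

```python
def discriminant_validity(ave_a: float, ave_b: float, corr: float) -> bool:
    """Supported when each construct's AVE exceeds the squared inter-construct correlation."""
    shared_variance = corr ** 2
    return ave_a > shared_variance and ave_b > shared_variance

# Constructs share .36 of their variance (corr = .60): both AVEs exceed it.
print(discriminant_validity(ave_a=0.55, ave_b=0.61, corr=0.60))
# With corr = .80 the shared variance (.64) exceeds both AVEs.
print(discriminant_validity(ave_a=0.55, ave_b=0.61, corr=0.80))
```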


Nomological Validity and Face Validity


• Nomological validity: examines whether the correlations among the constructs in the measurement model make sense; the construct should relate to other constructs in the way the underlying theory predicts
• Face validity: the extent to which the content of a construct's items is consistent with the construct definition, based solely on the researcher's judgment; it must be established before any theoretical testing

Model Diagnostics
Standardized Residuals
• Residuals: individual differences between observed covariance terms and the fitted (estimated) covariance terms
• Standardized residuals = raw residuals / S.E. of the residuals
• Residuals < |2.5|: do not suggest a problem in a model of moderate or high complexity
• Residuals > |4.0|: indicate a potentially unacceptable degree of error
• Residuals between |2.5| and |4.0|: deserve attention, but may not suggest any changes to the model if no other
problems are associated
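The residual thresholds above can be turned into a simple screening helper; the cutoffs are taken directly from the guidelines for models of moderate or high complexity:

```python
def flag_standardized_residual(value):
    # Classify a standardized residual by the |2.5| and |4.0| cutoffs
    r = abs(value)
    if r > 4.0:
        return "potentially unacceptable degree of error"
    if r > 2.5:
        return "deserves attention"
    return "no problem suggested"

for resid in (-4.6, 3.1, 1.9):
    print(resid, "->", flag_standardized_residual(resid))
```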

Dr. Binh Pham


Stage 4: Assessing Measurement Model Validity
Model Diagnostics
Modification Indices
• A modification index is calculated for every possible relationship that is not estimated in a model
• Shows how much the overall model χ² value would be reduced by also estimating a loading for an indicator on a
construct it is not currently specified to measure
• Modification indices of approximately 4.0 or greater suggest that the fit could be improved significantly by
freeing the corresponding path to be estimated
• However, making changes based solely on modification indices is not recommended, as doing so would be
inconsistent with the theoretical basis of CFA and SEM
• Modification indices are nonetheless useful for identifying problematic indicator variables that exhibit the
potential for cross-loadings
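As a minimal illustration, screening candidate paths by the 4.0 cutoff might look like this; the indicator-construct pairs and MI values are hypothetical, and in practice they come from the SEM software's output (the flagged pairs are inspected for cross-loadings, not freed automatically):

```python
# Hypothetical modification indices for loadings not estimated in the model
mod_indices = {
    ("X1", "ConstructB"): 7.2,
    ("X3", "ConstructC"): 2.1,
    ("X5", "ConstructB"): 4.8,
}

# Paths whose estimation would significantly improve fit (MI >= 4.0),
# sorted from largest to smallest modification index
candidates = sorted(
    (path for path, mi in mod_indices.items() if mi >= 4.0),
    key=lambda path: -mod_indices[path],
)
print(candidates)
```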
Stage 4: Assessing Measurement Model Validity

Specification Searches
• Specification search: empirical trial-and-error approach that uses model diagnostics to suggest changes in the
model
• They identify the set of “new” relationships that best improve the overall model fit, based on freeing non-
estimated relationships with the largest modification index
• Problems:
o Inconsistency with the intended purpose and use of procedures such as CFA
o The interdependence between parameters makes it difficult to be certain that the true problem is isolated in
the variables suggested by a modification index
o Empirical research using simulated data has shown that mechanical specification searches are unreliable in
identifying a true model
• Thus, new construct structures suggested by specification searches must be confirmed using a new dataset
Stage 5: Specifying the Structural Model

Involves specifying the structural model by assigning relationships from one construct to another based on the proposed
theoretical model
Stage 6: Assessing the Structural Model Validity

Comparing Nested Models
Chi-square difference statistic (Δχ²)

The χ² value from some baseline model (B) is subtracted from the χ² value of a less constrained,
alternative nested model (A)
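A Δχ² comparison can be sketched as follows; the model χ² and df values are hypothetical, and the critical values are the standard α = .05 chi-square cutoffs for small df differences:

```python
def chi_square_difference(chi2_constrained, df_constrained, chi2_free, df_free):
    # Nested-model comparison: the less constrained model always has the lower chi-square,
    # so the difference and the df difference are both non-negative
    return chi2_constrained - chi2_free, df_constrained - df_free

# Critical chi-square values at alpha = .05 for small df differences
CRITICAL_05 = {1: 3.84, 2: 5.99, 3: 7.81}

delta_chi2, delta_df = chi_square_difference(312.4, 87, 301.9, 85)
significant = delta_chi2 > CRITICAL_05[delta_df]  # True -> the two fits differ significantly
print(round(delta_chi2, 1), delta_df, significant)
```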
Stage 6: Assessing the Structural Model Validity

Comparison to the Measurement Model
Chi-square difference statistic (Δχ²)

The structural model is nested within the less constrained measurement (CFA) model, so the same Δχ² test applies:
the χ² value of the measurement model is subtracted from the χ² value of the structural model, and a small,
non-significant difference supports the hypothesized structural relationships
Stage 6: Assessing the Structural Model Validity

Testing Structural Relationships


Structural path estimates should be:
1. Statistically significant and in the predicted direction. That is, they are greater than zero for a
positive relationship and less than zero for a negative relationship
2. Non-trivial. Effect sizes should be checked for practical significance using the completely standardized
loading estimates. Coefficients can be statistically significant but practically meaningless, particularly as
samples become large.
PLS-SEM DECISION PROCESS

Stage 1 Defining research objectives and selecting constructs


Stage 2 Designing a study to produce empirical results
Stage 3 Specifying the measurement and structural models
Stage 4 Assessing the measurement model validity
Stage 5 Assessing the structural model
Stage 6 Advanced analyses with PLS-SEM
Stage 1:
Defining research objectives and selecting constructs

Structural Model Assessment

PLS-SEM evaluates structural models differently from CB-SEM. It derives solutions from total variance rather
than covariances, eliminating the need for traditional goodness-of-fit (GOF) metrics. Instead, the evaluation
focuses on:
• Reliability, convergent validity, and discriminant validity of measurement models.
• Predictive power of the structural model using metrics such as R² (explained variance), f² (effect size),
and Q² (predictive relevance).
Stage 1:
Defining research objectives and selecting constructs

Versatility in Research

PLS-SEM offers additional flexibility by supporting both exploratory and confirmatory research:
• Exploratory modelling
o Allows researchers to explore relationships without a fully developed theoretical foundation
o Adjustments in measurement and structural models based on data insights
o Hypotheses can also be generated and tested even if they were not pre-specified.
• Addressing undefined research problems: Excels in situations where research problems lack clear
definitions, letting data guide the discovery of relationships.
• Hypothesis generation and testing: Supports hypothesis development from qualitative research and
tests these hypotheses in different contexts, such as cross-country comparisons.
• Flexible research questions: Accommodates various questions (e.g., "what," "why," "how") to uncover
insights beyond predefined hypotheses.
Stage 1:
Defining research objectives and selecting constructs

Focus on Predictive Modeling

One of PLS-SEM's strengths is predictive modeling, which combines statistical and explanatory analysis. This
involves:
• Testing variable relationships and their statistical significance
• Assessing the influence of antecedents
• Explaining outcomes through metrics like explained variance and effect sizes
• Particularly beneficial in business and social sciences, where research often relies on survey or
observational data. It enables researchers to test hypotheses, identify influential variables, and predict
outcomes effectively.
Stage 2:
Designing a Study to Produce Empirical Results

Metric versus Non-Metric Data and Multivariate Normality

• Flexibility in Data Utilization: PLS-SEM accommodates both metric (ratio/interval) and nonmetric
(nominal/ordinal) data types, enhancing analytical versatility.
• Non-Dependency on Multivariate Normality: PLS-SEM is effective with non-normal data distributions,
common in social sciences research contexts.
• Robustness Considerations: CB-SEM demonstrates robustness to non-normality; thus, reliance on PLS-
SEM requires justification beyond data distribution characteristics.
• Non-Parametric Methodology: PLS-SEM's non-parametric nature facilitates its application across
diverse data structures and distributions.
Stage 2:
Designing a Study to Produce Empirical Results

Missing Data

Missing data can complicate estimation and interpretation in SEM models, including PLS-SEM and CB-SEM.

Key Questions to Address:
1. Is the missing data extensive and non-random enough to cause problems?
2. What is the best approach to address the missing data?
• Common Remedies: Missing data solutions may reduce sample size or require complex imputation
techniques.
• Planning Ahead: Researchers should aim for larger sample sizes to offset potential missing data issues.
Stage 2:
Designing a Study to Produce Empirical Results

Statistical Power

• Definition: the probability of making the correct decision if the alternative hypothesis is true and the null
hypothesis should be rejected.

• PLS-SEM vs. CB-SEM: PLS-SEM has greater statistical power than CB-SEM, increasing the likelihood of
detecting significant relationships.

• Advantages for Exploratory Research: PLS-SEM's higher statistical power makes it ideal for exploratory
research with underdeveloped theories and smaller sample sizes.

• Model Development Process: In PLS-SEM, structural relationships evolve as empirical data interacts
with the model, leading to continuous model refinement during estimation.
Stage 2:
Designing a Study to Produce Empirical Results

Model Complexity and Model Size

PLS-SEM can be applied effectively with both small and large sample sizes, distinguishing it from other SEM
methods.

Factors Affecting Sample Size: Three key factors influence the required sample size:
1. Number of Constructs: PLS-SEM typically uses more constructs per model (about 8) compared to
CB-SEM (about 5).
2. Number of Indicators: PLS-SEM often has a higher number of indicators per construct compared to
CB-SEM.
3. Observations per Parameter: A common rule of thumb for minimum sample size is 10 times the largest
number of arrows pointing at any construct, whether formative indicators or structural paths.
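The 10-times rule can be stated as a one-line helper; the arrow counts below are hypothetical:

```python
def ten_times_rule(max_arrows_at_any_construct):
    # Minimum sample size: 10 x the largest number of arrows (formative indicators
    # or structural paths) pointing at any single construct in the model
    return 10 * max_arrows_at_any_construct

# Hypothetical model: one construct receives 4 formative indicators, another
# receives 6 structural paths -> the 6-arrow construct drives the requirement
print(ten_times_rule(max(4, 6)))
```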
Stage 3:
Specifying the Measurement and Structural Models

• In PLS-SEM, the structural model is specified at the same time as the measurement models
• To specify the PLS path model researchers must rely on their knowledge of both measurement theory
and structural theory.
Stage 3:
Specifying the Measurement and Structural Models

Measurement Theory and Models

• The relationship between a latent variable and its observed indicators can be written X = l·Y + e, where:

X: the observed (measured) indicator variable
Y: the latent construct
Loading (l): a standardized regression coefficient representing the strength of the relationship between X and Y
e: random measurement error
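A small simulation, assuming standardized latent and indicator variables, illustrates why the loading l equals the indicator-construct correlation in this setup (the loading value and sample size are arbitrary):

```python
import random

random.seed(7)

loading = 0.8                          # hypothetical standardized loading l
n = 5000
error_sd = (1 - loading ** 2) ** 0.5   # keeps the indicator's variance near 1

# X = l * Y + e for each simulated case
latent = [random.gauss(0, 1) for _ in range(n)]
indicator = [loading * y + random.gauss(0, error_sd) for y in latent]

# The sample correlation between X and Y should approximate the loading
mean_y = sum(latent) / n
mean_x = sum(indicator) / n
cov = sum((y - mean_y) * (x - mean_x) for y, x in zip(latent, indicator)) / n
sd_y = (sum((y - mean_y) ** 2 for y in latent) / n) ** 0.5
sd_x = (sum((x - mean_x) ** 2 for x in indicator) / n) ** 0.5
corr = cov / (sd_y * sd_x)
print(round(corr, 2))
```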
Stage 3:
Specifying the Measurement and Structural Models

Measurement Theory and Models


Reflective measurement model
• Have direct relationships from the construct to the indicators and treat the indicators as error-prone representations of
the construct
• The indicators should be a representative sample of all items of the construct’s conceptual domain, and are expected
to be highly correlated when representing the same domain
• Endogenous latent variables always have error terms associated with them
Formative measurement model
• A linear combination of a set of indicators that form the construct (i.e., the relationship/arrow is from the indicators to
the construct)
• Two types of indicators: causal indicators and composite indicators
• Causal indicators: the construct has an error term, which implies the construct has not been perfectly measured
by its indicators
• Composite indicators: the error term of a construct is set to zero
o Can be used for dimension reduction
Stage 3:
Specifying the Measurement and Structural Models

Structural Theory and Models


• Specifies the latent constructs in the theoretical SEM model and their relationships
• The positions of the constructs are identified by theory and knowledge of the researcher
• Latent constructs on the left side of the path model are independent variables (antecedents)
• Latent variables on the right side are dependent variables (outcome variables)
• Path coefficients represent the strength of the relationships between the latent variables/ constructs
Stage 4: Assessing Measurement Model Validity

Assessing Reflective Measurement Model


Indicator Loadings
• With same interpretation as CB-SEM, loadings should be a minimum of 0.708
• Since PLS-SEM typically produces somewhat higher indicator loadings than CB-SEM, researchers often interpret
loadings more conservatively
Construct Reliability
• 0.60 - 0.70: “acceptable in exploratory research”
• 0.70 - 0.95 : “satisfactory to good”
• 0.95 and above: indicates redundancy among the items, and the scale should be redesigned
Convergent Validity - as measured by average variance extracted (AVE), should be at least .50.
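The reliability cutoffs above lend themselves to a small classification helper (the labels paraphrase the guidelines):

```python
def reliability_verdict(reliability):
    # Cutoffs follow the construct reliability guidelines above
    if reliability >= 0.95:
        return "redundant - consider redesigning the scale"
    if reliability >= 0.70:
        return "satisfactory to good"
    if reliability >= 0.60:
        return "acceptable in exploratory research"
    return "insufficient"

for value in (0.96, 0.83, 0.65, 0.52):
    print(value, "->", reliability_verdict(value))
```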
Stage 4: Assessing Measurement Model Validity

Assessing Reflective Measurement Model


Discriminant Validity
• The recommended method is the Henseler et al. heterotrait-monotrait ratio (HTMT) of correlations
• Defined as the mean value of the indicator correlations across constructs (i.e., the heterotrait-heteromethod
correlations) relative to the geometric mean of the average correlations of indicators measuring the same
construct
• Guideline (HTMT should remain below the threshold):
o .90 for conceptually similar constructs
o .85 for conceptually distinct constructs
• The HTMT value should be examined based on confidence intervals to determine if it is significantly different
from one (1.0).
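Under the definition above, HTMT can be sketched directly from indicator correlations; all correlation values below are hypothetical:

```python
import math

def htmt(heterotrait_corrs, monotrait_a, monotrait_b):
    # Mean of the between-construct indicator correlations divided by the
    # geometric mean of the average within-construct indicator correlations
    mean_hetero = sum(heterotrait_corrs) / len(heterotrait_corrs)
    avg_a = sum(monotrait_a) / len(monotrait_a)
    avg_b = sum(monotrait_b) / len(monotrait_b)
    return mean_hetero / math.sqrt(avg_a * avg_b)

# Hypothetical indicator correlations for two three-item constructs
between = [0.30, 0.35, 0.28, 0.33, 0.31, 0.29]  # every cross-construct item pair
within_a = [0.60, 0.55, 0.65]                   # item pairs within construct A
within_b = [0.50, 0.55, 0.60]                   # item pairs within construct B

value = htmt(between, within_a, within_b)
print(round(value, 3))  # well below the .85 cutoff for conceptually distinct constructs
```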
Stage 4: Assessing Measurement Model Validity

Assessing Formative Measurement Model


Convergent Validity
• Redundancy analysis assesses the degree to which the indicators relate to an additional reflectively measured
construct (single or multi-item) that represents the same concept
• Path coefficients should be a minimum of 0.708
Indicator Multicollinearity
• VIF values of 3 are likely to indicate a problem, and values of 5 and above are a definite indicator of a problem
• A bivariate correlation of 0.50 or higher indicates collinearity is likely to be a problem
Stage 4: Assessing Measurement Model Validity

Assessing Formative Measurement Model


Statistical Significance of Indicator Weights
• Bootstrapping must be applied to test statistical significance
• Bootstrapping: resampling technique that creates subsamples (typically 1,000 or more) of the original sample,
randomly with replacement, and re-estimates the model for each new subsample
• CI includes zero: the weight is not statistically significant and the indicator should be considered for removal
after the contribution of the indicator is assessed
Contribution of Indicator
• The variance it shares with its construct, without considering any other indicators
• The indicator is retained if the bivariate correlation (loading) is 0.50 or higher
• If the indicator weight is not significant and the bivariate correlation (loading) is below 0.50, it should be
removed from the measurement model
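Bootstrapping itself is generic resampling with replacement. The sketch below applies a percentile bootstrap to a toy statistic; in a real PLS-SEM run, each bootstrap value would be the indicator weight re-estimated on a resampled data set, and software typically draws 1,000 or more subsamples:

```python
import random

random.seed(3)

def bootstrap_ci(sample, stat, n_boot=1000, alpha=0.05):
    # Percentile bootstrap: resample with replacement, re-compute the statistic,
    # and read the CI off the sorted bootstrap distribution
    boot_stats = []
    for _ in range(n_boot):
        resample = [random.choice(sample) for _ in sample]
        boot_stats.append(stat(resample))
    boot_stats.sort()
    lo = boot_stats[int(n_boot * alpha / 2)]
    hi = boot_stats[int(n_boot * (1 - alpha / 2))]
    return lo, hi

# Hypothetical per-case values whose mean stands in for an indicator weight
weights = [0.21, 0.35, 0.28, 0.31, 0.26, 0.30, 0.24, 0.33, 0.29, 0.27]
lo, hi = bootstrap_ci(weights, lambda s: sum(s) / len(s))
significant = not (lo <= 0 <= hi)  # CI excluding zero -> the weight is significant
print(round(lo, 3), round(hi, 3), significant)
```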
Stage 5: Assessing the Structural Model

Collinearity among Predictor Constructs


• Variance inflation factor (VIF): A statistic used to evaluate the severity of collinearity among the indicators in a
formative measurement model, or between the constructs in a structural model
o The higher the VIF, the greater the level of collinearity
o Values above 5: a definite indication of collinearity
• Alternative approach: executing the bivariate correlations of the predictor construct scores
o Correlations of 0.50 or higher: possibility of collinearity
Coefficient of Determination (𝑹𝟐 )
• Guideline: 0.75, 0.50, and 0.25 can be considered substantial, moderate, and weak, respectively
• However, in some research contexts 𝑅 2 values of 0.10, and even lower, are considered satisfactory
Effect Size (𝒇𝟐 )
• Represents the change in the R2 value when a specified exogenous construct is omitted from the model
• Calculated by the difference in 𝑅 2 when the predictor is included in the model and when the predictor is omitted from
the model
• Guideline: values below 0.02 indicate no effect, while values of 0.02, 0.15, and 0.35 represent small, medium, and
large effects, respectively
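The f² computation can be sketched as follows. Note that in its standard (Cohen) form, the R² difference is scaled by the unexplained variance 1 − R²included, a detail the verbal description above leaves implicit; the R² values here are hypothetical:

```python
def f_squared(r2_included, r2_excluded):
    # f2 = (R2 with the predictor - R2 without it) / (1 - R2 with the predictor)
    return (r2_included - r2_excluded) / (1 - r2_included)

def effect_label(f2):
    # Cutoffs from the guideline above
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "no effect"

# Hypothetical R2 values with and without one exogenous construct
f2 = f_squared(0.60, 0.55)
print(round(f2, 3), effect_label(f2))
```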
Stage 5: Assessing the Structural Model

Blindfolding (𝑸𝟐 )
• Assesses the model’s predictive power, also referred to as predictive relevance
• To obtain 𝑄 2 :
o Raw data values are omitted sequentially, the values are then imputed, and the model parameters are estimated
o The parameter estimates are then used to predict the omitted raw data values
o The process is repeated until every data point has been omitted and the model re-estimated
• Larger 𝑄 2 indicates higher predictive accuracy and values larger than 0 indicate acceptable predictive accuracy
Size and Significance of Path Coefficients
• The coefficients should be meaningful in size and statistically significant
• The acceptable size depends on the complexity of the path model and the context of the research
• The statistical significance is obtained using the bootstrapping method
Stage 6: Advanced Analyses Using PLS-SEM

Multi-group Analysis of Observed Heterogeneity


• Multi-group analysis (PLS-MGA) is a process for examining separate groups of respondents to determine if
there are differences in the model parameters between the groups
• Most often applied in situations where scholars include assessments of observed heterogeneity in their
research design
• Example:
o Determining demographic differences between groups based on age, gender, income, etc.
o B-to-B researchers often collect data on firm size, market share, industry type, and so forth to be able to
make group comparisons
• Can also be applied to examine potential differences to detect unobserved heterogeneity - a situation in which
there are unobservable characteristics that cause differences in subgroups
Stage 6: Advanced Analyses Using PLS-SEM

Confirmatory Tetrad Analysis


• A method of empirically testing and evaluating the cause-effect relationships for latent variables as well as the
specification of indicators in measurement models
• Provides empirical evidence that can be used to minimize the likelihood of misspecification of formative and reflective
indicators
Moderation
• A situation where the relationship between two variables/constructs is influenced by a third variable
• Is often hypothesized a priori and subsequently tested; it can also be tested on a post hoc basis
• Frequently hypothesized for nominal or categorical variables
• Both PLS-SEM and CB-SEM are suitable for examining moderation involving categorical variables; moderation involving
continuous variables, however, is rather difficult to evaluate with CB-SEM
Stage 6: Advanced Analyses Using PLS-SEM

Higher-order Measurement Model


• Relationships between constructs that simultaneously measure a concept at different levels of abstraction
• Comprised of:
o Single higher-order construct (HOCs) that represents the overall concept,
o Two or more lower-order constructs (LOCs) that measure more concrete facets of the HOC
• Higher-order measurement modelling in PLS-SEM is very flexible and can be executed with two or more
layers of LOCs
• CB-SEM HOCs require a minimum of three lower-order constructs to achieve identification
Thank You