Factor Analysis

The document discusses factor analytic techniques, primarily focusing on Principal Components Analysis (PCA) for reducing variables and detecting relationships among them. It explains concepts such as eigenvalues, communalities, and the KMO test for sampling adequacy, highlighting the importance of these factors in determining the appropriateness of factor analysis. Additionally, it provides insights into the extraction and rotation of factors to achieve a clear structure in data analysis.


General Purpose

The main applications of factor analytic techniques are:
1. To reduce the number of variables, and
2. To detect structure in the relationships between variables, that is, to classify variables.

 Basic assumption: Variables that significantly correlate with each other do so because they are measuring the same "thing".
 The problem: What is the "thing" that the correlated variables are measuring in common?
Principal Components Analysis

 Combining two correlated variables into one factor illustrates the basic idea of factor analysis, or, to be precise, of principal components analysis.
 If we extend the two-variable example to multiple variables, the computations become more involved, but the basic principle of expressing two or more variables by a single factor remains the same.
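
To make this concrete, here is a minimal Python sketch (the data are made up; nothing below comes from the slides) that combines two correlated variables into a single factor by extracting the first principal component of their correlation matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=200)    # correlated with x1

X = np.column_stack([x1, x2])
X = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize

# Eigendecomposition of the correlation matrix
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)          # eigenvalues in ascending order

# The first principal component (largest eigenvalue) is the single
# "factor" that summarizes both variables.
factor = X @ eigvecs[:, -1]
print(eigvals[::-1])   # roughly [1.8, 0.2]: one factor captures most variance
```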
Extracting Principal Components

The extraction of principal components amounts to a variance-maximizing (varimax) rotation of the original variable space.
In a scatterplot, we can think of the new axis as the original X axis rotated so that it aligns with the regression line.
This type of rotation is called variance maximizing because the criterion for (goal of) the rotation is to maximize the variance (variability) of the "new" variable (factor), while minimizing the variance around the new variable.
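
This claim can be checked numerically. The following sketch, again on made-up two-variable data, sweeps rotation angles by brute force and confirms that the variance-maximizing direction coincides with the first eigenvector of the correlation matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=500)
X = np.column_stack([x1, x2])
X = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize

# Variance of the projection onto each candidate rotated axis
angles = np.linspace(0.0, np.pi, 2000)
variances = [np.var(X @ np.array([np.cos(a), np.sin(a)])) for a in angles]
best_angle = angles[np.argmax(variances)]

# First principal axis (eigenvector of the largest eigenvalue)
evals, V = np.linalg.eigh(np.corrcoef(X, rowvar=False))
pc1 = V[:, -1]
pc1_angle = np.arctan2(pc1[1], pc1[0]) % np.pi

print(np.degrees(best_angle), np.degrees(pc1_angle))   # nearly equal
```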
Generalizing to the Case of Multiple Variables

When there are more than two variables, we can think of them as defining a "space," just as two variables define a plane. Thus, when we have three variables, we could plot a three-dimensional scatterplot and, again, fit a plane through the data.
With more than three variables it becomes impossible to illustrate the points in a scatterplot; however, the logic of rotating the axes so as to maximize the variance of the new factor remains the same.
Eigenvalues

 The eigenvalue is the amount of variance explained by each factor.

Total Variance Explained

                     Initial Eigenvalues                 Extraction Sums of Squared Loadings
Component   Total    % of Variance   Cumulative %    Total    % of Variance   Cumulative %
1           2.711    27.112           27.112         2.711    27.112           27.112
2           1.626    16.259           43.371         1.626    16.259           43.371
3           1.514    15.143           58.514         1.514    15.143           58.514
4           1.059    10.594           69.108         1.059    10.594           69.108
5           0.910     9.098           78.206
6           0.788     7.883           86.090
7           0.640     6.405           92.494
8           0.465     4.653           97.148
9           0.172     1.720           98.868
10          0.113     1.132          100.000
Extraction Method: Principal Component Analysis.
 In the Total column, we find the variance on the new factors that were successively extracted; these are the eigenvalues.
 In the % of Variance column, these values are expressed as a percent of the total variance (in this example, 10, since there are 10 standardized variables). As we can see, factor 1 accounts for about 27 percent of the variance, factor 2 for about 16 percent, and so on.
 As expected, the sum of the eigenvalues is equal to the number of variables. The Cumulative % column contains the cumulative variance extracted. The variances extracted by the factors are called the eigenvalues.
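
A hedged numpy sketch of how such a table is computed; the data here are made up, and only the mechanics mirror the table above:

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(size=(100, 10))            # made-up data, p = 10 variables
R = np.corrcoef(data, rowvar=False)          # correlation matrix

eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]   # descending eigenvalues
pct = 100 * eigvals / eigvals.sum()              # % of total variance
cum = np.cumsum(pct)                             # cumulative %

print(round(eigvals.sum(), 6))               # sums to p = 10, the number of variables
for i, (ev, pc, cm) in enumerate(zip(eigvals, pct, cum), start=1):
    print(f"{i:2d}  {ev:6.3f}  {pc:6.2f}  {cm:7.2f}")
```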
Eigenvalues and the Number-of-Factors Problem

The Kaiser criterion. First, we can retain only factors with eigenvalues greater than 1. In essence, this is like saying that unless a factor extracts at least as much as the equivalent of one original variable, we drop it. This criterion was proposed by Kaiser (1960) and is probably the most widely used. In our example above, using this criterion, we would retain 4 factors (principal components).
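
Applied to the eigenvalues from the table above, the Kaiser criterion is a one-liner; a small sketch:

```python
# Eigenvalues copied from the "Total Variance Explained" table
eigenvalues = [2.711, 1.626, 1.514, 1.059, 0.910,
               0.788, 0.640, 0.465, 0.172, 0.113]

# Kaiser criterion: keep only components with eigenvalue > 1
retained = [ev for ev in eigenvalues if ev > 1.0]
print(len(retained))   # 4 factors retained
```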
The scree test

The scree test plots the eigenvalues in decreasing order against the component number; we retain the factors to the left of the point where the smooth decrease of the eigenvalues levels off.

[Scree plot: Eigenvalue (y-axis, 0.0 to 3.0) versus Component Number (x-axis, 1 to 10); the curve levels off after the fourth component.]
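
A minimal matplotlib sketch that reproduces a plot like this from the eigenvalues in the table above:

```python
import matplotlib.pyplot as plt

eigenvalues = [2.711, 1.626, 1.514, 1.059, 0.910,
               0.788, 0.640, 0.465, 0.172, 0.113]

plt.plot(range(1, 11), eigenvalues, marker="o")
plt.axhline(1.0, linestyle="--")     # Kaiser cutoff, for reference
plt.xlabel("Component Number")
plt.ylabel("Eigenvalue")
plt.title("Scree Plot")
plt.show()
```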
Communalities

 We should not expect the factors to extract all of the variance from our items; rather, only that proportion that is due to the common factors and shared by several items.
 In the language of factor analysis, the proportion of variance of a particular item that is due to common factors (shared with other items) is called the communality.
 The proportion of variance that is unique to each item is then the respective item's total variance minus the communality.
 Communality = sum of squares of factor loadings.
 A high communality means that a large part of the item's variance is explained by the factor analysis, and vice versa.
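
A small sketch of this computation, using a made-up loading matrix (the items and values are illustrative only):

```python
import numpy as np

# Hypothetical loadings: 3 items on 2 retained factors
loadings = np.array([[0.80, 0.10],
                     [0.75, 0.20],
                     [0.15, 0.85]])

communality = (loadings ** 2).sum(axis=1)   # h^2: sum of squared loadings per item
uniqueness = 1.0 - communality              # variance unique to each item
print(communality, uniqueness)
```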
Factor Matrix

 Contains the factor loadings of all the variables on all the factors extracted.
Factor Loadings

The factor loadings are the correlations between the variables and the factors (the "new" variables).
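
This equivalence is easy to verify numerically. In the sketch below (made-up standardized data), the loading, computed as eigenvector times the square root of the eigenvalue, matches the correlation between the variable and the component scores:

```python
import numpy as np

rng = np.random.default_rng(3)
x1 = rng.normal(size=1000)
x2 = 0.7 * x1 + rng.normal(size=1000)
X = np.column_stack([x1, x2])
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize

R = np.corrcoef(X, rowvar=False)
w, V = np.linalg.eigh(R)
w, V = w[::-1], V[:, ::-1]                 # sort eigenpairs descending
loadings = V * np.sqrt(w)                  # loading matrix

scores = X @ V                             # component scores
r = np.corrcoef(X[:, 0], scores[:, 0])[0, 1]
print(loadings[0, 0], r)                   # nearly identical
```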
Rotating the Factor Structure

The goal of rotation is to obtain a clear pattern of loadings, that is, factors that are clearly marked by high loadings for some variables and low loadings for others. This general pattern is also sometimes referred to as simple structure.
Varimax rotation

 Goal: to maximize the variance of the loadings on each factor, so that each variable loads highly on one factor and near zero on the others.
 Variables near the origin have small loadings.
 Variables at the end of the axis have high loadings.
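
A minimal numpy sketch of the varimax algorithm (Kaiser, 1958). This is one standard SVD-based implementation, not necessarily the routine used by SPSS for the output below, and the example loadings are made up:

```python
import numpy as np

def varimax(L, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonally rotate loading matrix L to maximize the varimax criterion."""
    p, k = L.shape
    R = np.eye(k)                    # accumulated rotation matrix
    d_old = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        # Gradient step via SVD of the varimax criterion
        u, s, vt = np.linalg.svd(
            L.T @ (Lr ** 3 - (gamma / p) * Lr @ np.diag((Lr ** 2).sum(axis=0)))
        )
        R = u @ vt
        d_new = s.sum()
        if d_new < d_old * (1.0 + tol):   # converged
            break
        d_old = d_new
    return L @ R

# Made-up unrotated loadings: 4 items on 2 factors
L = np.array([[0.6, 0.6], [0.5, 0.7], [0.7, -0.5], [0.6, -0.6]])
print(varimax(L).round(2))   # loadings pushed toward 0 or +/-1
```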
[Figure: factor loadings plotted before rotation (unrotated factors) and after rotation.]
Ways to Determine the Factorability of an Intercorrelation Matrix

Two tests:
 Bartlett's Test of Sphericity
 Kaiser-Meyer-Olkin Measure of Sampling Adequacy (KMO)

 Consider the intercorrelation matrix below, which is called an identity matrix.

      X1      X2      X3      X4      X5
X1    1.00    0.00    0.00    0.00    0.00
X2            1.00    0.00    0.00    0.00
X3                    1.00    0.00    0.00
X4                            1.00    0.00
X5                                    1.00
 The variables are totally uncorrelated (noncollinear). If this matrix were factor analyzed …
 It would extract as many factors as variables, since each variable would be its own factor.
 It is totally non-factorable.
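
A two-line check of this claim: the eigenvalues of an identity correlation matrix are all exactly 1, so no component ever summarizes more than one variable:

```python
import numpy as np

R_identity = np.eye(5)                   # the identity matrix shown above
print(np.linalg.eigvalsh(R_identity))    # [1. 1. 1. 1. 1.] -- non-factorable
```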
Bartlett's test of sphericity

Null hypothesis: the variables are uncorrelated, i.e., the correlation matrix is an identity matrix.
If the test is not significant, we cannot reject the null hypothesis that the variables are uncorrelated.
In that case, the appropriateness of factor analysis can be questioned.
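
A hedged sketch of the test statistic, using the usual approximation χ² = -[(n - 1) - (2p + 5)/6] ln|R| with p(p - 1)/2 degrees of freedom, where R is the sample correlation matrix from n observations on p variables:

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(R, n):
    """Bartlett's test of sphericity for a p x p correlation matrix R."""
    p = R.shape[0]
    stat = -((n - 1) - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
    df = p * (p - 1) // 2
    return stat, df, chi2.sf(stat, df)   # statistic, df, p-value

# Usage (hypothetical): stat, df, pval = bartlett_sphericity(R, n=30)
# A p-value above .05 means we cannot reject "R is an identity matrix",
# so factor analysis is probably inappropriate.
```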
K.M.O test

 If two variables share a common factor with other variables, their partial correlation (a_ij) will be small, since the variance they have in common is accounted for by the common factors rather than being unique to the pair.
 The KMO index is used to measure sampling adequacy, i.e., the appropriateness of factor analysis for the data.
 High values (0.5 to 1.0) mean factor analysis is adequate.
Interpretation of the KMO as characterized by Kaiser, Meyer, and Olkin:

KMO Value       Degree of Common Variance
0.90 to 1.00    Marvelous
0.80 to 0.89    Meritorious
0.70 to 0.79    Middling
0.60 to 0.69    Mediocre
0.50 to 0.59    Miserable
0.00 to 0.49    Don't Factor
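
A sketch of how the KMO index can be computed from a correlation matrix, with the partial correlations obtained from the inverse of R (this is the standard formula, not necessarily the exact SPSS routine):

```python
import numpy as np

def kmo(R):
    """Overall Kaiser-Meyer-Olkin index for a p x p correlation matrix R."""
    inv = np.linalg.inv(R)
    d = np.sqrt(np.diag(inv))
    partial = -inv / np.outer(d, d)          # partial correlation matrix
    off = ~np.eye(R.shape[0], dtype=bool)    # off-diagonal mask
    r2 = (R[off] ** 2).sum()                 # sum of squared simple correlations
    a2 = (partial[off] ** 2).sum()           # sum of squared partial correlations
    return r2 / (r2 + a2)

# kmo(R) >= 0.5 suggests sampling adequacy; below 0.5, "don't factor".
```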
Communalities

        Initial   Extraction
Q1      1.000     .695
Q2      1.000     .551
Q3      1.000     .553
Q4      1.000     .570
Q5      1.000     .835
Q6      1.000     .869
Q7      1.000     .759
Q8      1.000     .641
Q9      1.000     .656
Q10     1.000     .783
Extraction Method: Principal Component Analysis.
Communality is the amount of variance each variable shares with the other variables, i.e., the amount of variance in each variable explained by the factors.

h² = sum of squares of factor loadings

We can see from the table that all the variables have communalities higher than 0.5. This implies that all the variables should be included in the factor analysis model.
Eigenvalues

Total Variance Explained

                     Initial Eigenvalues                 Extraction Sums of Squared Loadings
Component   Total    % of Variance   Cumulative %    Total    % of Variance   Cumulative %
1           2.711    27.112           27.112         2.711    27.112           27.112
2           1.626    16.259           43.371         1.626    16.259           43.371
3           1.514    15.143           58.514         1.514    15.143           58.514
4           1.059    10.594           69.108         1.059    10.594           69.108
5           0.910     9.098           78.206
6           0.788     7.883           86.090
7           0.640     6.405           92.494
8           0.465     4.653           97.148
9           0.172     1.720           98.868
10          0.113     1.132          100.000
Extraction Method: Principal Component Analysis.
The eigenvalue is the amount of variance explained by each factor.
Here the eigenvalues of the first four factors are greater than one; therefore, by the Kaiser criterion, these should be included in the model, and all the remaining factors are dropped.
 Also, the first four factors together explain almost 69% of the variance; by this measure as well, the first four factors explain the maximum variance and qualify to be included in the model.
Scree Plot

[Scree plot: Eigenvalue (y-axis, 0.0 to 3.0) versus Component Number (x-axis, 1 to 10).]
 From the scree plot, too, we see that the curve flattens after four factors; therefore four factors will be included in the model.
Component Matrix

            Component
        1        2        3        4
Q1      -0.406   -0.052    0.588   -0.426
Q2      -0.619    0.144   -0.245    0.296
Q3      -0.246   -0.689   -0.129   -0.023
Q4       0.240    0.585    0.290   -0.292
Q5       0.751   -0.343    0.050   -0.388
Q6       0.874   -0.099   -0.004    0.307
Q7       0.807    0.312    0.027    0.102
Q8      -0.186    0.743   -0.234    0.010
Q9       0.173   -0.046   -0.788   -0.053
Q10      0.040   -0.081    0.573    0.668
Extraction Method: Principal Component Analysis.
a. 4 components extracted.
How much variance in the first variable is explained by the factors?

h² = (-0.406)² + (-0.052)² + (0.588)² + (-0.426)²
   = 0.695
   = communality for the first variable
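
A three-line check of this arithmetic:

```python
# Communality of Q1 from its four component loadings in the matrix above
loadings_q1 = [-0.406, -0.052, 0.588, -0.426]
h2 = sum(a ** 2 for a in loadings_q1)
print(round(h2, 3))   # 0.695, matching the Communalities table
```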
Rotated Component Matrix

            Component
        1        2        3        4
Q1      -0.025   -0.007   -0.831    0.053
Q2      -0.734   -0.095    0.054   -0.009
Q3      -0.016   -0.733   -0.083   -0.088
Q4       0.197    0.695   -0.215   -0.029
Q5       0.891   -0.082    0.074   -0.167
Q6       0.658    0.094    0.579    0.302
Q7       0.553    0.493    0.435    0.146
Q8      -0.452    0.616    0.121   -0.206
Q9      -0.021   -0.123    0.578   -0.553
Q10     -0.034   -0.047    0.008    0.883
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 7 iterations.
Rotated Component Matrix (only the highest loading of each variable shown)

            Component
        1        2        3        4
Q1                        -0.831
Q2      -0.734
Q3               -0.733
Q4                0.695
Q5       0.891
Q6       0.658
Q7       0.553
Q8                0.616
Q9                         0.578
Q10                                 0.883
Extraction Method: Principal Component Analysis.
Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 7 iterations.
KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy: .395

Bartlett's Test of Sphericity:
  Approx. Chi-Square  52.293
  df                  45
  Sig.                .212
Interpretation

 The value of the KMO measure of sampling adequacy is 0.395 < 0.5; by the scale above this falls in the "don't factor" range, i.e., the degree of common variance among the ten variables is very low.
 This means factor analysis is likely to be inadequate for these data.
 Bartlett's Test of Sphericity is also not significant (Sig. = .212 > .05).
 Therefore the null hypothesis that the variables are uncorrelated cannot be rejected, and the appropriateness of factor analysis is questionable.
REKHA SOAPS

Coefficients

                                         Unstandardized Coefficients   Standardized Coefficients
                                         B           Std. Error        Beta          t           Sig.
(Constant)                               11.83925    6.287805                        1.882891    0.200417
market potential in territory (lakhs)    -0.02442    0.188881          -0.01324      -0.12928    0.908965
no. of shops (hundreds)                  0.407671    0.384043          0.113058      1.061527    0.399686
no. of dealers (hundreds)                0.430798    0.161199          0.345857      2.672465    0.116127
no. of other popular brands              -1.42585    0.346125          -0.30155      -4.11946    0.054183
population (thousands)                   0.142109    0.067043          0.270542      2.119687    0.168147

Dependent Variable: sales in lakhs
REKHA SOAPS

Model Summary

Model   R          R Square      Adjusted R Square   Std. Error of the Estimate
1       0.990616   0.981319707   0.977427979         2.64583444

ANOVA

Model           Sum of Squares   df   Mean Square   F          Sig.
1  Regression   8825.989         5    1765.198      252.1553   6.35E-20
   Residual     168.0106         24   7.00044
   Total        8994             29
Coefficients

                                         Unstandardized Coefficients   Standardized Coefficients
                                         B           Std. Error        Beta          t           Sig.
(Constant)                               2.027268    6.49585                         0.312087    0.75767
market potential in territory (lakhs)    0.2625      0.169105          0.129185      1.552289    0.133681
no. of shops (hundreds)                  0.287961    0.308169          0.08312       0.934426    0.359391
no. of dealers (hundreds)                0.428153    0.157667          0.337813      2.715547    0.012068
no. of other popular brands              -0.80755    0.292998          -0.20237      -2.75617    0.01099
population (thousands)                   0.139267    0.067762          0.274379      2.055227    0.050895

a. Dependent Variable: sales in lakhs
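
For completeness, a hedged sketch of how regression output like this could be produced with statsmodels. The file name and column names below are hypothetical, since the source does not give them:

```python
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("rekha_soaps.csv")        # hypothetical file name
predictors = ["market_potential", "shops", "dealers",
              "other_brands", "population"]

X = sm.add_constant(df[predictors])        # adds the (Constant) term
model = sm.OLS(df["sales"], X).fit()
print(model.summary())                     # coefficients, std errors, t, Sig., R^2, ANOVA F
```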
