Multivariate Methods
Background Information
Cluster analysis is useful in many different disciplines, such as business, biology, psychology,
and sociology. Its name varies from discipline to discipline: it is also referred to as
Q-analysis, classification analysis, and numerical taxonomy.
There is, however, one commonality: classification by natural relationship, which suggests
that the value of cluster analysis lies in preclassifying data. For example, you might classify
large amounts of otherwise meaningless data into manageable groups; you might reduce the
data into specific smaller subgroups; or you might develop hypotheses about the nature of the
data. Whatever the situation, using cluster analysis becomes increasingly complex when you
add more variables or include mixed datasets.
Cluster analysis involves using techniques in three stages: partition, interpret, and profile.
During the partition stage, you decide how the data will be measured, which algorithm is
best suited for classifying the data, and how many clusters will be formed.
The most commonly used algorithms are hierarchical and nonhierarchical. Hierarchical
methods are used to construct either agglomerative or divisive tree-like structures. In the
agglomerative method, each observation begins in its own cluster and in subsequent steps
combines with new aggregate clusters, which reduces the number of clusters in each step.
When the clustering process goes in the opposite direction, it is known as a divisive
method. Nonhierarchical algorithms, such as K-means, do not involve tree-like structures.
There is no definitive procedure for choosing the number of clusters. Generally, the
distances between clusters at sequential steps provide guidelines. It is usually best to try
several solutions with different numbers of clusters, then make a final decision from among
them.
In the interpretation stage, the statements used to develop the clusters are examined, and
each cluster is assigned a name or label that accurately describes its characteristics.
The profile stage describes the characteristics of each cluster, which explains how the
clusters may differ on relevant dimensions. This stage usually involves discriminant
analysis or other appropriate statistics, applied to the clusters as they were labeled in
stage two. The analysis continues using data that were not used to form the clusters. In
other words, this stage focuses first on identifying the clusters, then on describing their
characteristics, rather than on the composition of the clusters themselves.
Nearest Neighbor
This method finds the two observations that have the shortest distance and places them in
the first cluster. Then it finds the next shortest distance and either joins a third observation
to the first two to form a cluster or forms a new two-observation cluster. This process
continues until all the observations are in one cluster. The Nearest Neighbor method is also
known as the Single Linkage method.
Furthest Neighbor
This method uses the maximum distance between any two observations in a cluster. All the
observations in a cluster are linked to each other at some maximum distance or by some
minimum similarity. This method is also known as Complete Linkage.
Centroid
In this method the distance between the means of two clusters is used as the measurement.
Each time observations are grouped, new centroids are formed. The cluster centroids move
each time a new observation or group of observations is added to an existing cluster.
Centroid methods require that you use metric data; other methods do not.
Median
This method uses the median distance from observations in one cluster to observations in
another as the measurement. This approach tends to combine clusters that have small
variances and may produce clusters that have the same variance.
Group Average
In this method the distance between two clusters is calculated as the average of the
distances between all pairs of observations, one observation taken from each cluster.
Ward's
In Ward's method the distance between two clusters is the sum of squares between them,
summed over all the variables. This method tends to combine clusters that have a small
number of observations and to produce clusters that have approximately the same number
of observations.
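These hierarchical rules can be sketched outside STATGRAPHICS with SciPy, whose linkage names map directly onto the methods described above. The following is a minimal illustration with hypothetical data, not the program's own implementation:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
data = rng.normal(size=(20, 3))  # 20 hypothetical observations, 3 variables

# SciPy's names for the rules above: 'single' (nearest neighbor),
# 'complete' (furthest neighbor), 'centroid', 'median',
# 'average' (group average), and 'ward'.
for method in ["single", "complete", "centroid", "median", "average", "ward"]:
    Z = linkage(data, method=method)  # agglomeration schedule, one merge per row
    labels = fcluster(Z, t=3, criterion="maxclust")  # cut the tree at 3 clusters
    print(method, labels)
```

Each row of Z records one merge, which is the same information the Agglomeration Schedule table described below displays.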
K-means
K-means clustering is a nonhierarchical method. The analysis begins with a specified
number of clusters, each seeded with a single random point. The remaining points are then
examined in sequence, and each is added to the cluster whose mean it is closest to; that
cluster's mean is then adjusted.
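A minimal NumPy sketch of this sequential assignment, assuming random seed points and a single pass through the data; STATGRAPHICS's K-means may differ in detail:

```python
import numpy as np

def sequential_kmeans(points, k, rng):
    """Seed each of k clusters with one random point, then assign each
    remaining point to the nearest cluster mean and adjust that mean."""
    seeds = rng.choice(len(points), size=k, replace=False)
    means = points[seeds].astype(float)
    counts = np.ones(k)                        # each cluster starts with its seed
    rest = np.delete(points, seeds, axis=0)
    for x in rest:
        j = np.argmin(((means - x) ** 2).sum(axis=1))  # closest cluster mean
        counts[j] += 1
        means[j] += (x - means[j]) / counts[j]         # incremental mean update
    return means

rng = np.random.default_rng(1)
print(sequential_kmeans(rng.normal(size=(100, 2)), k=3, rng=rng))
```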
Tabular Options
Analysis Summary
The Analysis Summary option creates a summary of the analysis (see Figure 1-2). The
summary displays the names of the variables, the number of complete cases, and the names
of the clustering method and distance metric you are using. It then displays a summary of
the clusters and the centroid (mean) of each variable within each cluster.
Use the Cluster Analysis Options dialog box to choose the method that will be used to
combine the observations or variables into clusters, to enter the number of clusters that will
be created, and to choose the type of measurement method that will be used to calculate the
distance between clusters.
You can also use the Seeds... command to access the Seed Options dialog box, which you use
to enter the row numbers of the observations that will be used for each corresponding cluster.
This command is available only when you choose the K-means option.
Membership Table
The Membership Table option creates a report that lists each observation and the cluster in
which it is placed (see Figure 1-3).
Use the Membership Table Options dialog box to indicate if the clusters should be sorted and
displayed together. For example, if there are two clusters, the observations in cluster 1 are
displayed first, followed by those in cluster 2.
Icicle Plot
The Icicle Plot option creates an icicle plot that shows the number of clusters horizontally
across the top of the plot and the observations vertically down the side (see Figure 1-4).
Use the Icicle Plot Options dialog box to change the width of the plot.
Agglomeration Schedule
The Agglomeration Schedule option creates a table that shows the clusters as they are
combined at each step (see Figure 1-5).
Two-Dimensional Scatterplot
The Two-dimensional Scatterplot option creates a two-dimensional scatterplot, which plots
clustered observations versus two variables (see Figure 1-7). A different point symbol is used
for each cluster. As an option, you can circle clusters to see them more clearly.
Use the Two-Dimensional Scatterplot Options dialog box to choose the names of the variables
that will be used and to indicate if you want the clusters circled.
You can also use the Target Variables text boxes to enter the names of the variables in which
you want to save the values generated during the analysis. You can enter new names or
accept the defaults.
Note: To access the Save Results Options dialog box, click the Save Results button on the
Analysis toolbar (the fourth button from the left).
References
Anderson, T. W. 1984. An Introduction to Multivariate Statistical Analysis, second edition.
New York: Wiley.
Bolch, B. W. and Huang, C. J. 1974. Multivariate Statistical Methods for Business and
Economics. New Jersey: Prentice-Hall.
Chambers, J. M., Cleveland, W. S., Kleiner, B., and Tukey, P. A. 1983. Graphical Methods
for Data Analysis. Belmont, CA: Wadsworth International Group.
Hair, J., Anderson, R., and Tatham, R. 1992. Multivariate Data Analysis, third edition.
Englewood Cliffs, NJ: Prentice-Hall.
Milligan, G. W. 1980. "An Examination of the Effect of Six Types of Error Perturbation on
Fifteen Clustering Algorithms," Psychometrika 45:325-342.
Morrison, D. F. 1990. Multivariate Statistical Methods, third edition. New York: McGraw-
Hill.
Background Information
Factor analysis is a technique for reducing the information in a large number of variables
into a smaller set, while losing only a minimal amount of information. Each variable in a
factor analysis model is represented as a linear function of a small number of common
factors plus a single specific factor. The common factors create the covariances among the
responses, while the specific factor adds only to the variance of its own response.
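In symbols, this common factor model expresses each observed variable as (a standard textbook formulation, not notation taken from the manual):

```latex
X_i = \lambda_{i1} F_1 + \lambda_{i2} F_2 + \cdots + \lambda_{ik} F_k + \varepsilon_i
```

Here the loadings on the k common factors generate the covariances among the responses, and the specific factor contributes only to the variance of its own variable.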
Use factor analysis to:
• identify a set of dimensions you cannot easily observe in a large set of variables
• devise a means for combining or condensing large numbers of observations into distinctly
different groups within a larger population
There are a number of general decisions you must make when you are deciding which factor
analysis technique to use:
• your problem
• the correlation matrix
• the type of model
• how you will extract the factors
• the number of factors you will extract
• whether or not you will rotate the factors
• how you will rotate the factors
• how you will interpret the rotated factors and their scores.
Generally, you should not use factor analysis if the data contain fewer than 50 observations;
100 or more is preferable.
Although there are several types of general factor models, the two that analysts use most are
principal component analysis and common factor analysis.
Besides determining the type of factor analysis you want to use, you must also determine if
you want to extract the factors orthogonally. Mathematically, orthogonal solutions are
simpler; they extract factors so the axes remain at 90 degrees, and each factor remains
independent of the others.
Factor rotation is another important concept in factor analysis. Rotation means that the
factors are turned until they reach another position. The primary reason for rotating factors
is to attain a simpler and more meaningful solution. Rotation is generally desirable because
it simplifies the rows and/or columns of the matrix.
Quartimax
This approach simplifies the rows of the factor matrix.
Varimax
This approach simplifies the columns of the factor matrix.
Equimax
This approach simplifies the rows or the columns by combining aspects of each of the above
two approaches.
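As an illustration of what a rotation criterion optimizes, here is a compact NumPy sketch of varimax (the column-simplifying criterion above). This is the generic textbook algorithm, not STATGRAPHICS's implementation:

```python
import numpy as np

def varimax(loadings, tol=1e-6, max_iter=100):
    """Rotate a loading matrix to maximize the varimax criterion,
    which simplifies the columns of the factor matrix."""
    p, k = loadings.shape
    R = np.eye(k)
    var_old = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        # SVD step of the standard varimax iteration
        u, s, vt = np.linalg.svd(
            loadings.T @ (L ** 3 - L @ np.diag((L ** 2).sum(axis=0)) / p)
        )
        R = u @ vt
        if s.sum() < var_old * (1 + tol):  # stop when the criterion levels off
            break
        var_old = s.sum()
    return loadings @ R
```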
STATGRAPHICS Plus also contains two methods you can use to determine the number of
factors to extract from an analysis: minimum eigenvalues or number of factors (see the
Tabular Options section for descriptions of the two methods).
To access the analysis, choose SPECIAL... MULTIVARIATE METHODS... FACTOR ANALYSIS... from
the Menu bar to display the Factor Analysis dialog box shown in Figure 2-1.
Tabular Options
Analysis Summary
The Analysis Summary option creates a summary of the analysis that displays the name of
each of the variables, the type of data you entered, the number of complete cases, the name of
the missing value treatment, whether or not the data are standardized, and the type of
factoring used (see Figure 2-2).
The summary then displays the factor numbers, eigenvalues, percentage of variance, and
cumulative percentage for each factor. The end of the summary displays the name of each
variable and the initial communalities.
Communality is the amount of variance each of the original variables shares with all the
other variables in the analysis. For the Principal Components method, the communalities
are all ones. For the Classical method, the communalities are the squared multiple
correlations for each variable when regressed against the other variables.
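For reference, initial communalities of the squared-multiple-correlation type can be computed directly from the inverse of the correlation matrix, as this sketch with hypothetical data shows:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))                  # hypothetical data
R = np.corrcoef(X, rowvar=False)               # correlation matrix of the variables
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))    # R-squared of each variable
print(smc)                                     # regressed on all the others
```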
Use the Factor Analysis Options dialog box to indicate how missing values will be treated; to
indicate if the variables will be standardized; to choose the technique that will be used to
estimate the values for the factors and the type of rotation that will be used to rotate the
factors; to choose how the factors will be extracted from the analysis; and to enter values for
the minimum eigenvalue and the number of factors that will be included.
You can also use the Estimation... command to display the Estimation Options dialog box
that you use to limit and stop the factor-rotation process, as well as the Communalities...
command to display the Communalities Options dialog box that you use to choose or enter
the name of the variable that contains the communalities that will be used.
Extraction Statistics
The Extraction Statistics option creates a factor-loading matrix that shows the loadings for
each of the factors included in the analysis before the factor is rotated (see Figure 2-3).
Rotation Statistics
The Rotation Statistics option displays the rotated factor-loading matrix (see Figure 2-4).
The table also displays the name of the rotation method and the estimated communality for
each of the variables.
Factor Scores
The Factor Scores option creates a table of the factor scores for each row of the data file (see
Figure 2-5).
Graphical Options
Scree Plot
The Scree Plot option creates a plot of the eigenvalues for each of the factors in the analysis
(see Figure 2-6). The eigenvalues are proportional to the percent of variability in the data
that can be attributed to the factors. If you use the Minimum Eigenvalue option, a horizontal
line is plotted at that value.
Use the Scree Plot Options dialog box to indicate if eigenvalues or the percent of variance will
appear as data on the plot.
2D Scatterplot
The 2D (Two-Dimensional) Scatterplot option creates a two-dimensional scatterplot of the
values for two factors (see Figure 2-7). One point appears for each row in the data file. The
plot is helpful in interpreting the factors and in understanding how the factors compare.
Use the 2D Scatterplot Options dialog box to enter the number of the factors you want plotted
on the X- and Y-axes.
3D Scatterplot
The 3D (Three-Dimensional) Scatterplot option creates a three-dimensional scatterplot of the
values for three factors (see Figure 2-8). One point appears for each row in the data file. The
plot is helpful in interpreting the factors and in understanding how the factors compare.
Use the 3D Scatterplot Options dialog box to enter the numbers of the factors you want to
appear on the X-, Y-, and Z-axes.
2D Factor Plot
The 2D (Two-Dimensional) Factor Plot option creates a two-dimensional plot of the weight
loadings for each chosen factor (see Figure 2-9). Reference lines are drawn at 0.0 in each
dimension. A weight close to 0.0 indicates that the variable contributes little to the factor.
3D Factor Plot
The 3D (Three-Dimensional) Factor Plot option creates a three-dimensional plot of the
weight loadings for each chosen variable (see Figure 2-10). One point appears for each
variable. A weight close to 0.0 indicates that the variable contributes little to the factor.
Use the 3D Factor Weights Plot Options dialog box to enter the numbers of the factors that
will be plotted on the X-, Y-, and Z-axes. Reference lines are drawn at 0.0 in each dimension.
A weight close to 0.0 indicates that the variable contributes little to the factor.
You can also use the Target Variables text boxes to enter the names of the variables in which
you want to save the values generated during the analysis. You can enter new names or
accept the defaults.
Note: To access the Save Results Options dialog box, click the Save Results button on the
Analysis toolbar (the fourth button from the left).
References
Anderson, T. W. 1984. An Introduction to Multivariate Statistical Analysis, second edition.
New York: Wiley.
Hair, J., Anderson, R., and Tatham, R. 1992. Multivariate Data Analysis, third edition.
Englewood Cliffs, New Jersey: Prentice-Hall.
Morrison, D. F. 1990. Multivariate Statistical Methods, third edition. New York: McGraw-
Hill.
Background Information
Principal components analysis differs from factor analysis in that you use it whenever you
want to form uncorrelated linear combinations of the variables. That is, principal components
analysis is a factor-analytic technique that reduces the dimensionality of a set of variables by
reconstructing them into uncorrelated combinations.
The analysis combines the variables that account for the largest amount of variance to form
the first principal component. The second principal component accounts for the next largest
amount of variance, and so on, until all of the variance in the total sample is accounted for.
Each successive component explains a progressively smaller portion of the variance in the
total sample, and all of the components are uncorrelated with each other.
Often, a few components will account for 75 to 90 percent of the variance in an analysis.
These components are then the ones you use to plot the data. Other uses for principal
components analysis include various forms of regression analysis and classification and
discrimination problems.
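The extraction itself amounts to an eigendecomposition of the correlation (or covariance) matrix. A minimal NumPy sketch with hypothetical data, not the manual's worked example:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 4))
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize each variable
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
order = np.argsort(eigvals)[::-1]                  # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
scores = Z @ eigvecs                               # component values per row
print((eigvals / eigvals.sum()).cumsum())          # cumulative variance explained
```

The cumulative proportions show how many components are needed to reach the 75 to 90 percent range mentioned above.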
Tabular Options
Analysis Summary
The Analysis Summary option creates a summary of the analysis that displays the names of
the chosen variables, the type of data entered, the number of complete cases, the type of
treatment used for missing values, whether the data were standardized, and the number of
components extracted from the analysis (see Figure 3-2).
Use the Principal Components Options dialog box to choose the missing value treatment, to
indicate if the data will be standardized, to indicate the method that will be used to extract
components from the analysis, and to enter values for the minimum eigenvalue and number
of components.
Component Weights
The Component Weights option displays the coefficients used in the equations for the
principal components (see Figure 3-3). For example, a component might be computed as
0.573943*logheight + 0.582224*loglngth + 0.575852*logwidth, where the value of each
variable is standardized by subtracting its mean and dividing by its standard deviation.
Data Table
The Data Table option creates a table that shows the values for each principal component for
each row of the data table (see Figure 3-4).
Graphical Options
Scree Plot
The Scree Plot option creates a plot of the eigenvalues for each of the principal components
(the total variance contributed by each component) (see Figure 3-5). The eigenvalues are
proportional to the percentage of variability in the data that can be attributed to the
components.
Use the plot to locate the gap between the steep slope of the large components and the
gradual trailing off of the remaining components (the scree). The plot typically identifies one,
and sometimes two or three, of the most significant components.
Use the Scree Plot Options dialog box to indicate whether eigenvalues or the percent of
variance will be displayed on the plot.
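A scree plot is straightforward to reproduce outside the program; this Matplotlib sketch uses hypothetical eigenvalues and draws the minimum-eigenvalue line at the conventional cutoff of 1.0:

```python
import numpy as np
import matplotlib.pyplot as plt

eigvals = np.array([2.8, 1.1, 0.6, 0.3, 0.2])        # hypothetical eigenvalues
plt.plot(np.arange(1, len(eigvals) + 1), eigvals, "o-")
plt.axhline(1.0, linestyle="--")                      # minimum-eigenvalue cutoff
plt.xlabel("Component number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.show()
```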
2D Scatterplot
The 2D (Two-Dimensional) Scatterplot option creates a plot of the values for two principal
components (see Figure 3-6).
Use the 2D Scatterplot Options dialog box to enter the number of the components that will be
plotted on the X- and Y-axes.
3D Scatterplot
The 3D (Three-Dimensional) Scatterplot option creates a plot of the values for three principal
components.
Use the 3D Scatterplot Options dialog box to enter the number of the components that will be
plotted on the X-, Y-, and Z-axes.
2D Component Plot
The 2D (Two-Dimensional) Component Plot option creates a plot that shows the weights for
the chosen principal components (see Figure 3-8). One point appears on the plot for each
variable in the analysis. Reference lines are drawn at 0.0 for each dimension. A weight close
to 0.0 indicates that the variable contributes little to the component.
Use the 2D Component Plot Options dialog box to enter the number of the components that
will be plotted on the X- and Y-axes.
3D Component Plot
The 3D (Three-Dimensional) Component Plot option creates a plot that shows the weights for
the chosen principal components (see Figure 3-9). One point appears on the plot for each
variable in the analysis. Reference lines are drawn at 0.0 for each
dimension. A weight close to 0.0 indicates that the variable contributes little to the
component.
Use the 3D Component Plot Options dialog box to enter the numbers of the components that
will be plotted on the X-, Y-, and Z-axes.
2D Biplot
The 2D Biplot option creates a plot of the chosen principal components (see Figure 3-10). A
point appears on the plot for each row in the data file. Reference lines are drawn for each of
the variables, representing the location of each variable in the space of the components. A
weight close to 0.0 indicates that the variable contributes little to that component.
Use the 2D Biplot Options dialog box to enter the number of the components that will be
plotted on the X- and Y-axes.
3D Biplot
Use the 3D Biplot Options dialog box to enter the number of the components that will be
plotted on the X-, Y-, and Z-axes.
You can also use the Target Variables text boxes to enter the names of the variables in which
you want to save the values generated during the analysis. You can enter new names or
accept the defaults.
Note: To access the Save Results Options dialog box, click the Save Results button on the
Analysis toolbar (the fourth button from the left).
References
Anderson, T. W. 1958. An Introduction to Multivariate Statistical Analysis. New York:
Wiley.
Morrison, D. F. 1990. Multivariate Statistical Methods, third edition. New York: McGraw-Hill.
Background Information
Complex problems and the results of bad decisions frequently force researchers to look for
more objective ways to predict outcomes. A classic example often cited in statistical
literature is creditworthiness where, based on a collection of variables, potential borrowers
are identified as either good or bad credit risks.
Discriminant analysis is the statistical technique that is most commonly used to solve these
types of problems. Its use is appropriate when you can classify data into two or more groups,
and when you want to find one or more functions of quantitative measurements that can help
you discriminate among the known groups. The objective of the analysis is to provide a
method for predicting which group a new case is most likely to fall into, or to obtain a small
number of useful predictor variables.
Discriminant analysis is capable of handling either two groups or multiple groups (three or
more). When two classifications are involved, it is known as two-group discriminant
analysis. When three or more classifications are identified, it is known as multiple
discriminant analysis. The concept of discriminant analysis involves forming linear
combinations of independent (predictor) variables, which become the basis for group
classifications.
Discriminant analysis is appropriate for testing the hypothesis that the group means of two
or more groups are equal. This is done by multiplying each independent variable by its
corresponding weight and adding the products together, which results in a single composite
discriminant score for each individual in the analysis. Averaging the scores derives a group
centroid. If the analysis involves two groups, there are two centroids; if three groups, three
centroids; and so on. Comparing the centroids shows how far apart the groups are along the
dimension you are testing.
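A small NumPy sketch of the composite scores and group centroids just described; the weights here are hypothetical rather than estimates from a fitted model:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 3))             # 30 cases, 3 independent variables
groups = rng.integers(0, 2, size=30)     # two known groups, coded 0 and 1
w = np.array([0.7, -0.4, 0.2])           # hypothetical discriminant weights

scores = X @ w                           # one composite score per case
centroids = [scores[groups == g].mean() for g in (0, 1)]
print(centroids)                         # distance between centroids shows separation
```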
The first step in the analysis involves determining its objectives:
• determining if there are statistically significant differences among two or more groups
• determining which independent variables account for most of the differences among two or
more groups.
The second step involves determining the rationale for developing classification matrices,
deciding how well the groups are classified, determining the criterion against which each
individual score is judged, constructing the classification matrices, and interpreting the
discriminant functions to determine the accuracy of their classifications.
The last step, interpretation, involves examining the discriminant functions to determine the
importance of each independent variable in discriminating between the groups, then
examining the group means for each important variable to outline the differences in the
groups.
Tabular Options
Analysis Summary
The Analysis Summary option creates a summary of the analysis that shows the name of the
classification variable, the names of the independent variables, the number of complete
cases, and the number of groups in the study (see Figure 4-2). It then displays the results of
the analysis: eigenvalues, relative percentages, canonical correlations, Wilks' lambda,
chi-square statistics, degrees of freedom, and p-values for one less than the number of groups.
Use the Discriminant Analysis Options dialog box to choose the way the variables will be
entered into the discriminant model, to enter values for the F-ratio at or above which the
variables will be entered into the model, and to enter the maximum number of steps that will
be performed before the selection process ends.
Classification Functions
The Classification Functions option creates a table that displays Fisher's linear discriminant
function coefficients for each group (see Figure 4-3). The functions are used to classify the
observations into groups. A separate function is computed for each level of the classification
variable, and the function that yields the largest value for an observation represents the
predicted group.
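Fisher's linear classification functions can be sketched from the group means and the pooled within-group covariance matrix. This NumPy example assumes equal prior probabilities and hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(60, 3))
y = rng.integers(0, 3, size=60)          # three hypothetical groups
levels = np.unique(y)

means = np.array([X[y == g].mean(axis=0) for g in levels])
pooled = sum((X[y == g] - means[i]).T @ (X[y == g] - means[i])
             for i, g in enumerate(levels)) / (len(X) - len(levels))
S_inv = np.linalg.inv(pooled)

coefs = means @ S_inv                                # one coefficient row per group
consts = -0.5 * np.einsum("ij,ij->i", coefs, means)  # -(1/2) * mu' S^-1 mu
predicted = np.argmax(X @ coefs.T + consts, axis=1)  # largest function value wins
```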
Discriminant Functions
The Discriminant Functions option creates a table of the standardized and unstandardized
canonical discriminant function coefficients for each discriminant function (see Figure 4-4).
The StatAdvisor displays the equation for the first discriminant function.
Classification Table
The Classification Table option creates a table of the actual and predicted results for the
classifications (see Figure 4-5). The program tabulates the number of observations in each
group that were correctly predicted as being members of that group. It also tabulates the
number of observations actually belonging in other groups that were incorrectly predicted as
being members of that group. The counts and percentages for each group are also displayed.
Use the Classification Table Options dialog box to indicate how the prior probabilities will be
assigned to groups and to indicate how the observations will be displayed.
Group Centroids
The Group Centroids option creates a table showing the locations of the centroids (means)
for the unstandardized discriminant functions (see Figure 4-6). Each row in the table
represents a group; each column contains the centroids for a single canonical discriminant
function.
Group Correlations
The Group Correlations option creates a table of the pooled within-group covariance and
correlation matrices for all the independent variables (see Figure 4-8).
Graphical Options
2D Scatterplot
The 2D (Two-Dimensional) Scatterplot option creates a two-dimensional scatterplot of the
observations by two variables (see Figure 4-9). A different point symbol is used for each
group.
Use the 2D Scatterplot Options dialog box to choose the names of the variables that will be
plotted on the X- and Y-axes.
3D Scatterplot
The 3D (Three-Dimensional) Scatterplot option creates a three-dimensional scatterplot of the
observations by three variables. A different point symbol is used for each group.
Use the 3D Scatterplot Options dialog box to choose the names of the variables that will be
plotted on the X-, Y-, and Z-axes.
Discriminant Functions
The Discriminant Functions option creates a plot of the values for two discriminant functions
(see Figure 4-11). Different symbols identify the points in each group. You can plot the points
using either the classification codes you chose in the Classification Factor text box on the
Discriminant Analysis dialog box or the predicted classification group for each observation;
use the 2D Discriminant Function Options dialog box to choose between them.
Use the Discriminant Function Plot Options dialog box to enter the number of the functions
that will be plotted on the X- and Y-axes.
You can also use the Target Variables text boxes to enter the names of the variables in which
you want to save the values generated during the analysis. You can enter new names or
accept the defaults.
Note: To access the Save Results Options dialog box, click the Save Results button on the
Analysis toolbar (the fourth button from the left).
References
Anderson, T. W. 1958. An Introduction to Multivariate Statistical Analysis. New York: Wiley.
Bolch, B. W. and Huang, C. J. 1974. Multivariate Statistical Methods for Business and
Economics. Englewood Cliffs, NJ: Prentice-Hall.
Hair, J., Anderson, R. and Tatham, R. 1992. Multivariate Data Analysis, third edition.
Englewood Cliffs, NJ: Prentice-Hall.
Johnson, R. A. and Wichern, D. W. 1988. Applied Multivariate Statistical Analysis, second
edition. Englewood Cliffs, NJ: Prentice-Hall.
Background Information
Developed by Hotelling in 1936, canonical correlation analysis is a technique you can use to
study the relationship between two sets of variables, each of which might contain several
variables. Its purpose is to summarize or explain the relationship between two sets of
variables by finding a small number of linear combinations from each set of variables that
have the highest correlation possible between the sets. This is known as the first canonical
correlation. The coefficients of these combinations are canonical coefficients or canonical
weights. Usually the canonical
variables are normalized so each canonical variable has a variance of 1.
A second pair of canonical variables is then found, which produces the second-highest
correlation coefficient. This process continues until the number of pairs equals the number
of variables in the smaller of the two sets.
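Numerically, the canonical correlations are the singular values linking orthonormal bases of the two centered sets. A short NumPy sketch with hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.normal(size=(100, 4))             # first set, 4 variables
Y = rng.normal(size=(100, 3))             # second set, 3 variables

Qx, _ = np.linalg.qr(X - X.mean(axis=0))  # orthonormal basis for each
Qy, _ = np.linalg.qr(Y - Y.mean(axis=0))  # centered set
corrs = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
print(corrs)  # min(4, 3) = 3 canonical correlations, largest first
```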
In STATGRAPHICS Plus, it is assumed that the data are drawn from a population that is
multivariate normal. You can test this assumption for pairs of variables by producing an X-Y
Plot to verify that each pair of variables is roughly elliptical and that their points are widely
scattered. Or you can produce an X-Y-Z Plot to test this assumption for sets of three
variables. However, even if sets of two and three variables satisfy the multivariate normal
assumptions, a full set of data may not. If that is the case, you may want to try using square
root, logarithmic, or some other transformation.
Tabular Options
Analysis Summary
The Analysis Summary option creates a summary of the analysis that includes the names of
the variables in both the first and second sets, and the number of complete cases in the
analysis (see Figure 5-2). The summary then displays the results of the analysis —
information about the canonical correlations and the coefficients for the canonical variables
in both the first and second sets. Canonical correlations with small p-values (less than .05)
are significant. Large correlations are associated with large eigenvalues and chi-square
values.
Data Table
The Data Table option creates a table of the values for the canonical variables in both the
first and second sets of variables (see Figure 5-3).
Graphical Options
Canonical Variables Plot
Use the Canonical Variables Plot Options dialog box to enter the number of the variable that
will appear on the plot.
You can also use the Target Variables text boxes to enter the names of the variables in which
you want to save the values generated during the analysis. You can enter new names or
accept the defaults.
Note: To access the Save Results Options dialog box, click the Save Results button on the
Analysis toolbar (the fourth button from the left).
References
Hair, J., Anderson, R., and Tatham, R. 1992. Multivariate Data Analysis, third edition.
Englewood Cliffs, NJ: Prentice-Hall.
Morrison, D. F. 1990. Multivariate Statistical Methods, third edition. New York: McGraw-
Hill.
STATGRAPHICS Plus allows you to create and save correlation and covariance matrices.
You can use matrix data in the Multivariate Methods product by entering the data into any
of these analyses: Principal Components, Factor Analysis, and Cluster Analysis.
This chapter provides instructions for creating and saving matrix data. The Tutorial Manual
for Multivariate Methods contains a tutorial that shows how to create a covariance matrix,
then use it to perform a factor analysis.
Before you create the matrix, open STATGRAPHICS Plus, then open a data file. For
this example, use the Obesity.sf data file.
1. Open STATGRAPHICS, then choose FILE... OPEN... OPEN DATA FILE... from the File
menu to display the Open Data File dialog box.
2. Enter the Obesity.sf file into the File Name text box, then click Open to open the
file.
3. Choose and enter the following variable names into the Data text box: x2 coeff, x3
pgmntcr, x4 phospht, x5 calcium, and x6 phosphs.
4. Type first(50) into the Select text box to use only the first 50 rows in the data set.
5. Click OK to create the matrix and display the Analysis Summary and the Scatterplot
Matrix in the Analysis window.
1. Click the Save Results button on the Analysis toolbar to display the Save Results
Options dialog box with save options shown in Figure 6-2.
2. Click the Correlations check box. Notice the names of the variables in the Target
Variables text boxes in Figure 6-2; the program uses these names as its convention for
saving the matrices (described below).
3. Click OK to save the correlation matrix as a set of variables and redisplay the
Analysis Summary and Scatterplot Matrix.
As you can see, using a correlation or covariance matrix as data in the Multivariate
Methods product is simple — just complete the Analysis dialog box, being sure to
enter the variables in the order they were placed in the
DataSheet. For example, if the correlation matrix has five columns, enter: CMAT 1,
CMAT 2, CMAT 3, CMAT 4, and CMAT 5. If the covariance matrix has five
columns, enter: VMAT 1, VMAT 2, VMAT 3, VMAT 4, and VMAT 5.
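The same matrices are easy to verify outside the program. This NumPy sketch mirrors the tutorial's computation (five variables, first 50 rows) with a hypothetical stand-in for the Obesity.sf data:

```python
import numpy as np

rng = np.random.default_rng(7)
data = rng.normal(size=(200, 5))           # stand-in for the five variables
first50 = data[:50]                        # analogous to first(50) in Select
cmat = np.corrcoef(first50, rowvar=False)  # corresponds to the CMAT columns
vmat = np.cov(first50, rowvar=False)       # corresponds to the VMAT columns
print(cmat.shape, vmat.shape)              # both 5 x 5
```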
After you save the matrix, the variable names appear in the list box on the Multiple-
Variable Analysis dialog box.