help factor, help factormat dialogs: factor factormat
also see: factor postestimation
-------------------------------------------------------------------------------
Title
[MV] factor -- Factor analysis
Syntax
Factor analysis of data
factor varlist [if] [in] [weight] [, method options ]
Factor analysis of a correlation matrix
factormat matname, n(#) [ method options factormat_options ]
method description
-------------------------------------------------------------------------
Model 2
pf principal factor; the default
pcf principal-component factor
ipf iterated principal factor
ml maximum-likelihood factor
-------------------------------------------------------------------------
options description
-------------------------------------------------------------------------
Model 2
factors(#) maximum number of factors to be retained
mineigen(#) minimum value of eigenvalues to be retained
citerate(#) communality reestimation iterations (ipf only)
Reporting
blanks(#) display loadings as blanks when |loadings| < #
+ norotated display unrotated solution, even if rotated results
are available (replay only)
altdivisor use trace of correlation matrix as the divisor for
reported proportions
Maximization
protect(#) perform # optimizations and report the best
solution (ml only)
random use random starting values (ml only); seldom used
seed(seed) random-number seed (ml with protect() or random
only)
maximize_options control the maximization process; seldom used (ml
only)
-------------------------------------------------------------------------
+ norotated does not appear in the dialog box.
factormat_options description
-------------------------------------------------------------------------
Model
shape(full) matname is a square symmetric matrix; the default
shape(lower) matname is a vector with the rowwise lower triangle
(with diagonal)
shape(upper) matname is a vector with the rowwise upper triangle
(with diagonal)
names(namelist) variable names; required if matname is triangular
forcepsd modifies matname to be positive semidefinite
* n(#) number of observations
sds(matname2) vector with standard deviations of variables
means(matname3) vector with means of variables
-------------------------------------------------------------------------
* n(#) is required for factormat.
bootstrap, by, jackknife, rolling, and statsby are allowed with factor;
see prefix. However, bootstrap and jackknife results should be
interpreted with caution; identification of the factor parameters
involves data-dependent restrictions, possibly leading to badly biased
and overdispersed estimates.
Weights are not allowed with the bootstrap prefix.
aweights are not allowed with the jackknife prefix.
aweights and fweights are allowed with factor; see weight.
See [MV] factor postestimation for features available after estimation.
Menu
factor
Statistics > Multivariate analysis > Factor and principal component
analysis > Factor analysis
factormat
Statistics > Multivariate analysis > Factor and principal component
analysis > Factor analysis of a correlation matrix
Description
factor and factormat perform a factor analysis of a correlation matrix.
factor and factormat can produce principal factor, iterated principal
factor, principal-component factor, and maximum-likelihood factor
analyses. factor and factormat display the eigenvalues of the
correlation matrix, the factor loadings, and the uniqueness (=
1-communality) of the variables.
factor expects data in the form of variables, allows weights, and can be
run for subgroups (see [D] by). factormat is for use with a correlation
or covariance matrix in the form of a square Stata matrix or a vector
containing the rowwise upper or lower triangle of the correlation or
covariance matrix. This concept is explained in more detail below; see
option shape(). If a covariance matrix is provided to factormat, it is
transformed into a correlation matrix for the factor analysis. To replay
estimation results, you may type either factor or factormat.
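As an illustrative sketch of the covariance-matrix route described above (not taken from the manual), the commands below build a covariance matrix with correlate and pass it to factormat. The matrix name V and the local macro n are hypothetical names chosen for this sketch; the dataset and variables are the ones used under Examples of factor.

```stata
* Sketch: factor analysis starting from a covariance matrix
* (V and n are illustrative names, not part of the command syntax)
webuse bg2, clear
* correlate with the covariance option leaves the covariance matrix in r(C)
quietly correlate bg2cost1-bg2cost6, covariance
local n = r(N)
matrix V = r(C)
* factormat transforms V into a correlation matrix before the analysis
factormat V, n(`n') factors(2)
```

Because V inherits its row and column names from the variables, names() should not be needed here.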
Options for factor and factormat
+---------+
----+ Model 2 +----------------------------------------------------------
pf, pcf, ipf, and ml indicate the type of estimation to be performed.
The default is pf.
pf specifies that the principal-factor method be used to analyze the
correlation matrix. The factor loadings, sometimes called the
factor patterns, are computed using the squared multiple
correlations as estimates of the communality. pf is the default.
pcf specifies that the principal-component factor method be used to
analyze the correlation matrix. The communalities are assumed to
be 1.
ipf specifies that the iterated principal-factor method be used to
analyze the correlation matrix. This reestimates the
communalities iteratively.
ml specifies the maximum-likelihood factor method, assuming
multivariate normal observations. This estimation method is
equivalent to Rao's canonical-factor method and maximizes the
determinant of the partial correlation matrix. Hence, this
solution is also meaningful as a descriptive method for nonnormal
data. ml is not available for singular correlation matrices. At
least three variables must be specified with method ml.
factors(#) and mineigen(#) specify the maximum number of factors to be
retained. factors() specifies the number directly, and mineigen()
specifies it indirectly, keeping all factors with eigenvalues greater
than the indicated value. The options can be specified individually,
together, or not at all.
factors(#) sets the maximum number of factors to be retained for
later use by the postestimation commands. factor always prints
the full set of eigenvalues but prints the corresponding
eigenvectors only for retained factors. Specifying a number
larger than the number of variables in the varlist is equivalent
to specifying the number of variables in the varlist and is the
default.
mineigen(#) sets the minimum value of eigenvalues to be retained.
The default for all methods except pcf is 0.000005
(effectively zero), meaning that factors associated with negative
eigenvalues will not be printed or retained. The default for pcf
is 1. Many sources recommend mineigen(1), although the
justification is complex and uncertain. If # is less than
0.000005, it is reset to 0.000005.
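The following sketch shows how factors() and mineigen() might be combined. The threshold 0.1 and the limit of 3 factors are arbitrary values chosen for illustration, not recommendations.

```stata
webuse bg2, clear
* Keep at most 3 factors, and only those with eigenvalues above 0.1
factor bg2cost1-bg2cost6, factors(3) mineigen(0.1)
display "retained factors: " e(f)
* Principal-component factors; with pcf, mineigen() defaults to 1
factor bg2cost1-bg2cost6, pcf
display "retained factors: " e(f)
```

The number of factors actually retained is stored in e(f) and can be smaller than the value given in factors().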
citerate(#) is used only with ipf and sets the number of iterations for
reestimating the communalities. If citerate() is not specified,
iterations continue until the change in the communalities is small.
ipf with citerate(0) produces the same results that pf does.
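One way to check the stated relationship between pf and ipf with citerate(0) is to compare the stored loadings after each fit. This is a sketch only; Lpf and Lipf are hypothetical matrix names.

```stata
webuse bg2, clear
* Principal factors
quietly factor bg2cost1-bg2cost6, factors(2) pf
matrix Lpf = e(L)
* Iterated principal factors with zero communality reestimation iterations
quietly factor bg2cost1-bg2cost6, factors(2) ipf citerate(0)
matrix Lipf = e(L)
* Relative difference between the two loading matrices
display mreldif(Lpf, Lipf)
```

A displayed value at or near zero would indicate that the two loading matrices agree.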
+-----------+
----+ Reporting +--------------------------------------------------------
blanks(#) specifies that factor loadings smaller than # (in absolute
value) be displayed as blanks.
norotated specifies that the unrotated factor solution be displayed, even
if a rotated factor solution is available. norotated is for use only
with replaying results.
altdivisor specifies that reported proportions and cumulative proportions
are to be computed using the trace of the correlation matrix
(trace(e(C))) as the divisor. The default is to use the sum of all
eigenvalues (even those that are negative) as the divisor.
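A brief sketch of the reporting options follows. The cutoff 0.3 in blanks() is an arbitrary illustrative value, and rotate is described in [MV] factor postestimation.

```stata
webuse bg2, clear
* Blank out loadings smaller than 0.3 in absolute value
factor bg2cost1-bg2cost6, factors(2) blanks(0.3)
* Rotate the retained factors (default rotation)
rotate
* Replay the original, unrotated solution
factor, norotated
```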
+--------------+
----+ Maximization +-----------------------------------------------------
protect(#) is used only with ml and requests that # optimizations with
random starting values be performed along with squared
multiple-correlation coefficient starting values and that the best of
the solutions be reported. The output also indicates whether all
starting values converged to the same solution. When specified with
a large number, such as protect(50), this provides reasonable
assurance that the solution found is global and is not just a local
maximum. If trace is also specified (see [R] maximize), the
parameters and likelihoods of each maximization will be printed.
random is used only with ml and requests that random starting values be
used. This option is rarely used and should be used only after
protect() has shown the presence of multiple maximums.
seed(seed) is used only with ml when the random or protect() options are
also specified. seed() specifies the random-number seed; see [R] set
seed. If seed() is not specified, the random-number generator starts
in whatever state it was last in.
maximize_options: iterate(#), [no]log, trace, tolerance(#), and
ltolerance(#); see [R] maximize. These options are seldom used.
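As a sketch of the maximization options, the command below requests 50 optimizations and fixes the random-number seed so that the run can be reproduced. The seed value is arbitrary and chosen only for illustration.

```stata
webuse bg2, clear
* Maximum-likelihood factors with 50 protected starts and a fixed seed
factor bg2cost1-bg2cost6, factors(2) ml protect(50) seed(12345)
```

The output reports whether all starting values converged to the same solution.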
Options unique to factormat
+-------+
----+ Model +------------------------------------------------------------
shape(shape) specifies the shape (storage method) for the covariance or
correlation matrix matname. The following shapes are supported:
full specifies that the correlation or covariance structure of k
variables is stored as a symmetric k x k matrix. This is the
default.
lower specifies that the correlation or covariance structure of k
variables is stored as a vector with k(k+1)/2 elements in rowwise
lower-triangular order,
C(11) C(21) C(22) C(31) C(32) C(33) ... C(k1) C(k2) ... C(kk)
upper specifies that the correlation or covariance structure of k
variables is stored as a vector with k(k+1)/2 elements in rowwise
upper-triangular order,
C(11) C(12) C(13) ... C(1k) C(22) C(23) ... C(2k) ...
C(k-1 k-1) C(k-1 k) C(kk)
names(namelist) specifies a list of k different names to be used to
document output and label estimation results and as variable names by
predict. names() is required if the correlation or covariance matrix
is in vectorized storage mode (i.e., shape(lower) or shape(upper) is
specified). By default, factormat verifies that the row and column
names of matname and the column or row names of matname2 and matname3
from the sds() and means() options are in agreement. Using the
names() option turns off this check.
forcepsd modifies the matrix matname to be positive semidefinite (psd)
and so be a proper covariance matrix. If matname is not positive
semidefinite, it will have negative eigenvalues. By setting negative
eigenvalues to 0 and reconstructing, we obtain the least-squares
positive-semidefinite approximation to matname. This approximation
is a singular covariance matrix.
n(#), a required option, specifies the number of observations on which
matname is based.
sds(matname2) specifies a k x 1 or 1 x k matrix with the standard
deviations of the variables. The row or column names should match
the variable names unless the names() option is specified. sds() may
be specified only if matname is a correlation matrix. Specify sds()
if you have variables in your dataset and want to use predict after
factormat. sds() does not affect the computations of factormat but
provides information so that predict does not assume that the
standard deviations are one.
means(matname3) specifies a k x 1 or 1 x k matrix with the means of the
variables. The row or column names should match the variable names
unless the names() option is specified. Specify means() if you have
variables in your dataset and want to use predict after factormat.
means() does not affect the computations of factormat but provides
information so that predict does not assume the means are zero.
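The sketch below shows how sds() and means() might be supplied alongside a correlation matrix. The correlation values are those used under Examples of factormat; the standard deviations and means in SD and MU are made-up numbers for illustration only, and both matrix names are hypothetical.

```stata
* Correlation matrix from the factormat examples
matrix C = (1.000, 0.943, 0.771 \ 0.943, 1.000, 0.605 \ 0.771, 0.605, 1.000)
* Hypothetical standard deviations and means (1 x k vectors)
matrix SD = (1.2, 0.9, 1.5)
matrix MU = (10, 20, 30)
* names() labels the variables and turns off the name-agreement check
factormat C, n(979) names(visual hearing taste) sds(SD) means(MU) factors(1)
```

The supplied values do not change the factor solution; they matter only when predict is later applied to variables in a dataset.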
Examples of factor
Setup
. webuse bg2
Principal factors
. factor bg2cost1-bg2cost6
Principal factors, keep 2 factors
. factor bg2cost1-bg2cost6, factors(2)
Principal-component factors, keep 2
. factor bg2cost1-bg2cost6, factors(2) pcf
Iterated principal factors, keep 2
. factor bg2cost1-bg2cost6, factors(2) ipf
Maximum-likelihood factors, keep 2
. factor bg2cost1-bg2cost6, factors(2) ml
Examples of factormat
First enter the correlation matrix. The variable names are supplied later
through the names() option.
. matrix C = ( 1.000, 0.943, 0.771 \ ///
0.943, 1.000, 0.605 \ ///
0.771, 0.605, 1.000 )
Next invoke factormat, with the number of observations in n().
. factormat C, n(979) names(visual hearing taste) fac(1)
Same as above, but with C entered as a vector.
. matrix C = ( 1.000, 0.943, 0.771, 1.000, 0.605, 1.000)
Next we use factormat, specifying the storage shape(upper) and the
variable names with the option names().
. factormat C, n(979) shape(upper) fac(1) names(visual hearing taste)
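For comparison, a sketch of the same analysis with the matrix entered in rowwise lower-triangular order, C(11) C(21) C(22) C(31) C(32) C(33), using the same correlation values as above:

```stata
* Same correlations, stored as the rowwise lower triangle (with diagonal)
matrix C = (1.000, 0.943, 1.000, 0.771, 0.605, 1.000)
factormat C, n(979) shape(lower) factors(1) names(visual hearing taste)
```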
Saved results
factor and factormat save the following in e():
Scalars
e(N) number of observations
e(f) number of retained factors
e(evsum) sum of all eigenvalues
e(df_m) model degrees of freedom
e(df_r) residual degrees of freedom
e(chi2_i) likelihood-ratio test of "independence vs.
saturated"
e(df_i) degrees of freedom of test of "independence vs.
saturated"
e(p_i) p-value of "independence vs. saturated"
e(ll_0) log likelihood of null model (ml only)
e(ll) log likelihood (ml only)
e(aic) Akaike's AIC (ml only)
e(bic) Schwarz's BIC (ml only)
e(chi2_1) likelihood-ratio test of "# factors vs. saturated"
(ml only)
e(df_1) degrees of freedom of test of "# factors vs.
saturated" (ml only)
Macros
e(cmd) factor
e(cmdline) command as typed
e(method) pf, pcf, ipf, or ml
e(wtype) weight type (factor only)
e(wexp) weight expression (factor only)
e(title) Factor analysis
e(mtitle) description of method (e.g., principal factors)
e(heywood) Heywood case (when encountered)
e(matrixname) input matrix (factormat only)
e(mineigen) specified mineigen() option
e(factors) specified factors() option
e(seed) starting random-number seed (seed() option only)
e(properties) nob noV eigen
e(rotate_cmd) factor_rotate
e(estat_cmd) factor_estat
e(predict) factor_p
e(marginsnotok) predictions disallowed by margins
Matrices
e(sds) standard deviations of analyzed variables
e(means) means of analyzed variables
e(C) analyzed correlation matrix
e(Phi) variance matrix of the common factors
e(L) factor loadings
e(Psi) uniqueness (variance of specific factors)
e(Ev) eigenvalues
Functions
e(sample) marks estimation sample (factor only)
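A short sketch of how the saved results might be inspected after estimation. L and h2 are hypothetical matrix names; the communality computation assumes the usual definition (sum of squared loadings over the retained factors), which the Description relates to uniqueness as 1-communality.

```stata
webuse bg2, clear
quietly factor bg2cost1-bg2cost6, factors(2)
* List everything saved in e()
ereturn list
display "retained factors: " e(f)
* Loadings, then communalities as row sums of squared loadings
matrix L = e(L)
matrix h2 = vecdiag(L*L')
matrix list h2
* Stored uniquenesses, for comparison with 1 - h2
matrix list e(Psi)
```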
Also see
Manual: [MV] factor
Help: [MV] factor postestimation;
[D] impute, [MV] canon, [MV] pca, [R] alpha,