Stata 15 help for factor

```
[MV] factor -- Factor analysis

Syntax

Factor analysis of data

factor varlist [if] [in] [weight] [, method options]

Factor analysis of a correlation matrix

factormat matname, n(#) [method options factormat_options]

matname is a square Stata matrix or a vector containing the rowwise upper
or lower triangle of the correlation or covariance matrix.  If a
covariance matrix is provided, it is transformed into a correlation
matrix for the factor analysis.

method                Description
-------------------------------------------------------------------------
Model 2
pf                  principal factor; the default
pcf                 principal-component factor
ipf                 iterated principal factor
ml                  maximum-likelihood factor
-------------------------------------------------------------------------

options               Description
-------------------------------------------------------------------------
Model 2
factors(#)          maximum number of factors to be retained
mineigen(#)         minimum value of eigenvalues to be retained
citerate(#)         communality reestimation iterations (ipf only)

Reporting
blanks(#)           display loadings as blank when |loadings| < #
altdivisor          use trace of correlation matrix as the divisor for
reported proportions

Maximization
protect(#)          perform # optimizations and report the best
solution (ml only)
random              use random starting values (ml only); seldom used
seed(seed)          random-number seed (ml with protect() or random
only)
maximize_options    control the maximization process; seldom used (ml
only)

norotated           display unrotated solution, even if rotated results
are available (replay only)
-------------------------------------------------------------------------
norotated does not appear in the dialog box.

factormat_options     Description
-------------------------------------------------------------------------
Model
shape(full)         matname is a square symmetric matrix; the default
shape(lower)        matname is a vector with the rowwise lower triangle
(with diagonal)
shape(upper)        matname is a vector with the rowwise upper triangle
(with diagonal)
names(namelist)     variable names; required if matname is triangular
forcepsd            modifies matname to be positive semidefinite
* n(#)                number of observations
sds(matname2)       vector with standard deviations of variables
means(matname3)     vector with means of variables
-------------------------------------------------------------------------
* n(#) is required for factormat.

bootstrap, by, jackknife, rolling, and statsby are allowed with factor;
see prefix.  However, bootstrap and jackknife results should be
interpreted with caution; identification of the factor parameters
involves data-dependent restrictions, possibly leading to badly biased
and overdispersed estimates (Milan and Whittaker 1995).
Weights are not allowed with the bootstrap prefix.
aweights are not allowed with the jackknife prefix.
aweights and fweights are allowed with factor; see weight.
See [MV] factor postestimation for features available after estimation.

Menu

factor

Statistics > Multivariate analysis > Factor and principal component
analysis > Factor analysis

factormat

Statistics > Multivariate analysis > Factor and principal component
analysis > Factor analysis of a correlation matrix

Description

factor and factormat perform a factor analysis of a correlation matrix.
The commands produce principal factor, iterated principal factor,
principal-component factor, and maximum-likelihood factor analyses.
factor and factormat display the eigenvalues of the correlation matrix,
the factor loadings, and the uniqueness of the variables.

factor expects data in the form of variables, allows weights, and can be
run for subgroups.  factormat is for use with a correlation or covariance
matrix.
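
As a minimal sketch of the two entry points, assuming the bg2 dataset
used in the examples (V and n are illustrative names, not part of either
command), the same analysis can be run from the data or from a covariance
matrix:

. webuse bg2
. factor bg2cost1-bg2cost6, factors(2)
. quietly correlate bg2cost1-bg2cost6, covariance
. local n = r(N)
. matrix V = r(C)
. factormat V, n(`n') factors(2)

Because factormat converts a covariance matrix to a correlation matrix,
the two commands are expected to report the same loadings.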

Options for factor and factormat

+---------+
----+ Model 2 +----------------------------------------------------------

pf, pcf, ipf, and ml indicate the type of estimation to be performed.
The default is pf.

pf specifies that the principal-factor method be used to analyze the
correlation matrix.  The factor loadings, sometimes called the
factor patterns, are computed using the squared multiple
correlations as estimates of the communality.  pf is the default.

pcf specifies that the principal-component factor method be used to
analyze the correlation matrix.  The communalities are assumed to
be 1.

ipf specifies that the iterated principal-factor method be used to
analyze the correlation matrix.  This reestimates the
communalities iteratively.

ml specifies the maximum-likelihood factor method, assuming
multivariate normal observations.  This estimation method is
equivalent to Rao's canonical-factor method and maximizes the
determinant of the partial correlation matrix.  Hence, this
solution is also meaningful as a descriptive method for nonnormal
data.  ml is not available for singular correlation matrices.  At
least three variables must be specified with method ml.

factors(#) and mineigen(#) specify the maximum number of factors to be
retained.  factors() specifies the number directly, and mineigen()
specifies it indirectly, keeping all factors with eigenvalues greater
than the indicated value.  The options can be specified individually,
together, or not at all.

factors(#) sets the maximum number of factors to be retained for
later use by the postestimation commands.  factor always prints
the full set of eigenvalues but prints the corresponding
eigenvectors only for retained factors.  Specifying a number
larger than the number of variables in the varlist is equivalent
to specifying the number of variables in the varlist and is the
default.

mineigen(#) sets the minimum value of eigenvalues to be retained.
The default for all methods except pcf is 0.000005 (effectively
zero), meaning that factors associated with negative eigenvalues
will not be printed or retained.  The default for pcf is 1.  Many
sources recommend mineigen(1), although the justification is
complex and uncertain.  If # is less than 0.000005, it is reset
to 0.000005.
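
A minimal sketch of both retention options, assuming the bg2 dataset
used in the examples; the cutoff 0.5 is an arbitrary illustration:

. webuse bg2
. factor bg2cost1-bg2cost6, mineigen(0.5)
. factor bg2cost1-bg2cost6, factors(3) mineigen(0.5)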

citerate(#) is used only with ipf and sets the number of iterations for
reestimating the communalities.  If citerate() is not specified,
iterations continue until the change in the communalities is small.
ipf with citerate(0) produces the same results that pf does.
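
A sketch of the equivalence stated above, assuming the bg2 dataset used
in the examples; the two commands should report the same loadings:

. webuse bg2
. factor bg2cost1-bg2cost6, factors(2) ipf citerate(0)
. factor bg2cost1-bg2cost6, factors(2) pf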

+-----------+
----+ Reporting +--------------------------------------------------------

blanks(#) specifies that factor loadings smaller than # (in absolute
value) be displayed as blanks.

altdivisor specifies that reported proportions and cumulative proportions
be computed using the trace of the correlation matrix, trace(e(C)),
as the divisor.  The default is to use the sum of all eigenvalues
(even those that are negative) as the divisor.
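
To compare the two divisors, a sketch assuming the bg2 dataset used in
the examples; only the Proportion and Cumulative columns are expected to
differ:

. webuse bg2
. factor bg2cost1-bg2cost6, factors(2)
. factor bg2cost1-bg2cost6, factors(2) altdivisor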

+--------------+
----+ Maximization +-----------------------------------------------------

protect(#) is used only with ml and requests that # optimizations with
random starting values be performed along with squared multiple
correlation coefficient starting values and that the best of the
solutions be reported.  The output also indicates whether all
starting values converged to the same solution.  When specified with
a large number, such as protect(50), this provides reasonable
assurance that the solution found is global and is not just a local
maximum.  If trace is also specified (see [R] maximize), the
parameters and likelihoods of each maximization will be printed.

random is used only with ml and requests that random starting values be
used.  This option is rarely used and should be used only after
protect() has shown the presence of multiple maximums.

seed(seed) is used only with ml when the random or protect() options are
also specified.  seed() specifies the random-number seed; see [R] set
seed.  If seed() is not specified, the random-number generator starts
in whatever state it was last in.
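
A sketch combining these options, assuming the bg2 dataset used in the
examples; the seed value 12345 is arbitrary and only makes the random
starting values reproducible:

. webuse bg2
. factor bg2cost1-bg2cost6, factors(2) ml protect(50) seed(12345)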

maximize_options:  iterate(#), [no]log, trace, tolerance(#), and
ltolerance(#); see [R] maximize.  These options are seldom used.

The following option is available with factor but is not shown in the
dialog box:

norotated specifies that the unrotated factor solution be displayed, even
if a rotated factor solution is available.  norotated is for use only
with replaying results.
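
A sketch of replaying results, assuming the bg2 dataset used in the
examples and the default settings of rotate:

. webuse bg2
. factor bg2cost1-bg2cost6, factors(2)
. rotate
. factor
. factor, norotated

After rotate, typing factor alone is expected to replay the rotated
solution, and factor, norotated the unrotated one.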

Options unique to factormat

+-------+
----+ Model +------------------------------------------------------------

shape(shape) specifies the shape (storage method) for the covariance or
correlation matrix matname.  The following shapes are supported:

full specifies that the correlation or covariance structure of k
variables is a symmetric k x k matrix.  This is the default.

lower specifies that the correlation or covariance structure of k
variables is a vector with k(k+1)/2 elements in rowwise
lower-triangular order,

C(11) C(21) C(22) C(31) C(32) C(33) ... C(k1) C(k2) ... C(kk)

upper specifies that the correlation or covariance structure of k
variables is a vector with k(k+1)/2 elements in rowwise
upper-triangular order,

C(11) C(12) C(13) ... C(1k) C(22) C(23) ... C(2k) ...
C(k-1,k-1) C(k-1,k) C(kk)
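
For instance, the 3 x 3 correlation matrix used in the factormat examples
could be entered in lower-triangular order as follows (a sketch that
reuses the values and names from those examples):

. matrix C = (1.000, 0.943, 1.000, 0.771, 0.605, 1.000)
. factormat C, n(979) shape(lower) fac(1) names(visual hearing taste)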

names(namelist) specifies a list of k different names to be used to
document output and label estimation results and as variable names by
predict.  names() is required if the correlation or covariance matrix
is in vectorized storage mode (that is, shape(lower) or shape(upper)
is specified).  By default, factormat verifies that the row and
column names of matname and the column or row names of matname2 and
matname3 from the sds() and means() options are in agreement.  Using
the names() option turns off this check.

forcepsd modifies the matrix matname to be positive semidefinite (psd)
and so be a proper covariance matrix.  If matname is not positive
semidefinite, it will have negative eigenvalues.  By setting negative
eigenvalues to 0 and reconstructing, we obtain the least-squares
positive-semidefinite approximation to matname.  This approximation
is a singular covariance matrix.

n(#), a required option, specifies the number of observations on which
matname is based.

sds(matname2) specifies a k x 1 or 1 x k matrix with the standard
deviations of the variables.  The row or column names should match
the variable names, unless the names() option is specified.  sds()
may be specified only if matname is a correlation matrix.  Specify
sds() if you have variables in your dataset and want to use predict
after factormat.  sds() does not affect the computations of factormat
but provides information so that predict does not assume that the
standard deviations are one.

means(matname3) specifies a k x 1 or 1 x k matrix with the means of the
variables.  The row or column names should match the variable names,
unless the names() option is specified.  Specify means() if you have
variables in your dataset and want to use predict after factormat.
means() does not affect the computations of factormat but provides
information so that predict does not assume the means are zero.
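
A sketch of supplying both vectors, reusing the correlation matrix from
the factormat examples.  The numbers in S and M are placeholders chosen
only for illustration, not statistics from any real dataset:

. matrix C = (1.000, 0.943, 0.771 \ 0.943, 1.000, 0.605 \ 0.771, 0.605, 1.000)
. matrix S = (1.2, 0.9, 1.5)
. matrix M = (10.0, 8.5, 7.2)
. factormat C, n(979) names(visual hearing taste) fac(1) sds(S) means(M)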

Examples of factor

Setup
. webuse bg2

Principal factors
. factor bg2cost1-bg2cost6

Principal factors, keep 2 factors
. factor bg2cost1-bg2cost6, factors(2)

Principal-component factors, keep 2
. factor bg2cost1-bg2cost6, factors(2) pcf

Iterated principal factors, keep 2
. factor bg2cost1-bg2cost6, factors(2) ipf

Maximum-likelihood factors, keep 2
. factor bg2cost1-bg2cost6, factors(2) ml

Examples of factormat

First enter the correlation matrix.  The variable names are supplied to
factormat with the names() option.

. matrix C = ( 1.000, 0.943,  0.771  \ ///
               0.943, 1.000,  0.605  \ ///
               0.771, 0.605,  1.000  )

Next invoke factormat, with the number of observations in n().

. factormat C, n(979) names(visual hearing taste) fac(1)

Same as above, but with C entered as a vector.

. matrix C = ( 1.000, 0.943, 0.771, 1.000, 0.605, 1.000)

Next we use factormat, specifying the storage shape(upper) and the
variable names with the option names().

. factormat C, n(979) shape(upper) fac(1) names(visual hearing taste)

Stored results

factor and factormat store the following in e():

Scalars
e(N)                number of observations
e(f)                number of retained factors
e(evsum)            sum of all eigenvalues
e(df_m)             model degrees of freedom
e(df_r)             residual degrees of freedom
e(chi2_i)           likelihood-ratio test of "independence vs.
saturated"
e(df_i)             degrees of freedom of test of "independence vs.
saturated"
e(p_i)              p-value of "independence vs. saturated"
e(ll_0)             log likelihood of null model (ml only)
e(ll)               log likelihood (ml only)
e(aic)              Akaike's AIC (ml only)
e(bic)              Schwarz's BIC (ml only)
e(chi2_1)           likelihood-ratio test of "# factors vs. saturated"
(ml only)
e(df_1)             degrees of freedom of test of "# factors vs.
saturated" (ml only)

Macros
e(cmd)              factor
e(cmdline)          command as typed
e(method)           pf, pcf, ipf, or ml
e(wtype)            weight type (factor only)
e(wexp)             weight expression (factor only)
e(title)            Factor analysis
e(mtitle)           description of method (e.g., principal factors)
e(heywood)          Heywood case (when encountered)
e(matrixname)       input matrix (factormat only)
e(mineigen)         specified mineigen() option
e(factors)          specified factors() option
e(rngstate)         random-number state used (seed() option only)
e(properties)       nob noV eigen
e(rotate_cmd)       factor_rotate
e(estat_cmd)        factor_estat
e(predict)          factor_p
e(marginsnotok)     predictions disallowed by margins

Matrices
e(sds)              standard deviations of analyzed variables
e(means)            means of analyzed variables
e(C)                analyzed correlation matrix
e(Phi)              variance matrix of common factors
e(Psi)              uniqueness (variance of specific factors)
e(Ev)               eigenvalues

Functions
e(sample)           marks estimation sample (factor only)

rotate after factor and factormat stores items in e() along with the
estimation command.  See Stored results of [MV] factor postestimation and
[MV] rotate for details.
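
A sketch of inspecting these results, assuming the bg2 dataset used in
the examples:

. webuse bg2
. factor bg2cost1-bg2cost6, factors(2)
. ereturn list
. display e(f)
. display "`e(method)'"
. matrix list e(Ev)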

Reference

Milan, L., and J. Whittaker. 1995. Application of the parametric
bootstrap to models that incorporate a singular value decomposition.
Applied Statistics 44: 31-49.

```