**[MV] factor** -- Factor analysis

__Syntax__

Factor analysis of data

__fac__**tor** *varlist* [*if*] [*in*] [*weight*] [**,** *method* *options*]

Factor analysis of a correlation matrix

**factormat** *matname***,** **n(***#***)** [ *method* *options* *factormat_options*]

*matname* is a square Stata matrix or a vector containing the rowwise upper
or lower triangle of the correlation or covariance matrix. If a
covariance matrix is provided, it is transformed into a correlation
matrix for the factor analysis.

*method*        Description
-------------------------------------------------------------------------
Model 2
  **pf**        principal factor; the default
  **pcf**       principal-component factor
  __ip__**f**   iterated principal factor
  **ml**        maximum-likelihood factor
-------------------------------------------------------------------------

*options*                     Description
-------------------------------------------------------------------------
Model 2
  __fa__**ctors(***#***)**    maximum number of factors to be retained
  __mine__**igen(***#***)**   minimum value of eigenvalues to be retained
  __cit__**erate(***#***)**   communality reestimation iterations (**ipf** only)

Reporting
  __bl__**anks(***#***)**     display loadings as blanks when |loadings| < *#*
  __altdiv__**isor**          use trace of correlation matrix as the divisor for
                                reported proportions

Maximization
  __pr__**otect(***#***)**    perform *#* optimizations and report the best
                                solution (**ml** only)
  __r__**andom**              use random starting values (**ml** only); seldom used
  **seed(***seed***)**        random-number seed (**ml** with **protect()** or **random**
                                only)
  *maximize_options*          control the maximization process; seldom used (**ml**
                                only)

  __norot__**ated**           display unrotated solution, even if rotated results
                                are available (replay only)
-------------------------------------------------------------------------
**norotated** does not appear in the dialog box.

*factormat_options*               Description
-------------------------------------------------------------------------
Model
  __sh__**ape(**__f__**ull)**     *matname* is a square symmetric matrix; the default
  __sh__**ape(**__l__**ower)**    *matname* is a vector with the rowwise lower triangle
                                    (with diagonal)
  __sh__**ape(**__u__**pper)**    *matname* is a vector with the rowwise upper triangle
                                    (with diagonal)
  __nam__**es(***namelist***)**   variable names; required if *matname* is triangular
  **forcepsd**                    modifies *matname* to be positive semidefinite
* **n(***#***)**                  number of observations
  **sds(***matname2***)**         vector with standard deviations of variables
  **means(***matname3***)**       vector with means of variables
-------------------------------------------------------------------------
* **n(***#***)** is required for **factormat**.

**bootstrap**, **by**, **jackknife**, **rolling**, and **statsby** are allowed with **factor**;
see prefix. However, **bootstrap** and **jackknife** results should be
interpreted with caution; identification of the **factor** parameters
involves data-dependent restrictions, possibly leading to badly biased
and overdispersed estimates (Milan and Whittaker 1995).
Weights are not allowed with the **bootstrap** prefix.
**aweight**s are not allowed with the **jackknife** prefix.
**aweight**s and **fweight**s are allowed with **factor**; see weight.
See **[MV] factor postestimation** for features available after estimation.

__Menu__

__factor__

**Statistics > Multivariate analysis >** **Factor and principal component**
**analysis > Factor analysis**

__factormat__

**Statistics > Multivariate analysis >** **Factor and principal component**
**analysis >** **Factor analysis of a correlation matrix**

__Description__

**factor** and **factormat** perform a factor analysis of a correlation matrix.
The commands produce principal factor, iterated principal factor,
principal-component factor, and maximum-likelihood factor analyses.
**factor** and **factormat** display the eigenvalues of the correlation matrix,
the factor loadings, and the uniqueness of the variables.

**factor** expects data in the form of variables, allows weights, and can be
run for subgroups. **factormat** is for use with a correlation or covariance
matrix.
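The relationship between the two commands can be sketched as follows. This is a minimal illustration, assuming the **bg2** dataset used in the examples of this entry; the matrix name **C** and the local macro **nobs** are chosen here for illustration only. **correlate** uses casewise deletion and leaves the correlation matrix in **r(C)** and the number of observations in **r(N)**, so the two analyses should agree.

```stata
* Factor analysis directly from the variables
webuse bg2, clear
factor bg2cost1-bg2cost6, factors(2)

* The same analysis from a stored correlation matrix
quietly correlate bg2cost1-bg2cost6
matrix C = r(C)
local nobs = r(N)
factormat C, n(`nobs') factors(2)
```

Because **r(C)** carries the variable names as row and column names, **names()** is not needed in this sketch.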

__Options for factor and factormat__

+---------+
----+ Model 2 +----------------------------------------------------------

**pf**, **pcf**, **ipf**, and **ml** indicate the type of estimation to be performed.
The default is **pf**.

**pf** specifies that the principal-factor method be used to analyze the
correlation matrix. The factor loadings, sometimes called the
factor patterns, are computed using the squared multiple
correlations as estimates of the communality. **pf** is the default.

**pcf** specifies that the principal-component factor method be used to
analyze the correlation matrix. The communalities are assumed to
be 1.

**ipf** specifies that the iterated principal-factor method be used to
analyze the correlation matrix. This reestimates the
communalities iteratively.

**ml** specifies the maximum-likelihood factor method, assuming
multivariate normal observations. This estimation method is
equivalent to Rao's canonical-factor method and maximizes the
determinant of the partial correlation matrix. Hence, this
solution is also meaningful as a descriptive method for nonnormal
data. **ml** is not available for singular correlation matrices. At
least three variables must be specified with method **ml**.
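As a hedged illustration of the **ml** method, the following sketch fits a two-factor model to the **bg2** data and reads back the likelihood-based results that are stored only for **ml** (see the stored results listed at the end of this entry). The p-value line simply recomputes a chi-squared tail probability from the stored statistic and degrees of freedom.

```stata
webuse bg2, clear
factor bg2cost1-bg2cost6, ml factors(2)

* Test of "2 factors vs. saturated", recomputed from the stored results
display "LR chi2 = " e(chi2_1) ", df = " e(df_1)
display "p-value = " chi2tail(e(df_1), e(chi2_1))

* Information criteria (ml only)
display "AIC = " e(aic) "   BIC = " e(bic)
```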

**factors(***#***)** and **mineigen(***#***)** specify the maximum number of factors to be
retained. **factors()** specifies the number directly, and **mineigen()**
specifies it indirectly, keeping all factors with eigenvalues greater
than the indicated value. The options can be specified individually,
together, or not at all.

**factors(***#***)** sets the maximum number of factors to be retained for
later use by the postestimation commands. **factor** always prints
the full set of eigenvalues but prints the corresponding
eigenvectors only for retained factors. Specifying a number
larger than the number of variables in the *varlist* is equivalent
to specifying the number of variables in the *varlist* and is the
default.

**mineigen(***#***)** sets the minimum value of eigenvalues to be retained.
The default for all methods except **pcf** is 0.000005 (effectively
zero), meaning that factors associated with negative eigenvalues
will not be printed or retained. The default for **pcf** is 1. Many
sources recommend **mineigen(1)**, although the justification is
complex and uncertain. If *#* is less than 0.000005, it is reset
to 0.000005.
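A short sketch of how the two retention options interact, again assuming the **bg2** data. The thresholds 1 and 0.5 and the limit of 3 factors are arbitrary values chosen for illustration. **e(f)** reports how many factors were actually retained.

```stata
webuse bg2, clear

* Keep only factors whose eigenvalue exceeds 1
factor bg2cost1-bg2cost6, mineigen(1)
display "retained factors: " e(f)

* Keep at most 3 factors, and only those with eigenvalue above 0.5
factor bg2cost1-bg2cost6, factors(3) mineigen(0.5)
display "retained factors: " e(f)
```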

**citerate(***#***)** is used only with **ipf** and sets the number of iterations for
reestimating the communalities. If **citerate()** is not specified,
iterations continue until the change in the communalities is small.
**ipf** with **citerate(0)** produces the same results that **pf** does.
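The stated equivalence between **pf** and **ipf** with **citerate(0)** can be checked with a sketch like the one below. The matrix names **Lpf** and **Lipf** are illustrative. **mreldif()** returns the largest relative difference between two matrices of the same dimension, so a value near zero indicates matching loadings.

```stata
webuse bg2, clear

quietly factor bg2cost1-bg2cost6, pf factors(2)
matrix Lpf = e(L)

quietly factor bg2cost1-bg2cost6, ipf citerate(0) factors(2)
matrix Lipf = e(L)

display "max relative difference in loadings: " mreldif(Lpf, Lipf)
```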

+-----------+
----+ Reporting +--------------------------------------------------------

**blanks(***#***)** specifies that factor loadings smaller than *#* (in absolute
value) be displayed as blanks.

**altdivisor** specifies that reported proportions and cumulative proportions
be computed using the trace of the correlation matrix, **trace(e(C))**,
as the divisor. The default is to use the sum of all eigenvalues
(even those that are negative) as the divisor.
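The two reporting options can be tried as in the following sketch, which assumes the **bg2** data; the cutoff 0.3 is an arbitrary choice. The **display** lines recompute the proportion for the first factor by hand under each divisor, using the stored eigenvalues **e(Ev)**, their sum **e(evsum)**, and the analyzed correlation matrix **e(C)**.

```stata
webuse bg2, clear

* Suppress loadings smaller than 0.3 in absolute value
factor bg2cost1-bg2cost6, factors(2) blanks(0.3)

* Proportion for factor 1 under each divisor
matrix Ev = e(Ev)
matrix C  = e(C)
display "sum-of-eigenvalues divisor: " Ev[1,1] / e(evsum)
display "trace divisor:              " Ev[1,1] / trace(C)

* Report proportions with the trace divisor
factor bg2cost1-bg2cost6, factors(2) altdivisor
```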

+--------------+
----+ Maximization +-----------------------------------------------------

**protect(***#***)** is used only with **ml** and requests that *#* optimizations with
random starting values be performed along with squared multiple
correlation coefficient starting values and that the best of the
solutions be reported. The output also indicates whether all
starting values converged to the same solution. When specified with
a large number, such as **protect(50)**, this provides reasonable
assurance that the solution found is global and is not just a local
maximum. If **trace** is also specified (see **[R] maximize**), the
parameters and likelihoods of each maximization will be printed.

**random** is used only with **ml** and requests that random starting values be
used. This option is rarely used and should be used only after
**protect()** has shown the presence of multiple maximums.

**seed(***seed***)** is used only with **ml** when the **random** or **protect()** options are
also specified. **seed()** specifies the random-number seed; see
**[R] set seed**. If **seed()** is not specified, the random-number generator
starts in whatever state it was last in.

*maximize_options*: __iter__**ate(***#***)**, [__no__]__lo__**g**, __tr__**ace**, __tol__**erance(***#***)**, and
__ltol__**erance(***#***)**; see **[R] maximize**. These options are seldom used.
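A minimal sketch of the maximization options in use, assuming the **bg2** data. The seed value 12345 is arbitrary and is given only so that the random starting values are reproducible.

```stata
webuse bg2, clear

* 50 optimizations from random starts, plus the default starting values;
* the output indicates whether all starts reached the same solution
factor bg2cost1-bg2cost6, ml factors(2) protect(50) seed(12345)
```

If the runs do not all converge to the same solution, that is the situation in which **random** might then be worth trying.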

The following option is available with **factor** but is not shown in the
dialog box:

**norotated** specifies that the unrotated factor solution be displayed, even
if a rotated factor solution is available. **norotated** is for use only
with replaying results.
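The replay behavior can be illustrated with the sketch below, which assumes the **bg2** data and uses a varimax rotation purely as an example; see **[MV] rotate** for the rotation itself.

```stata
webuse bg2, clear
quietly factor bg2cost1-bg2cost6, factors(2)
rotate, varimax

* Replay: the rotated solution is shown when one is available
factor

* Replay the unrotated solution instead
factor, norotated
```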

__Options unique to factormat__

+-------+
----+ Model +------------------------------------------------------------

**shape(***shape***)** specifies the shape (storage method) for the covariance or
correlation matrix *matname*. The following shapes are supported:

**full** specifies that the correlation or covariance structure of k
variables is a symmetric k x k matrix. This is the default.

**lower** specifies that the correlation or covariance structure of k
variables is a vector with k(k+1)/2 elements in rowwise
lower-triangular order,

C(11) C(21) C(22) C(31) C(32) C(33) ... C(k1) C(k2) ... C(kk)

**upper** specifies that the correlation or covariance structure of k
variables is a vector with k(k+1)/2 elements in rowwise
upper-triangular order,

C(11) C(12) C(13) ... C(1k) C(22) C(23) ... C(2k) ...
C(k-1,k-1) C(k-1,k) C(kk)
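As an illustration of the rowwise lower-triangular order, the sketch below enters the same 3 x 3 correlation matrix used in the examples of this entry as a vector with k(k+1)/2 = 6 elements, in the order C(11) C(21) C(22) C(31) C(32) C(33).

```stata
* Lower triangle, row by row, including the diagonal
matrix C = (1.000, 0.943, 1.000, 0.771, 0.605, 1.000)

factormat C, n(979) shape(lower) fac(1) names(visual hearing taste)
```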

**names(***namelist***)** specifies a list of k different names to be used to
document output and label estimation results and as variable names by
**predict**. **names()** is required if the correlation or covariance matrix
is in vectorized storage mode (that is, **shape(lower)** or **shape(upper)**
is specified). By default, **factormat** verifies that the row and
column names of *matname* and the column or row names of *matname2* and
*matname3* from the **sds()** and **means()** options are in agreement. Using
the **names()** option turns off this check.

**forcepsd** modifies the matrix *matname* to be positive semidefinite (psd)
and thus a proper covariance matrix. If *matname* is not positive
semidefinite, it will have negative eigenvalues. By setting negative
eigenvalues to 0 and reconstructing, we obtain the least-squares
positive-semidefinite approximation to *matname*. This approximation
is a singular covariance matrix.

**n(***#***)**, a required option, specifies the number of observations on which
*matname* is based.

**sds(***matname2***)** specifies a k x 1 or 1 x k matrix with the standard
deviations of the variables. The row or column names should match
the variable names, unless the **names()** option is specified. **sds()**
may be specified only if *matname* is a correlation matrix. Specify
**sds()** if you have variables in your dataset and want to use **predict**
after **factormat**. **sds()** does not affect the computations of **factormat**
but provides information so that **predict** does not assume that the
standard deviations are one.

**means(***matname3***)** specifies a k x 1 or 1 x k matrix with the means of the
variables. The row or column names should match the variable names,
unless the **names()** option is specified. Specify **means()** if you have
variables in your dataset and want to use **predict** after **factormat**.
**means()** does not affect the computations of **factormat** but provides
information so that **predict** does not assume the means are zero.
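The role of **sds()** and **means()** can be sketched as follows. To keep the example self-contained, the correlation matrix, standard deviations, means, and number of observations are taken from a preliminary **factor** run on the **bg2** data (via **e(C)**, **e(sds)**, **e(means)**, and **e(N)**); in practice they would usually come from an external source. The names **C**, **S**, **M**, **nobs**, **f1**, and **f2** are illustrative.

```stata
webuse bg2, clear
quietly factor bg2cost1-bg2cost6, factors(2)

* Save the summary statistics that factormat needs
matrix C = e(C)
matrix S = e(sds)
matrix M = e(means)
local nobs = e(N)

* Refit from the matrix, supplying sds and means for later scoring
factormat C, n(`nobs') factors(2) sds(S) means(M)

* Score the variables in memory using the supplied means and sds
predict f1 f2
```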

__Examples of factor__

Setup
**. webuse bg2**

Principal factors
**. factor bg2cost1-bg2cost6**

Principal factors, keep 2 factors
**. factor bg2cost1-bg2cost6, factors(2)**

Principal-component factors, keep 2
**. factor bg2cost1-bg2cost6, factors(2) pcf**

Iterated principal factors, keep 2
**. factor bg2cost1-bg2cost6, factors(2) ipf**

Maximum-likelihood factors, keep 2
**. factor bg2cost1-bg2cost6, factors(2) ml**

__Examples of factormat__

First enter the correlation matrix and set the row and column names.

**. matrix C = ( 1.000, 0.943, 0.771 \ ///**
               **0.943, 1.000, 0.605 \ ///**
               **0.771, 0.605, 1.000 )**

Next invoke **factormat**, with the number of observations in **n()**.

**. factormat C, n(979) names(visual hearing taste) fac(1)**

Same as above, but with **C** entered as a vector.

**. matrix C = ( 1.000, 0.943, 0.771, 1.000, 0.605, 1.000)**

Next we use **factormat**, specifying the storage **shape(upper)** and the
variable names with the option **names()**.

**. factormat C, n(979) shape(upper) fac(1) names(visual hearing taste)**

__Stored results__

**factor** and **factormat** store the following in **e()**:

Scalars
**e(N)** number of observations
**e(f)** number of retained factors
**e(evsum)** sum of all eigenvalues
**e(df_m)** model degrees of freedom
**e(df_r)** residual degrees of freedom
**e(chi2_i)** likelihood-ratio test of "independence vs.
saturated"
**e(df_i)** degrees of freedom of test of "independence vs.
saturated"
**e(p_i)** p-value of "independence vs. saturated"
**e(ll_0)** log likelihood of null model (**ml** only)
**e(ll)** log likelihood (**ml** only)
**e(aic)** Akaike's AIC (**ml** only)
**e(bic)** Schwarz's BIC (**ml** only)
**e(chi2_1)** likelihood-ratio test of "# factors vs. saturated"
(**ml** only)
**e(df_1)** degrees of freedom of test of "# factors vs.
saturated" (**ml** only)

Macros
**e(cmd)** **factor**
**e(cmdline)** command as typed
**e(method)** **pf**, **pcf**, **ipf**, or **ml**
**e(wtype)** weight type (**factor** only)
**e(wexp)** weight expression (**factor** only)
**e(title)** **Factor analysis**
**e(mtitle)** description of method (e.g., **principal factors**)
**e(heywood)** **Heywood case** (when encountered)
**e(matrixname)** input matrix (**factormat** only)
**e(mineigen)** specified **mineigen()** option
**e(factors)** specified **factors()** option
**e(rngstate)** random-number state used (**seed()** option only)
**e(properties)** **nob noV eigen**
**e(rotate_cmd)** **factor_rotate**
**e(estat_cmd)** **factor_estat**
**e(predict)** **factor_p**
**e(marginsnotok)** predictions disallowed by **margins**

Matrices
**e(sds)** standard deviations of analyzed variables
**e(means)** means of analyzed variables
**e(C)** analyzed correlation matrix
**e(Phi)** variance matrix of the common factors
**e(L)** factor loadings
**e(Psi)** uniqueness (variance of specific factors)
**e(Ev)** eigenvalues

Functions
**e(sample)** marks estimation sample (**factor** only)

**rotate** after **factor** and **factormat** stores items in **e()** along with the
estimation command. See *Stored results* of **[MV] factor postestimation** and
**[MV] rotate** for details.
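A brief sketch of how the stored results listed above might be inspected after estimation, assuming the **bg2** data.

```stata
webuse bg2, clear
quietly factor bg2cost1-bg2cost6, factors(2)

* Everything stored in e()
ereturn list

* Loadings and uniquenesses
matrix list e(L)
matrix list e(Psi)

* Method used and number of factors retained
display "method: `e(method)', factors retained: " e(f)
```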

__Reference__

Milan, L., and J. Whittaker. 1995. Application of the parametric
bootstrap to models that incorporate a singular value decomposition.
*Applied Statistics* 44: 31-49.