Stata 15 help for factor

[MV] factor -- Factor analysis

Syntax

Factor analysis of data

factor varlist [if] [in] [weight] [, method options]

Factor analysis of a correlation matrix

factormat matname, n(#) [method options factormat_options]

matname is a square Stata matrix or a vector containing the rowwise upper or lower triangle of the correlation or covariance matrix. If a covariance matrix is provided, it is transformed into a correlation matrix for the factor analysis.

method               Description
-------------------------------------------------------------------------
Model 2
  pf                 principal factor; the default
  pcf                principal-component factor
  ipf                iterated principal factor
  ml                 maximum-likelihood factor
-------------------------------------------------------------------------

options              Description
-------------------------------------------------------------------------
Model 2
  factors(#)         maximum number of factors to be retained
  mineigen(#)        minimum value of eigenvalues to be retained
  citerate(#)        communality reestimation iterations (ipf only)

Reporting
  blanks(#)          display loadings as blanks when |loadings| < #
  altdivisor         use trace of correlation matrix as the divisor for
                       reported proportions

Maximization
  protect(#)         perform # optimizations and report the best solution
                       (ml only)
  random             use random starting values (ml only); seldom used
  seed(seed)         random-number seed (ml with protect() or random only)
  maximize_options   control the maximization process; seldom used
                       (ml only)

  norotated          display unrotated solution, even if rotated results
                       are available (replay only)
-------------------------------------------------------------------------
norotated does not appear in the dialog box.

factormat_options    Description
-------------------------------------------------------------------------
Model
  shape(full)        matname is a square symmetric matrix; the default
  shape(lower)       matname is a vector with the rowwise lower triangle
                       (with diagonal)
  shape(upper)       matname is a vector with the rowwise upper triangle
                       (with diagonal)
  names(namelist)    variable names; required if matname is triangular
  forcepsd           modifies matname to be positive semidefinite
* n(#)               number of observations
  sds(matname2)      vector with standard deviations of variables
  means(matname3)    vector with means of variables
-------------------------------------------------------------------------
* n(#) is required for factormat.

bootstrap, by, jackknife, rolling, and statsby are allowed with factor; see prefix. However, bootstrap and jackknife results should be interpreted with caution; identification of the factor parameters involves data-dependent restrictions, possibly leading to badly biased and overdispersed estimates (Milan and Whittaker 1995).
Weights are not allowed with the bootstrap prefix.
aweights are not allowed with the jackknife prefix.
aweights and fweights are allowed with factor; see weight.
See [MV] factor postestimation for features available after estimation.

Menu

factor

Statistics > Multivariate analysis > Factor and principal component analysis > Factor analysis

factormat

Statistics > Multivariate analysis > Factor and principal component analysis > Factor analysis of a correlation matrix

Description

factor and factormat perform a factor analysis of a correlation matrix. The commands produce principal factor, iterated principal factor, principal-component factor, and maximum-likelihood factor analyses. factor and factormat display the eigenvalues of the correlation matrix, the factor loadings, and the uniqueness of the variables.

factor expects data in the form of variables, allows weights, and can be run for subgroups. factormat is for use with a correlation or covariance matrix.

Options for factor and factormat

        +---------+
    ----+ Model 2 +----------------------------------------------------------

pf, pcf, ipf, and ml indicate the type of estimation to be performed. The default is pf.

pf specifies that the principal-factor method be used to analyze the correlation matrix. The factor loadings, sometimes called the factor patterns, are computed using the squared multiple correlations as estimates of the communality. pf is the default.

pcf specifies that the principal-component factor method be used to analyze the correlation matrix. The communalities are assumed to be 1.

ipf specifies that the iterated principal-factor method be used to analyze the correlation matrix. This reestimates the communalities iteratively.

ml specifies the maximum-likelihood factor method, assuming multivariate normal observations. This estimation method is equivalent to Rao's canonical-factor method and maximizes the determinant of the partial correlation matrix. Hence, this solution is also meaningful as a descriptive method for nonnormal data. ml is not available for singular correlation matrices. At least three variables must be specified with method ml.

factors(#) and mineigen(#) specify the maximum number of factors to be retained. factors() specifies the number directly, and mineigen() specifies it indirectly, keeping all factors with eigenvalues greater than the indicated value. The options can be specified individually, together, or not at all.

factors(#) sets the maximum number of factors to be retained for later use by the postestimation commands. factor always prints the full set of eigenvalues but prints the corresponding eigenvectors only for retained factors. Specifying a number larger than the number of variables in the varlist is equivalent to specifying the number of variables in the varlist and is the default.

mineigen(#) sets the minimum value of eigenvalues to be retained. The default for all methods except pcf is 0.000005 (effectively zero), meaning that factors associated with negative eigenvalues will not be printed or retained. The default for pcf is 1. Many sources recommend mineigen(1), although the justification is complex and uncertain. If # is less than 0.000005, it is reset to 0.000005.
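As a brief illustration of combining the two retention options, the sketch below caps the number of factors at 3 and also drops any factor whose eigenvalue does not exceed 0.1. The dataset is the one used in the examples of this entry; the threshold 0.1 is an arbitrary value chosen for illustration, not a recommendation.

```stata
webuse bg2, clear
* Keep at most 3 factors, and only those with eigenvalues greater than 0.1
factor bg2cost1-bg2cost6, factors(3) mineigen(0.1)
* Number of factors actually retained
display e(f)
```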

citerate(#) is used only with ipf and sets the number of iterations for reestimating the communalities. If citerate() is not specified, iterations continue until the change in the communalities is small. ipf with citerate(0) produces the same results that pf does.
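The statement that ipf with citerate(0) reproduces pf can be checked informally by comparing the stored loading matrices. This is a sketch only; the matrix names L_pf and L_ipf are illustrative, and mreldif() is the Stata matrix function that returns the relative difference of two matrices.

```stata
webuse bg2, clear
quietly factor bg2cost1-bg2cost6, pf factors(2)
matrix L_pf = e(L)
quietly factor bg2cost1-bg2cost6, ipf citerate(0) factors(2)
matrix L_ipf = e(L)
* Relative difference between the two loading matrices;
* a value at or near zero indicates the solutions agree
display mreldif(L_pf, L_ipf)
```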

        +-----------+
    ----+ Reporting +--------------------------------------------------------

blanks(#) specifies that factor loadings smaller than # (in absolute value) be displayed as blanks.

altdivisor specifies that reported proportions and cumulative proportions be computed using the trace of the correlation matrix, trace(e(C)), as the divisor. The default is to use the sum of all eigenvalues (even those that are negative) as the divisor.
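The two divisors can be compared by hand from the stored results. The sketch below computes the proportion for the first factor both ways; the matrix names Ev and C are illustrative copies of e(Ev) and e(C), and the calculation is meant to mirror the description above rather than reproduce factor's internal code.

```stata
webuse bg2, clear
quietly factor bg2cost1-bg2cost6
matrix Ev = e(Ev)
matrix C  = e(C)
* Default divisor: sum of all eigenvalues, including negative ones
display Ev[1,1] / e(evsum)
* altdivisor: trace of the analyzed correlation matrix
display Ev[1,1] / trace(C)
```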

        +--------------+
    ----+ Maximization +-----------------------------------------------------

protect(#) is used only with ml and requests that # optimizations with random starting values be performed along with squared multiple correlation coefficient starting values and that the best of the solutions be reported. The output also indicates whether all starting values converged to the same solution. When specified with a large number, such as protect(50), this provides reasonable assurance that the solution found is global and is not just a local maximum. If trace is also specified (see [R] maximize), the parameters and likelihoods of each maximization will be printed.

random is used only with ml and requests that random starting values be used. This option is rarely used and should be used only after protect() has shown the presence of multiple maximums.

seed(seed) is used only with ml when the random or protect() options are also specified. seed() specifies the random-number seed; see [R] set seed. If seed() is not specified, the random-number generator starts in whatever state it was last in.
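A typical use of protect() together with seed() might look as follows. The seed value is arbitrary and is shown only so that the random starting values can be reproduced.

```stata
webuse bg2, clear
* 50 optimizations from random starting values, plus the default start;
* seed() makes the random starts reproducible
factor bg2cost1-bg2cost6, ml factors(2) protect(50) seed(20180101)
```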

maximize_options: iterate(#), [no]log, trace, tolerance(#), and ltolerance(#); see [R] maximize. These options are seldom used.

The following option is available with factor but is not shown in the dialog box:

norotated specifies that the unrotated factor solution be displayed, even if a rotated factor solution is available. norotated is for use only with replaying results.
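For instance, after a rotation has been performed, replaying factor shows the rotated solution, while norotated returns to the unrotated one. The sketch assumes the default rotation of rotate is acceptable for illustration.

```stata
webuse bg2, clear
quietly factor bg2cost1-bg2cost6, factors(2)
quietly rotate
* Replay: displays the rotated solution
factor
* Replay: displays the unrotated solution
factor, norotated
```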

Options unique to factormat

        +-------+
    ----+ Model +------------------------------------------------------------

shape(shape) specifies the shape (storage method) for the covariance or correlation matrix matname. The following shapes are supported:

full specifies that the correlation or covariance structure of k variables is a symmetric k x k matrix. This is the default.

lower specifies that the correlation or covariance structure of k variables is a vector with k(k+1)/2 elements in rowwise lower-triangular order,

C(11) C(21) C(22) C(31) C(32) C(33) ... C(k1) C(k2) ... C(kk)

upper specifies that the correlation or covariance structure of k variables is a vector with k(k+1)/2 elements in rowwise upper-triangular order,

C(11) C(12) C(13) ... C(1k) C(22) C(23) ... C(2k) ... C(k-1,k-1) C(k-1,k) C(kk)
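As an illustration of the lower-triangular ordering, the three-variable correlation matrix used in the examples of this entry can be entered as a vector in the order C(11) C(21) C(22) C(31) C(32) C(33). Because the matrix is in vectorized storage, names() must be specified.

```stata
* Rowwise lower triangle: C(11), C(21) C(22), C(31) C(32) C(33)
matrix C = (1.000, 0.943, 1.000, 0.771, 0.605, 1.000)
factormat C, n(979) shape(lower) factors(1) names(visual hearing taste)
```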

names(namelist) specifies a list of k different names to be used to document output and label estimation results and as variable names by predict. names() is required if the correlation or covariance matrix is in vectorized storage mode (that is, shape(lower) or shape(upper) is specified). By default, factormat verifies that the row and column names of matname and the column or row names of matname2 and matname3 from the sds() and means() options are in agreement. Using the names() option turns off this check.

forcepsd modifies the matrix matname to be positive semidefinite (psd) and so be a proper covariance matrix. If matname is not positive semidefinite, it will have negative eigenvalues. By setting negative eigenvalues to 0 and reconstructing, we obtain the least-squares positive-semidefinite approximation to matname. This approximation is a singular covariance matrix.
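The adjustment described above can be sketched with Stata's matrix commands: decompose the matrix, set negative eigenvalues to 0, and reconstruct. This is an illustration of the idea only, not factormat's internal code; the matrix R is a hypothetical example that is not positive semidefinite, and the names V, lambda, and Rpsd are illustrative.

```stata
* Hypothetical symmetric matrix with one negative eigenvalue
matrix R = (1, 0.9, -0.9 \ 0.9, 1, 0.9 \ -0.9, 0.9, 1)
* Eigenvectors in V, eigenvalues in the row vector lambda
matrix symeigen V lambda = R
matrix list lambda
* Set negative eigenvalues to 0
forvalues j = 1/3 {
    if lambda[1,`j'] < 0 {
        matrix lambda[1,`j'] = 0
    }
}
* Reconstruct the positive-semidefinite approximation
matrix Rpsd = V*diag(lambda)*V'
matrix list Rpsd
```

The reconstructed matrix is singular, and its diagonal is no longer 1, so it is a covariance matrix rather than a correlation matrix.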

n(#), a required option, specifies the number of observations on which matname is based.

sds(matname2) specifies a k x 1 or 1 x k matrix with the standard deviations of the variables. The row or column names should match the variable names, unless the names() option is specified. sds() may be specified only if matname is a correlation matrix. Specify sds() if you have variables in your dataset and want to use predict after factormat. sds() does not affect the computations of factormat but provides information so that predict does not assume that the standard deviations are one.

means(matname3) specifies a k x 1 or 1 x k matrix with the means of the variables. The row or column names should match the variable names, unless the names() option is specified. Specify means() if you have variables in your dataset and want to use predict after factormat. means() does not affect the computations of factormat but provides information so that predict does not assume the means are zero.
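One way to obtain conforming matrices for sds() and means() is to take them from the stored results of a previous factor run on the same variables. The sketch below assumes that e(C), e(sds), and e(means) carry the variable names, so that names() is not needed; the matrix names C, sd, and mu and the new variables f1 and f2 are illustrative.

```stata
webuse bg2, clear
quietly factor bg2cost1-bg2cost6
matrix C  = e(C)
matrix sd = e(sds)
matrix mu = e(means)
local n = e(N)
* Supply standard deviations and means so that predict can
* standardize the variables in the dataset
factormat C, n(`n') sds(sd) means(mu) factors(2)
predict f1 f2
```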

Examples of factor

Setup
. webuse bg2

Principal factors
. factor bg2cost1-bg2cost6

Principal factors, keep 2 factors
. factor bg2cost1-bg2cost6, factors(2)

Principal-component factors, keep 2
. factor bg2cost1-bg2cost6, factors(2) pcf

Iterated principal factors, keep 2
. factor bg2cost1-bg2cost6, factors(2) ipf

Maximum-likelihood factors, keep 2
. factor bg2cost1-bg2cost6, factors(2) ml

Examples of factormat

First enter the correlation matrix; the variable names are supplied in the next step through the names() option.

. matrix C = ( 1.000, 0.943, 0.771 \ ///
               0.943, 1.000, 0.605 \ ///
               0.771, 0.605, 1.000 )

Next invoke factormat, with the number of observations in n().

. factormat C, n(979) names(visual hearing taste) fac(1)

Same as above, but with C entered as a vector.

. matrix C = ( 1.000, 0.943, 0.771, 1.000, 0.605, 1.000)

Next we use factormat, specifying the storage shape(upper) and the variable names with the option names().

. factormat C, n(979) shape(upper) fac(1) names(visual hearing taste)

Stored results

factor and factormat store the following in e():

Scalars
  e(N)               number of observations
  e(f)               number of retained factors
  e(evsum)           sum of all eigenvalues
  e(df_m)            model degrees of freedom
  e(df_r)            residual degrees of freedom
  e(chi2_i)          likelihood-ratio test of "independence vs. saturated"
  e(df_i)            degrees of freedom of test of "independence vs.
                       saturated"
  e(p_i)             p-value of "independence vs. saturated"
  e(ll_0)            log likelihood of null model (ml only)
  e(ll)              log likelihood (ml only)
  e(aic)             Akaike's AIC (ml only)
  e(bic)             Schwarz's BIC (ml only)
  e(chi2_1)          likelihood-ratio test of "# factors vs. saturated"
                       (ml only)
  e(df_1)            degrees of freedom of test of "# factors vs.
                       saturated" (ml only)

Macros
  e(cmd)             factor
  e(cmdline)         command as typed
  e(method)          pf, pcf, ipf, or ml
  e(wtype)           weight type (factor only)
  e(wexp)            weight expression (factor only)
  e(title)           Factor analysis
  e(mtitle)          description of method (e.g., principal factors)
  e(heywood)         Heywood case (when encountered)
  e(matrixname)      input matrix (factormat only)
  e(mineigen)        specified mineigen() option
  e(factors)         specified factors() option
  e(rngstate)        random-number state used (seed() option only)
  e(properties)      nob noV eigen
  e(rotate_cmd)      factor_rotate
  e(estat_cmd)       factor_estat
  e(predict)         factor_p
  e(marginsnotok)    predictions disallowed by margins

Matrices
  e(sds)             standard deviations of analyzed variables
  e(means)           means of analyzed variables
  e(C)               analyzed correlation matrix
  e(Phi)             variance matrix of common factors
  e(L)               factor loadings
  e(Psi)             uniqueness (variance of specific factors)
  e(Ev)              eigenvalues

Functions
  e(sample)          marks estimation sample (factor only)

rotate after factor and factormat stores items in e() along with the estimation command. See Stored results of [MV] factor postestimation and [MV] rotate for details.
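The stored results can be inspected directly after estimation. The following sketch lists everything in e() and then displays a few of the items documented above; ml is used so that the ml-only scalars are available.

```stata
webuse bg2, clear
quietly factor bg2cost1-bg2cost6, factors(2) ml
* Everything stored in e()
ereturn list
* Selected scalars and macros
display e(f)
display "`e(method)'"
display e(chi2_1)
* Factor loadings and uniquenesses
matrix list e(L)
matrix list e(Psi)
```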

Reference

Milan, L., and J. Whittaker. 1995. Application of the parametric bootstrap to models that incorporate a singular value decomposition. Applied Statistics 44: 31-49.


© Copyright 1996–2018 StataCorp LLC