The content below applies to Stata 8.
Factor analysis
- Principal-components factor
- Principal factor
- Iterated principal factor
- ML factors
- Varimax rotation with and without Horst standardization
- Promax rotation with and without Horst standardization
- Bartlett scoring
- Regression scoring
Stata’s factor command allows you to fit common-factor models;
see also principal components.
By default, factor produces estimates using the principal-factor
method (communalities set to the squared multiple-correlation coefficients).
Alternatively, factor can produce iterated principal-factor estimates
(communalities re-estimated iteratively), principal-components factor
estimates (communalities set to 1), or maximum-likelihood factor estimates.
After you fit a factor model, Stata allows you to rotate the factor-loading
matrix using the varimax (orthogonal) and promax (oblique) methods. Stata
can then compute factor scores from either the rotated or the unrotated
loadings, using either regression or Bartlett scoring.
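The non-iterative extraction methods differ mainly in what they put on the diagonal of the correlation matrix before extracting factors. The sketch below is a hypothetical helper, not Stata's implementation, and it assumes NumPy is installed. It places the chosen communalities on the diagonal: squared multiple correlations for the principal-factor method, or 1 for principal-components factoring. It then takes the leading eigenvectors, scaled by the square roots of their eigenvalues, as loadings. With `iterate=True` it re-estimates the communalities from the loadings until they settle, in the spirit of the iterated principal-factor method.

```python
import numpy as np


def principal_factor(R, n_factors, communalities="smc", iterate=False,
                     max_iter=200, tol=1e-8):
    """Sketch of principal-factor extraction from a correlation matrix R.

    Hypothetical helper for illustration only (not Stata's code).
    communalities="smc": squared multiple correlations (principal factor)
    communalities="one": all ones (principal-components factor)
    iterate=True: re-estimate communalities from loadings (iterated PF)
    """
    R = np.asarray(R, dtype=float)
    if communalities == "smc":
        # SMC of variable j on all the others: 1 - 1 / (R^-1)_jj
        h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
    elif communalities == "one":
        h2 = np.ones(R.shape[0])
    else:
        raise ValueError("communalities must be 'smc' or 'one'")

    for _ in range(max_iter if iterate else 1):
        reduced = R.copy()
        np.fill_diagonal(reduced, h2)           # reduced correlation matrix
        vals, vecs = np.linalg.eigh(reduced)    # eigenvalues in ascending order
        top = np.argsort(vals)[::-1][:n_factors]
        loadings = vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))
        new_h2 = np.sum(loadings ** 2, axis=1)  # communality = row sum of squares
        if iterate and np.max(np.abs(new_h2 - h2)) < tol:
            break
        h2 = new_h2
    return loadings
```

A quick sanity check: with `communalities="one"` and all factors retained, the loadings reproduce the correlation matrix exactly (`loadings @ loadings.T` equals `R`). Eigenvector signs are arbitrary, so individual loadings may come out with flipped signs.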
Below we fit a maximum-likelihood factor model on eight medical symptoms
from a medical outcomes study (Tarlov et al. 1989) using three factors:
. factor joints-throat, ml factors(3) protect(5)
(obs=3046)
Likelihood verification 0, maximum = -21.8257
Likelihood verification 1, maximum = -21.8257
Likelihood verification 2, maximum = -21.8257
Likelihood verification 3, maximum = -18.4300
Likelihood verification 4, maximum = -21.8257
Likelihood verification 5, maximum = -18.4300
Differing maxima obtained.
Iteration 0: log likelihood =-1925.2187
Iteration 1: log likelihood =-40.623068
Iteration 2: log likelihood = -27.38831
Iteration 3: log likelihood =-26.291917
Iteration 4: log likelihood = -18.49983
Iteration 5: log likelihood = -18.43281
Iteration 6: log likelihood =-18.430164
Iteration 7: log likelihood =-18.429999
Iteration 8: log likelihood =-18.429988
Iteration 9: log likelihood =-18.429988
Iteration 10: log likelihood =-18.429988
(maximum likelihood factors; 3 factors retained)
  Factor    Variance   Difference   Proportion   Cumulative
-----------------------------------------------------------
       1     2.36049      1.64310       0.6892       0.6892
       2     0.71739      0.37019       0.2095       0.8986
       3     0.34720            .       0.1014       1.0000
Test: 3 vs. no factors. Chi2( 24) = 4718.59, Prob > chi2 = 0.0000
Test: 3 vs. more factors. Chi2( 7) = 36.79, Prob > chi2 = 0.0000
             Factor Loadings
 Variable |        1          2          3    Uniqueness
----------+---------------------------------------------
   joints |  0.62749   -0.07856    0.26240       0.53124
    cough |  0.29859    0.14908    0.05009       0.88611
 backache |  0.82633   -0.33130   -0.11018       0.19530
   nausea |  0.49540    0.49656   -0.25307       0.44396
 indigest |  0.46711    0.39728   -0.06671       0.61953
  hvyfeel |  0.57369    0.21220    0.42173       0.44798
 headache |  0.50816    0.25731   -0.12097       0.66092
   throat |  0.37922    0.25219    0.05205       0.78988
To obtain these results, we typed
factor joints-throat, ml factors(3) protect(5)
Stata commands share a common syntax: the command name is followed by
the list of variables to be used and then, optionally, a comma and any
options. Here joints-throat is shorthand for all the variables from joints
through throat, in the order they appear in the dataset. We specified
factor's ml option, producing estimates by maximum likelihood.
We also typed factors(3) to indicate that we wanted to keep the first
three factors.
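This is an interesting problem because the likelihood has two distinct
local maxima. The verification runs reached either -21.8257 or -18.4300, so
Stata reported "Differing maxima obtained." The protect() option guards
against settling on a local maximum by starting the maximization from
different points and comparing the solutions. protect(5) requested that
this search be performed five times. The final iterations converge to the
higher maximum, a log likelihood of -18.43.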
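The idea behind protect() can be illustrated with a toy problem. The plain-Python sketch below does not use a factor-model likelihood. Instead, it uses a made-up one-parameter function with two local maxima and runs a simple gradient ascent from several starting points. It keeps the best result and prints a message, loosely mimicking Stata's, when different starts reach different maxima. The function, step size, and iteration count are illustrative assumptions, not Stata's algorithm.

```python
import random


def f(x):
    # Made-up toy objective with two local maxima, near x = -0.96 and x = 1.04
    return -(x * x - 1.0) ** 2 + 0.3 * x


def grad_f(x):
    return -4.0 * x * (x * x - 1.0) + 0.3


def hill_climb(x, step=0.01, iters=2000):
    # Plain gradient ascent from the starting value x
    for _ in range(iters):
        x += step * grad_f(x)
    return x


def multistart(starts):
    """Maximize from every start; return the best (x, f(x)) and the distinct maxima."""
    results = [(x, f(x)) for x in map(hill_climb, starts)]
    best = max(results, key=lambda r: r[1])
    distinct = sorted({round(fx, 4) for _, fx in results})
    if len(distinct) > 1:
        print("Differing maxima obtained:", distinct)
    return best, distinct


rng = random.Random(1)
best, maxima = multistart([rng.uniform(-2.0, 2.0) for _ in range(6)])
print("best x = %.4f, f(x) = %.4f" % best)
```

A start to the left of the dip near x = 0.08 climbs to the lower peak, while a start to the right climbs to the higher one. Only by comparing several runs can the lower peak be recognized as merely local.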
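We find that most of the explained variance can be attributed to the first
factor: it accounts for a proportion of 0.6892 of the variance explained by
the three retained factors. The Uniqueness column shows, for each variable,
the proportion of its variance that is not explained by the common factors.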
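Both the summary table and the Uniqueness column can be recovered from the printed loadings. Each factor's variance is the sum of its squared loadings down a column. Each variable's uniqueness is 1 minus the sum of its squared loadings across a row (its communality). The plain-Python check below uses only the numbers shown in the output, and it reproduces the printed figures to within the rounding of the five-decimal loadings.

```python
# Unrotated ML loadings copied from the output above (factors 1, 2, 3)
loadings = {
    "joints":   (0.62749, -0.07856,  0.26240),
    "cough":    (0.29859,  0.14908,  0.05009),
    "backache": (0.82633, -0.33130, -0.11018),
    "nausea":   (0.49540,  0.49656, -0.25307),
    "indigest": (0.46711,  0.39728, -0.06671),
    "hvyfeel":  (0.57369,  0.21220,  0.42173),
    "headache": (0.50816,  0.25731, -0.12097),
    "throat":   (0.37922,  0.25219,  0.05205),
}

# Factor variance: column sums of squared loadings
variance = [sum(row[j] ** 2 for row in loadings.values()) for j in range(3)]
total = sum(variance)
proportion = [v / total for v in variance]

# Uniqueness: 1 - communality, where communality is the row sum of squared loadings
uniqueness = {name: 1.0 - sum(l ** 2 for l in row) for name, row in loadings.items()}

for j, (v, p) in enumerate(zip(variance, proportion), start=1):
    print(f"factor {j}: variance {v:.5f}, proportion {p:.4f}")
for name, u in uniqueness.items():
    print(f"{name:>9}: uniqueness {u:.5f}")
```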
The researcher actually fitting this model interpreted the first factor as a
measure of the general level of sickness and the second factor as a
difference between musculoskeletal problems and other types of problems. If
he had wanted to rotate the factor loadings to search for different
interpretations, he could now type rotate to examine an orthogonal
varimax rotation; rotate, promax to examine an oblique promax
rotation; or, for instance, rotate, promax(4) to examine a promax
rotation with promax power 4 (producing simpler loadings but at the cost of
more correlation between factors).
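To show what an orthogonal varimax rotation does, here is a compact NumPy sketch of the widely used SVD-based varimax iteration, applied to the unrotated loadings above. It rests on our own assumptions: the convergence tolerance and iteration limit are arbitrary. The hypothetical `horst` flag divides each row by the square root of its communality before rotating and restores the scale afterward. This is not Stata's code, and its output need not match rotate digit for digit. Promax, which starts from a varimax solution and then lets the factors correlate, is not sketched.

```python
import numpy as np


def varimax(L, horst=True, max_iter=500, tol=1e-10):
    """Sketch of an orthogonal varimax rotation of a p x k loading matrix L.

    Returns (rotated loadings, k x k orthogonal rotation matrix).
    horst=True row-normalizes by sqrt(communality) before rotating.
    """
    L = np.asarray(L, dtype=float)
    p, k = L.shape
    h = np.sqrt(np.sum(L ** 2, axis=1)) if horst else np.ones(p)
    A = L / h[:, None]                      # (optionally) row-normalized loadings
    T = np.eye(k)
    crit_old = 0.0
    for _ in range(max_iter):
        B = A @ T
        # Gradient of the varimax criterion, mapped to the nearest orthogonal matrix via SVD
        G = A.T @ (B ** 3 - B @ np.diag(np.sum(B ** 2, axis=0)) / p)
        U, s, Vt = np.linalg.svd(G)
        T = U @ Vt
        crit = np.sum(s)
        if crit_old != 0.0 and crit < crit_old * (1.0 + tol):
            break
        crit_old = crit
    return (A @ T) * h[:, None], T


# Unrotated ML loadings from the output above
L = np.array([
    [0.62749, -0.07856,  0.26240], [0.29859,  0.14908,  0.05009],
    [0.82633, -0.33130, -0.11018], [0.49540,  0.49656, -0.25307],
    [0.46711,  0.39728, -0.06671], [0.57369,  0.21220,  0.42173],
    [0.50816,  0.25731, -0.12097], [0.37922,  0.25219,  0.05205],
])
rotated, T = varimax(L)
print(np.round(rotated, 3))
```

Because the rotation matrix is orthogonal, each variable's communality, and therefore its uniqueness, is unchanged by the rotation. Only the way the explained variance is split among the factors changes.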
Stata’s score command produces estimates of the factors after
factor or rotate:
. score f1
(based on unrotated factors)
(2 scorings not used)
   Scoring Coefficients
 Variable |        1
----------+----------
   joints |  0.15644
    cough |  0.04463
 backache |  0.56038
   nausea |  0.14779
 indigest |  0.09986
  hvyfeel |  0.16960
 headache |  0.10183
   throat |  0.06359
Typing score f1 produced estimates of the first factor. Typing
score f1 f2 would produce estimates of the first two factors, and
typing score f1 f2 f3 (or score f1-f3) would produce estimates
of the first three factors. The names f1, f2, and so on are
arbitrary; score stores the estimates as new variables that can
then be used in further analysis.
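The two scoring methods differ only in the weight matrix applied to the standardized variables. The NumPy sketch below uses the textbook formulas, not Stata's own code. Regression scoring uses W = R^(-1) Λ. Bartlett scoring uses W = Ψ^(-1) Λ (Λ' Ψ^(-1) Λ)^(-1). Here Λ is the loading matrix and Ψ is the diagonal matrix of uniquenesses. The observed correlation matrix R is not shown on this page, so the example substitutes the model-implied matrix ΛΛ' + Ψ built from the printed loadings. Its regression weights are therefore an approximation and need not match the printed scoring coefficients.

```python
import numpy as np


def regression_weights(R, L):
    # Regression scoring weights: W = R^-1 L
    return np.linalg.solve(R, L)


def bartlett_weights(L, psi):
    # Bartlett scoring weights: W = Psi^-1 L (L' Psi^-1 L)^-1
    psi_inv_L = L / psi[:, None]
    return psi_inv_L @ np.linalg.inv(L.T @ psi_inv_L)


# Unrotated ML loadings and uniquenesses from the output above
L = np.array([
    [0.62749, -0.07856,  0.26240], [0.29859,  0.14908,  0.05009],
    [0.82633, -0.33130, -0.11018], [0.49540,  0.49656, -0.25307],
    [0.46711,  0.39728, -0.06671], [0.57369,  0.21220,  0.42173],
    [0.50816,  0.25731, -0.12097], [0.37922,  0.25219,  0.05205],
])
psi = np.array([0.53124, 0.88611, 0.19530, 0.44396,
                0.61953, 0.44798, 0.66092, 0.78988])

# Assumption: model-implied matrix stands in for the (unshown) observed R
R_implied = L @ L.T + np.diag(psi)

W_reg = regression_weights(R_implied, L)
W_bart = bartlett_weights(L, psi)

# Factor scores for standardized data Z (n x 8) would be Z @ W_reg or Z @ W_bart
print(np.round(W_reg[:, 0], 5))
print(np.round(W_bart[:, 0], 5))
```

Bartlett weights satisfy W' Λ = I, which is what makes Bartlett scores conditionally unbiased. Regression weights instead shrink the scores toward zero.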
Stata also has a command for Cronbach’s alpha, alpha, which provides a
simpler way of combining the eight symptoms by giving each equal weight:
. alpha joints-throat, generate(symplev)
Scale = sum(unstandardized variables)
Average interitem covariance: .3783125
Number of items in the scale: 8
Scale reliability coefficient: 0.7591
. summarize f1 symplev
 Variable |     Obs        Mean   Std. Dev.        Min        Max
----------+------------------------------------------------------
       f1 |    3046    4.86e-10    .9314048  -1.254182     3.1028
  symplev |    3320    2.021112    .7290644          1          5
. correlate f1 symplev
(obs=3046)
          |       f1  symplev
----------+------------------
       f1 |   1.0000
  symplev |   0.9343   1.0000
It turns out that the equally weighted scale created by alpha and the first
factor-score estimate are highly correlated (r = 0.9343 over the 3,046
observations available for both).
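The scale reliability coefficient that alpha reports is Cronbach's alpha. For k items it can be written in terms of the average interitem covariance c̄ and the average item variance v̄ as alpha = k c̄ / (v̄ + (k - 1) c̄). The alpha output above reports k = 8 and c̄ = .3783125. The plain-Python sketch below computes alpha from raw item scores with that formula. It checks the result against the equivalent form k/(k - 1) × (1 - sum of item variances / variance of the total). It assumes complete data with no missing values, and the toy scores are made up for illustration.

```python
from itertools import combinations
from statistics import mean


def cov(x, y):
    # Sample covariance with an n - 1 denominator
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def cronbach_alpha(items):
    """items: list of k equal-length lists of item scores (no missing values)."""
    k = len(items)
    v_bar = mean(cov(x, x) for x in items)                       # average item variance
    c_bar = mean(cov(x, y) for x, y in combinations(items, 2))   # average interitem covariance
    return k * c_bar / (v_bar + (k - 1) * c_bar)


def cronbach_alpha_total(items):
    # Equivalent form: k/(k-1) * (1 - sum of item variances / variance of the total)
    k = len(items)
    total = [sum(vals) for vals in zip(*items)]
    return k / (k - 1) * (1 - sum(cov(x, x) for x in items) / cov(total, total))


items = [[1, 2, 3, 4, 5], [2, 2, 3, 5, 4], [1, 3, 2, 4, 5]]  # made-up toy scores
print(round(cronbach_alpha(items), 4), round(cronbach_alpha_total(items), 4))
```

Alpha rises as the items covary more strongly relative to their own variances. This is why a set of symptoms that mostly track one general factor can also form a reliable equally weighted scale.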
References
- Tarlov, A. R., J. E. Ware, Jr., S. Greenfield, E. C. Nelson, E. Perrin,
  and M. Zubkoff. 1989. The medical outcomes study. Journal of the American
  Medical Association 262: 925–930.