The content below is applicable for Stata 8.

Factor analysis

  • Principal-components factor
  • Principal factor
  • Iterated principal factor
  • ML factors
  • Varimax rotation with and without Horst standardization
  • Promax rotation with and without Horst standardization
  • Bartlett scoring
  • Regression scoring

Stata’s factor command allows you to fit common-factor models; see also principal components.

By default, factor produces estimates using the principal-factor method (communalities set to the squared multiple-correlation coefficients). Alternatively, factor can produce iterated principal-factor estimates (communalities re-estimated iteratively), principal-components factor estimates (communalities set to 1), or maximum-likelihood factor estimates.
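The starting communalities for the principal-factor method can be illustrated outside Stata. The sketch below, written in Python rather than Stata, computes squared multiple correlations from a correlation matrix using the standard identity SMC_i = 1 - 1/(R^-1)_ii. The 3 x 3 matrix and the function names are hypothetical illustrations; this is not Stata's implementation.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I].
    aug = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    # The right-hand half now holds the inverse.
    return [row[n:] for row in aug]


def squared_multiple_correlations(corr):
    """SMC of each variable with all the others: 1 - 1 / diag(inverse(corr))."""
    inv = invert(corr)
    return [1.0 - 1.0 / inv[i][i] for i in range(len(corr))]


# Hypothetical correlation matrix for three variables (not from the study).
toy_corr = [
    [1.0, 0.5, 0.4],
    [0.5, 1.0, 0.3],
    [0.4, 0.3, 1.0],
]

smc = squared_multiple_correlations(toy_corr)
for i, value in enumerate(smc, start=1):
    print(f"variable {i}: initial communality (SMC) = {value:.4f}")
```

Replacing these values with 1 on the diagonal corresponds to the principal-components factor choice described above, and re-estimating them repeatedly corresponds to the iterated variant.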

After you fit a factor model, Stata allows you to rotate the factor-loading matrix using the varimax (orthogonal) and promax (oblique) methods. Stata can score a set of factor estimates using either rotated or unrotated loadings. Both regression and Bartlett scoring are available.

Below we fit a maximum-likelihood factor model on eight medical symptoms from a medical outcomes study (Tarlov et al. 1989) using three factors:

 	. factor joints-throat, ml factors(3) protect(5)

        (obs=3046)
	Likelihood verification 0, maximum =  -21.8257
	Likelihood verification 1, maximum =  -21.8257
	Likelihood verification 2, maximum =  -21.8257
	Likelihood verification 3, maximum =  -18.4300
	Likelihood verification 4, maximum =  -21.8257
	Likelihood verification 5, maximum =  -18.4300

	Differing maxima obtained.

	Iteration 0:  log likelihood =-1925.2187
	Iteration 1:  log likelihood =-40.623068
	Iteration 2:  log likelihood = -27.38831
	Iteration 3:  log likelihood =-26.291917
	Iteration 4:  log likelihood = -18.49983
	Iteration 5:  log likelihood = -18.43281
	Iteration 6:  log likelihood =-18.430164
	Iteration 7:  log likelihood =-18.429999
	Iteration 8:  log likelihood =-18.429988
	Iteration 9:  log likelihood =-18.429988
	Iteration 10:  log likelihood =-18.429988

	            (maximum likelihood factors; 3 factors retained)
	  Factor     Variance       Difference    Proportion    Cumulative
	------------------------------------------------------------------
	     1        2.36049         1.64310      0.6892         0.6892
	     2        0.71739         0.37019      0.2095         0.8986
	     3        0.34720               .      0.1014         1.0000
	
	Test:  3 vs. no   factors.  Chi2(  24) = 4718.59, Prob > chi2 =  0.0000
	Test:  3 vs. more factors.  Chi2(   7) =   36.79, Prob > chi2 =  0.0000
	
	            Factor Loadings
	 Variable |      1          2          3    Uniqueness
	----------+-------------------------------------------
	   joints |   0.62749   -0.07856    0.26240    0.53124
	    cough |   0.29859    0.14908    0.05009    0.88611
	 backache |   0.82633   -0.33130   -0.11018    0.19530
	   nausea |   0.49540    0.49656   -0.25307    0.44396
	 indigest |   0.46711    0.39728   -0.06671    0.61953
	  hvyfeel |   0.57369    0.21220    0.42173    0.44798
	 headache |   0.50816    0.25731   -0.12097    0.66092
	   throat |   0.37922    0.25219    0.05205    0.78988

To obtain these results, we typed

	factor joints-throat, ml factors(3) protect(5) 

Stata commands share a common syntax: the command name is followed by a list of variables and then, optionally, a comma and any options. Factor analysis has no dependent variable, so here the variable list joints-throat simply names the eight symptom variables, from joints through throat. We specified factor's ml option, producing estimates by maximum likelihood. We also typed factors(3) to indicate that we wanted to keep the first three factors.

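This is an interesting problem because there are two distinct local maxima: the verification runs above stop at either -21.8257 or -18.4300. Stata's protect() option helps you check that you have found the global maximum by using different starting points to search out different solutions. protect(5) indicated that this search was to be performed five times; because the runs disagreed, Stata reported "Differing maxima obtained" and the final iteration log converges to the larger value, -18.43.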

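We find that most of the variance explained by the three retained factors, about 69% (2.36 of 3.43), can be attributed to the first factor. Stata also shows the uniqueness of each variable, that is, the share of its variance not accounted for by the common factors; cough (0.89) is explained poorly, for example, and backache (0.20) well.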
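The columns of the output are tied together by simple arithmetic, which we can check in a few lines of Python (not Stata). The sketch below copies the loadings and uniquenesses from the Factor Loadings table above and recomputes each factor's variance as the column sum of squared loadings, each proportion as that variance over the total for the retained factors, and each uniqueness as 1 minus the row sum of squared loadings. The variable names are ours; small differences in the last digit come from the rounding of the printed loadings.

```python
# Loadings copied from the Factor Loadings table (factors 1, 2, 3).
loadings = {
    "joints":   (0.62749, -0.07856,  0.26240),
    "cough":    (0.29859,  0.14908,  0.05009),
    "backache": (0.82633, -0.33130, -0.11018),
    "nausea":   (0.49540,  0.49656, -0.25307),
    "indigest": (0.46711,  0.39728, -0.06671),
    "hvyfeel":  (0.57369,  0.21220,  0.42173),
    "headache": (0.50816,  0.25731, -0.12097),
    "throat":   (0.37922,  0.25219,  0.05205),
}

# Uniqueness column copied from the same table.
reported_uniqueness = {
    "joints": 0.53124, "cough": 0.88611, "backache": 0.19530,
    "nausea": 0.44396, "indigest": 0.61953, "hvyfeel": 0.44798,
    "headache": 0.66092, "throat": 0.78988,
}

# Variance of each factor: sum of squared loadings down its column.
factor_variance = [sum(row[j] ** 2 for row in loadings.values()) for j in range(3)]

# Proportion: share of the variance explained by the retained factors.
total_variance = sum(factor_variance)
proportion = [v / total_variance for v in factor_variance]

# Uniqueness: 1 minus the communality (row sum of squared loadings).
uniqueness = {name: 1.0 - sum(x ** 2 for x in row) for name, row in loadings.items()}

for j, (v, p) in enumerate(zip(factor_variance, proportion), start=1):
    print(f"factor {j}: variance {v:.5f}, proportion {p:.4f}")
for name, u in uniqueness.items():
    print(f"{name:>8}: uniqueness {u:.5f} (table: {reported_uniqueness[name]:.5f})")
```

Note that the proportions sum to 1 across the three retained factors; they are not shares of the total variance of the eight variables.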

The researcher actually fitting this model interpreted the first factor as a measure of the general level of sickness and the second factor as a difference between musculoskeletal problems and other types of problems. If he had wanted to rotate the factor loadings to search for different interpretations, he could now type rotate to examine an orthogonal varimax rotation; rotate, promax to examine an oblique promax rotation; or, for instance, rotate, promax(4) to examine a promax rotation with promax power 4 (producing simpler loadings but at a cost of more correlation between factors).
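To give a feel for what an orthogonal varimax rotation does, here is a minimal Python sketch (not Stata code, and not Stata's implementation). It applies Kaiser's pairwise planar rotations to the unrotated loadings from the table above, without Horst standardization, and assumes the raw varimax criterion: the sum over factors of the variance of the squared loadings. The function names are ours, and the rotated values it prints need not match what rotate would report in sign, column order, or final digits.

```python
import math


def varimax_criterion(loadings):
    """Raw varimax criterion: p times the summed variance of squared loadings."""
    p = len(loadings)
    total = 0.0
    for j in range(len(loadings[0])):
        squared = [row[j] ** 2 for row in loadings]
        total += sum(s * s for s in squared) - sum(squared) ** 2 / p
    return total


def varimax(loadings, max_sweeps=100, tol=1e-10):
    """Raw varimax by rotating one pair of factors at a time (Kaiser's method)."""
    rotated = [list(row) for row in loadings]
    p, k = len(rotated), len(rotated[0])
    for _ in range(max_sweeps):
        largest_angle = 0.0
        for a in range(k - 1):
            for b in range(a + 1, k):
                u = [row[a] ** 2 - row[b] ** 2 for row in rotated]
                v = [2.0 * row[a] * row[b] for row in rotated]
                big_a, big_b = sum(u), sum(v)
                big_c = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
                big_d = 2.0 * sum(ui * vi for ui, vi in zip(u, v))
                # Angle that maximizes the criterion for this pair of columns.
                phi = math.atan2(big_d - 2.0 * big_a * big_b / p,
                                 big_c - (big_a ** 2 - big_b ** 2) / p) / 4.0
                c, s = math.cos(phi), math.sin(phi)
                for row in rotated:
                    x, y = row[a], row[b]
                    row[a], row[b] = x * c + y * s, -x * s + y * c
                largest_angle = max(largest_angle, abs(phi))
        if largest_angle < tol:  # no pair needed a noticeable rotation
            break
    return rotated


names = ["joints", "cough", "backache", "nausea",
         "indigest", "hvyfeel", "headache", "throat"]
# Unrotated loadings copied from the Factor Loadings table.
loadings = [
    [0.62749, -0.07856,  0.26240],
    [0.29859,  0.14908,  0.05009],
    [0.82633, -0.33130, -0.11018],
    [0.49540,  0.49656, -0.25307],
    [0.46711,  0.39728, -0.06671],
    [0.57369,  0.21220,  0.42173],
    [0.50816,  0.25731, -0.12097],
    [0.37922,  0.25219,  0.05205],
]

rotated = varimax(loadings)
for name, row in zip(names, rotated):
    print(f"{name:>8} " + " ".join(f"{x:9.5f}" for x in row))
print("criterion before:", varimax_criterion(loadings))
print("criterion after: ", varimax_criterion(rotated))
```

Because the rotation is orthogonal, each variable's communality (and therefore its uniqueness) is unchanged; only the way the explained variance is split among the factors differs.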

Stata’s score command produces estimates of the factors after factor or rotate:

 	. score f1
	            (based on unrotated factors)
	            (2 scorings not used)
	
	            Scoring Coefficients
	 Variable |      1
	----------+----------
	   joints |   0.15644
	    cough |   0.04463
	 backache |   0.56038
	   nausea |   0.14779
	 indigest |   0.09986
	  hvyfeel |   0.16960
	 headache |   0.10183
	   throat |   0.06359

Typing score f1 produced estimates of the first factor. Typing score f1 f2 would produce estimates of the first two factors, and typing score f1 f2 f3 (or score f1-f3) would produce estimates of the first three factors. The names f1, f2, etc., are arbitrary; the score command allows you to create new variables that could then be used in analysis.
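The regression scoring coefficients can be related back to the loadings and uniquenesses. The Python sketch below (not Stata code) assumes the textbook regression-scoring formula for a maximum-likelihood solution in which the factors are uncorrelated given the uniquenesses: the coefficient for variable i on factor 1 is (loading_i / uniqueness_i) / (1 + sum over j of loading_j^2 / uniqueness_j). This is our reading of why the numbers agree, not a description of Stata's internal computation. The inputs are copied from the Factor Loadings and Scoring Coefficients tables above.

```python
names = ["joints", "cough", "backache", "nausea",
         "indigest", "hvyfeel", "headache", "throat"]

# First-factor loadings and uniquenesses from the Factor Loadings table.
loadings_f1 = [0.62749, 0.29859, 0.82633, 0.49540,
               0.46711, 0.57369, 0.50816, 0.37922]
uniqueness = [0.53124, 0.88611, 0.19530, 0.44396,
              0.61953, 0.44798, 0.66092, 0.78988]

# Coefficients printed by score f1 (Scoring Coefficients table).
reported = [0.15644, 0.04463, 0.56038, 0.14779,
            0.09986, 0.16960, 0.10183, 0.06359]

# Each variable is weighted by its loading relative to its unique variance.
weights = [l / u for l, u in zip(loadings_f1, uniqueness)]

# Shrinkage term shared by all variables for this factor.
delta = sum(l * l / u for l, u in zip(loadings_f1, uniqueness))

coefficients = [w / (1.0 + delta) for w in weights]

for name, c, r in zip(names, coefficients, reported):
    print(f"{name:>8}: computed {c:.5f}   reported {r:.5f}")
```

The formula makes the pattern in the table easy to read: backache receives by far the largest weight because it has both the highest loading on the first factor and the smallest uniqueness.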

Stata also has a command for Cronbach’s alpha, providing a simpler way of combining the eight symptoms, assuming that all have equal weight:

 	. alpha joints-throat, generate(symplev)

	Scale = sum(unstandardized variables)

	        Average interitem covariance:     .3783125
	        Number of items in the scale:            8
	        Scale reliability coefficient:      0.7591


	. summarize f1 symplev

	Variable |     Obs        Mean   Std. Dev.       Min        Max
	---------+-----------------------------------------------------
	      f1 |    3046    4.86e-10   .9314048  -1.254182     3.1028
	 symplev |    3320    2.021112   .7290644          1          5
	
	
	. correlate f1 symplev
	(obs=3046)
	
	        |       f1  symplev
	--------+------------------
	      f1|   1.0000
	 symplev|   0.9343   1.0000

It turns out that the scale created by alpha and the first factor score estimate are highly correlated with each other (r = 0.93 over the 3,046 observations that have a factor score).
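For readers who want to see how a reliability coefficient of this kind is built from the average interitem covariance, here is a short Python sketch (not Stata code). It assumes the standard formula for unstandardized items, alpha = k * c / (v + (k - 1) * c), where k is the number of items, c the average interitem covariance, and v the average item variance. The function name and the small data set are hypothetical and are not taken from the medical outcomes study.

```python
def cronbach_alpha(items):
    """Cronbach's alpha for unstandardized items.

    items: list of equal-length lists, one list of responses per item,
    with no missing values.
    """
    k = len(items)
    n = len(items[0])
    means = [sum(col) / n for col in items]

    def cov(a, b):
        # Sample covariance between items a and b (divisor n - 1).
        return sum((items[a][i] - means[a]) * (items[b][i] - means[b])
                   for i in range(n)) / (n - 1)

    avg_variance = sum(cov(j, j) for j in range(k)) / k
    pairs = [(a, b) for a in range(k) for b in range(a + 1, k)]
    avg_covariance = sum(cov(a, b) for a, b in pairs) / len(pairs)
    return k * avg_covariance / (avg_variance + (k - 1) * avg_covariance)


# Hypothetical responses: three items scored 1-5 by five respondents.
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]

print(f"alpha = {cronbach_alpha(items):.4f}")
```

Unlike the factor score, which weights backache far more heavily than cough, a scale of this kind treats every item equally, which is why the two measures are highly but not perfectly correlated.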


References

Tarlov, A. R., J. E. Ware, Jr., S. Greenfield, E. C. Nelson, E. Perrin, and M. Zubkoff. 1989. The medical outcomes study. Journal of the American Medical Association 262: 925–930.