# st: R: RE: RE: Factor Analysis: which explained variance?

 From "Francesco Burchi" To Subject st: R: RE: RE: Factor Analysis: which explained variance? Date Wed, 23 Dec 2009 11:14:36 +0100

```
@ Jay

The theoretical reason for this aggregation is that the different variables
indicate different types of health knowledge.
The tetrachoric correlations are as follows:

          Var1         Var2         Var3         Var4
Var1      1
Var2      .1819233     1
Var3      .3699331     .25242738    1
Var4      .18371493    .27407531    .40299934    1

I was specifically asked whether I could justify choosing a single factor
on the basis of the variance explained. Following your reasoning, I could
argue that a model with more than one factor would be unidentified. Just to
check the procedure I am following, I tried requesting 4 factors:

factormat R, n(6926) ipf factor(4)

Factor analysis/correlation                    Number of obs    =     6926
Method: iterated principal factors             Retained factors =        3
Rotation: (unrotated)                          Number of params =        6

--------------------------------------------------------------------------
     Factor  |   Eigenvalue   Difference        Proportion   Cumulative
-------------+------------------------------------------------------------
    Factor1  |      1.28200      1.06199            0.8049       0.8049
    Factor2  |      0.22001      0.12912            0.1381       0.9431
    Factor3  |      0.09089      0.09108            0.0571       1.0001
    Factor4  |     -0.00019            .           -0.0001       1.0000
--------------------------------------------------------------------------

Could I state that the first factor explains 80% of the common variance?
Finally, I tried adding one or two further indicators to improve the
analysis. However, I had theoretical doubts about including these
variables, and the factor analysis on the tetrachoric correlations gave
them loadings well below 0.1, so I decided to use only the 4 variables.

Thanks,
Francesco

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Verkuilen, Jay
Sent: Monday, 21 December 2009 19:54
To: 'statalist@hsphsun2.harvard.edu'
Subject: st: RE: RE: Factor Analysis: which explained variance?

Nick Cox wrote:

>P.P.S. the whole notion of variance is perhaps a little suspect when the
originals are indicator variables. <

@Nick: I don't know, you have variances, they're just functions of the mean
(proportion)! However, there are covariances that aren't redundant.

@The original poster:

With four indicators, you really can only afford a one-dimensional factor
analysis. Anything of higher dimension will be, essentially, unidentified,
and thus even more indeterminate than usual for factor analysis. Three
indicators is exactly identified. Four indicators with two correlated
factors and two indicators per factor is also identified, but if the
solution says that you have three and one you're really out of luck.

Without knowing the tetrachoric correlation matrix (these are indicators,
i.e., binary, so polychoric is just tetrachoric anyhow) it's very hard to
say on any statistical grounds.

Is there a theoretical reason to form a sum score from these indicators? For
instance, do they operate like items on a quiz where you want to know the
total score?

Jay

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

```
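As a rough cross-check of the arithmetic in the thread, the sketch below recomputes the Proportion column from the eigenvalues in the factormat output and applies the usual parameter count for an exploratory factor model. It is written in Python rather than Stata, and the function names are hypothetical. It assumes that Proportion is each eigenvalue divided by the sum of all four eigenvalues (the figures printed in the output are consistent with that), and that total variance equals the number of indicators because the input is a correlation matrix.

```python
# Eigenvalues copied from the factormat output in the message above.
eigenvalues = [1.28200, 0.22001, 0.09089, -0.00019]
n_indicators = 4  # Var1..Var4


def common_variance_shares(eigs):
    """Each eigenvalue divided by the sum of all eigenvalues
    (assumed here to be what the Proportion column reports)."""
    total = sum(eigs)
    return [e / total for e in eigs]


def total_variance_share(eig, p):
    """Eigenvalue divided by p, the trace of a p x p correlation matrix."""
    return eig / p


def efa_degrees_of_freedom(p, m):
    """Standard count for an exploratory m-factor model on p indicators:
    ((p - m)^2 - (p + m)) / 2. A negative value means under-identified."""
    return ((p - m) ** 2 - (p + m)) / 2


shares = common_variance_shares(eigenvalues)
print([round(s, 4) for s in shares])   # compare with the Proportion column
print(round(total_variance_share(eigenvalues[0], n_indicators), 4))
for p, m in [(3, 1), (4, 1), (4, 2)]:
    print(p, m, efa_degrees_of_freedom(p, m))
```

Under these assumptions the first factor carries about 80% of the common variance but only about 32% of the total variance of the four standardized indicators, which is the distinction the subject line asks about. The count gives 0 degrees of freedom for three indicators and one factor, 2 for four indicators and one factor, and a negative value for four indicators and two exploratory factors. The two-correlated-factors case Jay mentions is a restricted (confirmatory) model and is not covered by this count.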