# st: R: RE: R: RE: RE: Factor Analysis: which explained variance?

 From "Francesco Burchi" To Subject st: R: RE: R: RE: RE: Factor Analysis: which explained variance? Date Thu, 24 Dec 2009 17:35:16 +0100

```Thanks Jay for the detailed answer.

Regarding the additional variable, here below is the tetrachoric
correlation:

            Var1        Var2        Var3        Var4        Var5
Var1           1
Var2    .1819233           1
Var3    .3699331   .25242738           1
Var4   .18371493   .27407531   .40299934           1
Var5       .0202      -.0033      -.0687      -.0637           1

As you can see, Var5 is essentially uncorrelated with the other 4 variables.
If I run factormat on all 5 variables, requesting 5 factors, I get:

factormat R, n(6926) ipf   factor(5)
(obs=6926)

Factor analysis/correlation                      Number of obs    =     6926
Method: iterated principal factors           Retained factors =        4
Rotation: (unrotated)                        Number of params =       10

--------------------------------------------------------------------------
     Factor  |   Eigenvalue   Difference        Proportion   Cumulative
-------------+------------------------------------------------------------
    Factor1  |      1.28517      1.03609            0.7553       0.7553
    Factor2  |      0.24908      0.11225            0.1464       0.9017
    Factor3  |      0.13684      0.10617            0.0804       0.9821
    Factor4  |      0.03067      0.03086            0.0180       1.0001
    Factor5  |     -0.00019            .           -0.0001       1.0000
--------------------------------------------------------------------------
LR test: independent vs. saturated:  chi2(10) = 3087.75 Prob>chi2 = 0.0000

---------------------------------------------------------------------
    Variable |  Factor1   Factor2   Factor3   Factor4 |   Uniqueness
-------------+----------------------------------------+--------------
        Var1 |   0.4863    0.3566    0.0300   -0.0355 |      0.6342
        Var2 |   0.4111   -0.0833    0.2339   -0.0944 |      0.7604
        Var3 |   0.7321    0.0487   -0.1889    0.0323 |      0.4249
        Var4 |   0.5833   -0.2811    0.0678    0.0679 |      0.5716
        Var5 |  -0.0592    0.1832    0.2023    0.1218 |      0.9072
---------------------------------------------------------------------

The loading of the 5th variable on the first factor is extremely low, and
even negative, and the first factor still seems to explain 75.5% of the
common variance.

Francesco.

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Verkuilen, Jay
Sent: Wednesday, 23 December 2009 22:40
To: 'statalist@hsphsun2.harvard.edu'
Subject: st: RE: R: RE: RE: Factor Analysis: which explained variance?

Francesco Burchi wrote:

>>@ Jay

The theoretical reason for this aggregation is that the different variables
indicate different types of health knowledge.<<

OK, then it makes good sense to generate a sum score from this.

>>The following are the results of tetrachoric correlation:

            Var1        Var2        Var3        Var4
Var1           1
Var2    .1819233           1
Var3    .3699331   .25242738           1
Var4   .18371493   .27407531   .40299934           1

Thanks. Eyeballing this you have a positive manifold and some differences
between different items. A one factor model is likely to be appropriate.

>>I was specifically asked whether I could justify my choice of one single
factor on the basis of the variance explained. Following your reasoning, I
could argue that with more than 1 factor it would be unidentified. Just to
be sure about the procedure I am following, I have tried to get results
keeping the 4 factors:

factormat R, n(6926) ipf   factor(4)

Factor analysis/correlation                    Number of obs    =     6926
Method: iterated principal factors             Retained factors =        3
Rotation: (unrotated)                          Number of params =        6

--------------------------------------------------------------------------
     Factor  |   Eigenvalue   Difference        Proportion   Cumulative
-------------+------------------------------------------------------------
    Factor1  |      1.28200      1.06199            0.8049       0.8049
    Factor2  |      0.22001      0.12912            0.1381       0.9431
    Factor3  |      0.09089      0.09108            0.0571       1.0001
    Factor4  |     -0.00019            .           -0.0001       1.0000
--------------------------------------------------------------------------

Could I state that the first factor explains 80% of the common variance?<<

Yes, it's pretty clearly one dimensional, with the rest being junk that
happens with item-level factor analysis. The uniquenesses associated with
the loadings are totally in line with . I also ran the ML factor analysis
using:

. factormat R, n(6296) ml  factors(1) names(v1 v2 v3 v4)
(obs=6296)
Iteration 0:   log likelihood = -216.46349
Iteration 1:   log likelihood = -65.941751
Iteration 2:   log likelihood = -63.980616
Iteration 3:   log likelihood = -63.905495
Iteration 4:   log likelihood =  -63.90257
Iteration 5:   log likelihood = -63.902458

Factor analysis/correlation                    Number of obs    =     6296
Method: maximum likelihood                     Retained factors =        1
Rotation: (unrotated)                          Number of params =        4
                                               Schwarz's BIC    =  162.796
Log likelihood = -63.90246                     (Akaike's) AIC   =  135.805

--------------------------------------------------------------------------
     Factor  |   Eigenvalue   Difference        Proportion   Cumulative
-------------+------------------------------------------------------------
    Factor1  |      1.20010            .            1.0000       1.0000
--------------------------------------------------------------------------
LR test: independent vs. saturated:  chi2(6)  = 2727.68 Prob>chi2 = 0.0000
LR test:    1 factor vs. saturated:  chi2(2)  =  127.75 Prob>chi2 = 0.0000

---------------------------------------
    Variable |  Factor1 |   Uniqueness
-------------+----------+--------------
          v1 |   0.4583 |      0.7900
          v2 |   0.3732 |      0.8607
          v3 |   0.7583 |      0.4250
          v4 |   0.5252 |      0.7242
---------------------------------------

The chi square tests for this sample size are rather silly, ignore them. The
enough---that's not always true). It won't run anything higher dimensional
but I doubt from looking at that tetrachoric correlation matrix you'd find
anything.

>>
Finally, I have tried to add one or two further indicators to improve the
analysis. However, I had some theoretical doubts on the inclusion of these
variables, and the factor analysis with tetrachoric correlations gave me
loadings for these variables much lower than 0.1, thus I was convinced to
use only 4 variables.<<

Are the tetrachoric correlations for the other two variables markedly lower
or still meaningful? You might have an oblique two-factor solution.

Jay

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

```
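
The "Proportion" column discussed in the thread can be checked by hand: each eigenvalue is divided by the sum of all reported eigenvalues, so the figure is a share of the common variance, not of the total variance. The sketch below is not part of the original exchange. It is a minimal Python check, using only numbers copied from the Stata output above, and the names (`proportions`, `ipf_4var`, `ipf_5var`, `ml_loadings`) are illustrative. It also assumes the usual one-factor relation that uniqueness equals one minus the squared loading.

```python
# Eigenvalues copied from the two iterated-principal-factor (ipf) runs above.
ipf_4var = [1.28200, 0.22001, 0.09089, -0.00019]
ipf_5var = [1.28517, 0.24908, 0.13684, 0.03067, -0.00019]


def proportions(eigenvalues):
    """Share of each eigenvalue in the sum of all eigenvalues."""
    total = sum(eigenvalues)
    return [e / total for e in eigenvalues]


# Share of common variance attributed to the first factor in each run.
share_4var = round(proportions(ipf_4var)[0], 4)
share_5var = round(proportions(ipf_5var)[0], 4)

# Loadings copied from the one-factor ML run above.
ml_loadings = {"v1": 0.4583, "v2": 0.3732, "v3": 0.7583, "v4": 0.5252}

# With a single factor, uniqueness = 1 - loading^2 (assumed relation).
ml_uniqueness = {v: round(1 - l ** 2, 4) for v, l in ml_loadings.items()}

# The factor's eigenvalue is the sum of squared loadings; it only matches
# the reported value approximately because the loadings are rounded.
ml_eigenvalue = sum(l ** 2 for l in ml_loadings.values())

print("first-factor share, 4 variables:", share_4var)
print("first-factor share, 5 variables:", share_5var)
print("ML uniquenesses:", ml_uniqueness)
print("ML eigenvalue from loadings:", round(ml_eigenvalue, 3))
```

The shares can be compared with the Proportion column of the two ipf tables, and the uniquenesses with the last column of the ML loading table.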