Thanks Jay for the detailed answer.
Regarding the additional variable (Var5), the tetrachoric correlation matrix is
below:
             Var1         Var2         Var3         Var4         Var5
Var1            1
Var2     .1819233            1
Var3     .3699331    .25242738            1
Var4    .18371493    .27407531    .40299934            1
Var5        .0202       -.0033       -.0687       -.0637            1
As you can see, Var5 is essentially uncorrelated with the other 4 variables. If I
run factormat on all 5 variables, requesting 5 factors, I get:
factormat R, n(6926) ipf   factor(5)
(obs=6926)
Factor analysis/correlation                      Number of obs    =     6926
    Method: iterated principal factors           Retained factors =        4
    Rotation: (unrotated)                        Number of params =       10
--------------------------------------------------------------------------
         Factor  |   Eigenvalue   Difference        Proportion   Cumulative
-------------+------------------------------------------------------------
        Factor1  |      1.28517      1.03609            0.7553       0.7553
        Factor2  |      0.24908      0.11225            0.1464       0.9017
        Factor3  |      0.13684      0.10617            0.0804       0.9821
        Factor4  |      0.03067      0.03086            0.0180       1.0001
        Factor5  |     -0.00019            .           -0.0001       1.0000
--------------------------------------------------------------------------
LR test: independent vs. saturated:  chi2(10) = 3087.75 Prob>chi2 = 0.0000
Factor loadings (pattern matrix) and unique variances
---------------------------------------------------------------------
    Variable |  Factor1   Factor2   Factor3   Factor4 |   Uniqueness
-------------+----------------------------------------+--------------
        Var1 |   0.4863    0.3566    0.0300   -0.0355 |      0.6342
        Var2 |   0.4111   -0.0833    0.2339   -0.0944 |      0.7604
        Var3 |   0.7321    0.0487   -0.1889    0.0323 |      0.4249
        Var4 |   0.5833   -0.2811    0.0678    0.0679 |      0.5716
        Var5 |  -0.0592    0.1832    0.2023    0.1218 |      0.9072
---------------------------------------------------------------------
The loading of Var5 on the first factor is extremely low, and even negative
(-0.0592), and the first factor still seems to explain 75.5% of the common
variance.
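
As a quick arithmetic check on where the 75.5% comes from, here is a small sketch in Python (not Stata, used only as a calculator). It assumes the "Proportion" column is each eigenvalue divided by the sum of all eigenvalues of the reduced correlation matrix, which is what the numbers in the output above are consistent with. The variable names are made up for illustration.

```python
# Eigenvalues copied from the 5-variable IPF output above.
eigenvalues = [1.28517, 0.24908, 0.13684, 0.03067, -0.00019]

# Assumption: the denominator is the total COMMON variance, i.e. the sum of
# the eigenvalues of the reduced correlation matrix, not the number of items.
total_common = sum(eigenvalues)
proportions = [ev / total_common for ev in eigenvalues]

running = 0.0
for i, (ev, prop) in enumerate(zip(eigenvalues, proportions), start=1):
    running += prop
    print(f"Factor{i}: eigenvalue={ev:8.5f}  proportion={prop:7.4f}  "
          f"cumulative={running:7.4f}")

# For comparison: share of the TOTAL variance of 5 standardized items
# (the trace of a 5x5 correlation matrix is 5).
share_of_total = eigenvalues[0] / 5
print(f"Factor1 as a share of total variance: {share_of_total:.4f}")
```

So the 75.5% refers to the variance the items have in common, while the same eigenvalue is only about a quarter of the total variance of the 5 items.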
Francesco.
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On behalf of Verkuilen, Jay
Sent: Wednesday, 23 December 2009 22:40
To: '[email protected]'
Subject: st: RE: R: RE: RE: Factor Analysis: which explained variance?
Francesco Burchi wrote:
>>@ Jay
The theoretical reason for this aggregation is that the different variables
indicate different types of health knowledge.<<
OK, then it makes a lot of sense to generate a sum score from this.
>>The following are the results of tetrachoric correlation:
             Var1         Var2         Var3         Var4
Var1            1
Var2     .1819233            1
Var3     .3699331    .25242738            1
Var4    .18371493    .27407531    .40299934            1<<
Thanks. Eyeballing this you have a positive manifold and some differences
between different items. A one factor model is likely to be appropriate.  
>>I was specifically asked whether I could justify my choice of one single
factor on the basis of the variance explained. Following your reasoning, I
could argue that with more than 1 factor it would be unidentified. Just to
be sure about the procedure I am following, I have tried to get results
keeping the 4 factors:
factormat R, n(6926) ipf   factor(4)
Factor analysis/correlation                    Number of obs    =     6926
Method: iterated principal factors             Retained factors =        3
Rotation: (unrotated)                          Number of params =        6
--------------------------------------------------------------------------
         Factor  |   Eigenvalue   Difference        Proportion   Cumulative
-------------+------------------------------------------------------------
        Factor1  |      1.28200      1.06199            0.8049       0.8049
        Factor2  |      0.22001      0.12912            0.1381       0.9431
        Factor3  |      0.09089      0.09108            0.0571       1.0001
        Factor4  |     -0.00019            .           -0.0001       1.0000
--------------------------------------------------------------------------
Could I state that the first factor explains 80% of the common variance?<<
Yes, it's pretty clearly one dimensional, with the rest being junk that
happens with item-level factor analysis. The uniquenesses associated with
the loadings are totally in line with . I also ran the ML factor analysis
using:
. factormat R, n(6296) ml  factors(1) names(v1 v2 v3 v4)
(obs=6296)
Iteration 0:   log likelihood = -216.46349
Iteration 1:   log likelihood = -65.941751
Iteration 2:   log likelihood = -63.980616
Iteration 3:   log likelihood = -63.905495
Iteration 4:   log likelihood =  -63.90257
Iteration 5:   log likelihood = -63.902458
Factor analysis/correlation                        Number of obs    =     6296
    Method: maximum likelihood                     Retained factors =        1
    Rotation: (unrotated)                          Number of params =        4
                                                   Schwarz's BIC    =  162.796
    Log likelihood = -63.90246                     (Akaike's) AIC   =  135.805
    --------------------------------------------------------------------------
         Factor  |   Eigenvalue   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      1.20010            .            1.0000       1.0000
    --------------------------------------------------------------------------
    LR test: independent vs. saturated:  chi2(6)  = 2727.68 Prob>chi2 = 0.0000
    LR test:    1 factor vs. saturated:  chi2(2)  =  127.75 Prob>chi2 = 0.0000
Factor loadings (pattern matrix) and unique variances
    ---------------------------------------
        Variable |  Factor1 |   Uniqueness 
    -------------+----------+--------------
              v1 |   0.4583 |      0.7900  
              v2 |   0.3732 |      0.8607  
              v3 |   0.7583 |      0.4250  
              v4 |   0.5252 |      0.7242  
    ---------------------------------------
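
A small arithmetic sketch (in Python, used only as a calculator) of how the two columns in the table above relate. It assumes the usual one-factor identity for standardized items, uniqueness = 1 - loading^2, and that the factor's eigenvalue is the sum of the squared loadings. The dictionary names are made up for illustration, and the loadings are the rounded values from the table, so agreement is only to about 4 decimals.

```python
# Loadings and uniquenesses copied from the ML one-factor table above.
loadings = {"v1": 0.4583, "v2": 0.3732, "v3": 0.7583, "v4": 0.5252}
reported_uniqueness = {"v1": 0.7900, "v2": 0.8607, "v3": 0.4250, "v4": 0.7242}

# With a single factor, the communality of an item is its squared loading
# and the uniqueness is what is left over.
uniqueness = {name: 1.0 - lam ** 2 for name, lam in loadings.items()}

# The common variance attributed to the factor is the sum of squared loadings.
common_variance = sum(lam ** 2 for lam in loadings.values())

for name in loadings:
    print(f"{name}: computed uniqueness={uniqueness[name]:.4f}  "
          f"reported={reported_uniqueness[name]:.4f}")
print(f"sum of squared loadings = {common_variance:.4f}")
```

The sum of squared loadings comes out very close to the reported eigenvalue of 1.20010.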
The chi-square tests are rather silly at this sample size; ignore them. The
loadings and uniquenesses are almost the same as for IPF (interestingly
enough, since that's not always true). It won't run anything higher dimensional,
but I doubt from looking at that tetrachoric correlation matrix that you'd find
anything.
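
One way to see why nothing higher dimensional can be fit with 4 items is to count degrees of freedom. The sketch below (Python, used only as a calculator) uses the standard counting formula for an exploratory factor model with p items and k factors, df = ((p - k)^2 - (p + k)) / 2. The function name is hypothetical, and whether this count is exactly what stops Stata here is an assumption, not something the output confirms.

```python
def factor_model_df(p: int, k: int) -> int:
    """Degrees of freedom of 'k factors vs. saturated' for p items.

    Standard counting formula for an exploratory factor model:
    p(p+1)/2 distinct variances and covariances, minus the free parameters
    (p*k loadings + p uniquenesses - k(k-1)/2 rotational constraints).
    The numerator is always even, so integer division is exact.
    """
    return ((p - k) ** 2 - (p + k)) // 2


for p, k in [(4, 1), (4, 2), (5, 1), (5, 2), (5, 3)]:
    df = factor_model_df(p, k)
    status = "testable" if df > 0 else ("just identified" if df == 0 else "not identified")
    print(f"p={p} items, k={k} factors: df={df:2d}  ({status})")
```

For 4 items and 1 factor this gives df = 2, matching the chi2(2) in the "1 factor vs. saturated" test above. For 4 items and 2 factors it is negative, meaning there are more free parameters than correlations to fit.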
>>
Finally, I have tried to add one or two further indicators to improve the
analysis. However, I had some theoretical doubts on the inclusion of these
variables, and the factor analysis with tetrachoric correlations gave me
loadings for these variables much lower than 0.1, thus I was convinced to
use only 4 variables.<<
Are the tetrachoric correlations for the other two variables markedly lower
or still meaningful? You might have an oblique two-factor solution. 
Jay
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
*