Statalist



st: RE: R: RE: RE: Factor Analysis: which explained variance?


From   "Verkuilen, Jay" <JVerkuilen@gc.cuny.edu>
To   "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject   st: RE: R: RE: RE: Factor Analysis: which explained variance?
Date   Wed, 23 Dec 2009 16:39:31 -0500

Francesco Burchi wrote:

>>@ Jay

The theoretical reason for this aggregation is that the different variables
indicate different types of health knowledge.<<

OK, then it makes good sense to generate a sum score from this.

>>The following are the results of tetrachoric correlation:

                 Var1        Var2        Var3        Var4
Var1                1
Var2         .1819233           1
Var3         .3699331   .25242738           1
Var4        .18371493   .27407531   .40299934           1

Thanks. Eyeballing this, you have a positive manifold and some differences between items. A one-factor model is likely to be appropriate.
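As a cross-check outside Stata, one can inspect the eigenvalues of that tetrachoric matrix directly; a single dominant eigenvalue is what "positive manifold, one factor" predicts. A minimal sketch in Python (NumPy assumed; values copied from the matrix above):

```python
import numpy as np

# Tetrachoric correlation matrix as posted (Var1..Var4)
R = np.array([
    [1.0,        0.1819233,  0.3699331,  0.18371493],
    [0.1819233,  1.0,        0.25242738, 0.27407531],
    [0.3699331,  0.25242738, 1.0,        0.40299934],
    [0.18371493, 0.27407531, 0.40299934, 1.0],
])

# Eigenvalues in descending order (eigvalsh: symmetric-matrix routine)
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]
print(eigvals)
# One eigenvalue well above the rest (which are all below 1) is
# consistent with a one-factor reading.
```

This is only a quick screen on the correlation matrix, not a substitute for the factor analysis itself.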



>>I was specifically asked whether I could justify my choice of one single
factor on the basis of the variance explained. Following your reasoning, I
could argue that with more than 1 factor it would be unidentified. Just to
be sure about the procedure I am following, I have tried to get results
keeping the 4 factors:

factormat R, n(6926) ipf factors(4)

Factor analysis/correlation                    Number of obs    =     6926
Method: iterated principal factors             Retained factors =        3
Rotation: (unrotated)                          Number of params =        6

--------------------------------------------------------------------------
         Factor  |   Eigenvalue   Difference        Proportion   Cumulative
-------------+------------------------------------------------------------
        Factor1  |      1.28200      1.06199            0.8049       0.8049
        Factor2  |      0.22001      0.12912            0.1381       0.9431
        Factor3  |      0.09089      0.09108            0.0571       1.0001
        Factor4  |     -0.00019            .           -0.0001       1.0000
--------------------------------------------------------------------------

Could I state that the first factor explains 80% of the common variance?<<

Yes, it's pretty clearly one-dimensional, with the rest being the junk that happens in item-level factor analysis. The uniquenesses associated with the loadings are in line with this. I also ran the ML factor analysis using:

. factormat R, n(6296) ml  factors(1) names(v1 v2 v3 v4)
(obs=6296)
Iteration 0:   log likelihood = -216.46349
Iteration 1:   log likelihood = -65.941751
Iteration 2:   log likelihood = -63.980616
Iteration 3:   log likelihood = -63.905495
Iteration 4:   log likelihood =  -63.90257
Iteration 5:   log likelihood = -63.902458

Factor analysis/correlation                        Number of obs    =     6296
    Method: maximum likelihood                     Retained factors =        1
    Rotation: (unrotated)                          Number of params =        4
                                                   Schwarz's BIC    =  162.796
    Log likelihood = -63.90246                     (Akaike's) AIC   =  135.805

    --------------------------------------------------------------------------
         Factor  |   Eigenvalue   Difference        Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      1.20010            .            1.0000       1.0000
    --------------------------------------------------------------------------
    LR test: independent vs. saturated:  chi2(6)  = 2727.68 Prob>chi2 = 0.0000
    LR test:    1 factor vs. saturated:  chi2(2)  =  127.75 Prob>chi2 = 0.0000

Factor loadings (pattern matrix) and unique variances

    ---------------------------------------
        Variable |  Factor1 |   Uniqueness 
    -------------+----------+--------------
              v1 |   0.4583 |      0.7900  
              v2 |   0.3732 |      0.8607  
              v3 |   0.7583 |      0.4250  
              v4 |   0.5252 |      0.7242  
    ---------------------------------------

The chi-square tests at this sample size are rather silly; ignore them. The loadings and uniquenesses are almost the same as for IPF (interestingly, that's not always true). It won't run anything higher-dimensional, but looking at that tetrachoric correlation matrix I doubt you'd find anything.
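Both outputs above can be sanity-checked with simple arithmetic: the "proportion of common variance" is the first eigenvalue over the sum of all retained eigenvalues, and in a one-factor model each loading squared plus its uniqueness should come to 1. A hedged sketch in Python, with the numbers copied from the two tables above:

```python
# IPF eigenvalues from the 4-factor run above
eigs = [1.28200, 0.22001, 0.09089, -0.00019]
prop1 = eigs[0] / sum(eigs)
print(round(prop1, 4))  # first factor's share of the common variance, ~0.8049

# ML loadings and uniquenesses from the one-factor run above
loadings = [0.4583, 0.3732, 0.7583, 0.5252]
unique   = [0.7900, 0.8607, 0.4250, 0.7242]

# In a one-factor model, loading^2 + uniqueness = 1 for each item
for l, u in zip(loadings, unique):
    assert abs(l * l + u - 1.0) < 1e-3

# Model-implied correlation between v1 and v3 is the product of their
# loadings; it sits close to the tetrachoric value .3699331
print(round(loadings[0] * loadings[2], 4))
```

The small residual between the implied and observed correlations is another informal sign that one factor is enough here.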


>>
Finally, I have tried to add one or two further indicators to improve the
analysis. However, I had some theoretical doubts on the inclusion of these
variables, and the factor analysis with tetrachoric correlations gave me
loadings for these variables much lower than 0.1, thus I was convinced to
use only 4 variables.<<

Are the tetrachoric correlations for the other two variables markedly lower or still meaningful? You might have an oblique two-factor solution. 

Jay

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


