
From: "Francesco Burchi" <fburchi@uniroma3.it>

To: <statalist@hsphsun2.harvard.edu>

Subject: st: R: RE: R: RE: RE: Factor Analysis: which explained variance?

Date: Thu, 24 Dec 2009 17:35:16 +0100

Thanks Jay for the detailed answer. Regarding the additional variable, here is the tetrachoric correlation matrix:

                Var1        Var2        Var3        Var4        Var5
    Var1        1
    Var2        .1819233    1
    Var3        .3699331    .25242738   1
    Var4        .18371493   .27407531   .40299934   1
    Var5        .0202       -.0033      -.0687      -.0637      1

As you can see, Var5 is essentially uncorrelated with the other 4 variables. If I run factormat on all 5 variables, asking for 5 factors, I get:

    factormat R, n(6926) ipf factor(5)
    (obs=6926)

    Factor analysis/correlation                        Number of obs    =     6926
        Method: iterated principal factors             Retained factors =        4
        Rotation: (unrotated)                          Number of params =       10

        --------------------------------------------------------------------------
             Factor  |   Eigenvalue   Difference        Proportion   Cumulative
        -------------+------------------------------------------------------------
            Factor1  |      1.28517      1.03609            0.7553       0.7553
            Factor2  |      0.24908      0.11225            0.1464       0.9017
            Factor3  |      0.13684      0.10617            0.0804       0.9821
            Factor4  |      0.03067      0.03086            0.0180       1.0001
            Factor5  |     -0.00019            .           -0.0001       1.0000
        --------------------------------------------------------------------------
        LR test: independent vs. saturated:  chi2(10) = 3087.75 Prob>chi2 = 0.0000

    Factor loadings (pattern matrix) and unique variances

        ---------------------------------------------------------------------
            Variable |  Factor1   Factor2   Factor3   Factor4 |   Uniqueness
        -------------+----------------------------------------+--------------
                Var1 |   0.4863    0.3566    0.0300   -0.0355 |      0.6342
                Var2 |   0.4111   -0.0833    0.2339   -0.0944 |      0.7604
                Var3 |   0.7321    0.0487   -0.1889    0.0323 |      0.4249
                Var4 |   0.5833   -0.2811    0.0678    0.0679 |      0.5716
                Var5 |  -0.0592    0.1832    0.2023    0.1218 |      0.9072
        ---------------------------------------------------------------------

The loading of the 5th variable on the first factor is extremely low, and even negative, and the first factor still seems to explain 75.5% of the common variance.

Francesco.
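For readers who want to rerun this step, here is a minimal sketch that enters the correlations quoted above and calls factormat on them. The matrix name C5 is made up for illustration (the thread's own matrix is called R and its construction is not shown), and because the Var5 correlations are quoted to only four decimals, the results should be close to, but not necessarily identical with, the output above. The last line checks by hand that the reported uniqueness of Var5 is one minus the sum of its squared loadings.

```stata
* Sketch only: C5 is an illustrative name, not the matrix used in the thread.
* Row-wise lower triangle of the 5x5 tetrachoric correlation matrix.
* (The /// line continuations work in a do-file, not at the command prompt.)
matrix C5 = ( 1,                                          ///
              .1819233,  1,                               ///
              .3699331,  .25242738, 1,                    ///
              .18371493, .27407531, .40299934, 1,         ///
              .0202,     -.0033,    -.0687,    -.0637, 1 )

* shape(lower) tells factormat that C5 is a vector holding the lower triangle.
factormat C5, n(6926) shape(lower) names(Var1 Var2 Var3 Var4 Var5) ipf factors(5)

* Uniqueness of Var5 = 1 - communality, where the communality is the sum of
* squared loadings on the 4 retained factors (loadings copied from the output).
display %6.4f 1 - ((-0.0592)^2 + 0.1832^2 + 0.2023^2 + 0.1218^2)
```

The hand computation gives 0.9072, matching the Uniqueness column: less than 10% of Var5's variance is shared with the factors.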
-----Original message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Verkuilen, Jay
Sent: Wednesday, 23 December 2009 22:40
To: 'statalist@hsphsun2.harvard.edu'
Subject: st: RE: R: RE: RE: Factor Analysis: which explained variance?

Francesco Burchi wrote:

>>@ Jay The theoretical reason for this aggregation is that the different variables indicate different types of health knowledge.<<

OK, then it makes a lot of sense to generate a sum score from this.

>>The following are the results of tetrachoric correlation:

                Var1        Var2        Var3        Var4
    Var1        1
    Var2        .1819233    1
    Var3        .3699331    .25242738   1
    Var4        .18371493   .27407531   .40299934   1
<<

Thanks. Eyeballing this, you have a positive manifold and some differences between items. A one-factor model is likely to be appropriate.

>>I was specifically asked whether I could justify my choice of one single factor on the basis of the variance explained. Following your reasoning, I could argue that with more than 1 factor it would be unidentified. Just to be sure about the procedure I am following, I have tried to get results keeping the 4 factors:

    factormat R, n(6926) ipf factor(4)

    Factor analysis/correlation                        Number of obs    =     6926
        Method: iterated principal factors             Retained factors =        3
        Rotation: (unrotated)                          Number of params =        6

        --------------------------------------------------------------------------
             Factor  |   Eigenvalue   Difference        Proportion   Cumulative
        -------------+------------------------------------------------------------
            Factor1  |      1.28200      1.06199            0.8049       0.8049
            Factor2  |      0.22001      0.12912            0.1381       0.9431
            Factor3  |      0.09089      0.09108            0.0571       1.0001
            Factor4  |     -0.00019            .           -0.0001       1.0000
        --------------------------------------------------------------------------

Could I state that the first factor explains 80% of the common variance?<<

Yes, it's pretty clearly one dimensional, with the rest being junk that happens with item-level factor analysis.
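The 80% figure can be checked by hand. In the factor(4) table, the Proportion column is consistent with each eigenvalue divided by the sum of all four eigenvalues, including the small negative one, which is also why the Cumulative column briefly reaches 1.0001. The sketch below uses only numbers copied from that output; the scalar name ev_sum is illustrative.

```stata
* Eigenvalues copied from the factor(4) IPF output quoted in the message.
* ev_sum is an illustrative scalar name.
scalar ev_sum = 1.28200 + 0.22001 + 0.09089 - 0.00019

* Share of the first factor in the total of all eigenvalues.
display %6.4f 1.28200 / ev_sum

* Cumulative share of the three retained (positive) eigenvalues; it exceeds 1
* slightly because the fourth eigenvalue is negative.
display %6.4f (1.28200 + 0.22001 + 0.09089) / ev_sum
```

The two displays give 0.8049 and 1.0001, matching the table. The denominator is the common variance as estimated by the factor model, not the total variance of the four items, so "80% of the common variance" is the accurate way to phrase it.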
The uniquenesses associated with the loadings are totally in line with . I also ran the ML factor analysis using:

    . factormat R, n(6296) ml factors(1) names(v1 v2 v3 v4)
    (obs=6296)
    Iteration 0:   log likelihood = -216.46349
    Iteration 1:   log likelihood = -65.941751
    Iteration 2:   log likelihood = -63.980616
    Iteration 3:   log likelihood = -63.905495
    Iteration 4:   log likelihood =  -63.90257
    Iteration 5:   log likelihood = -63.902458

    Factor analysis/correlation                        Number of obs    =     6296
        Method: maximum likelihood                     Retained factors =        1
        Rotation: (unrotated)                          Number of params =        4
                                                       Schwarz's BIC    =  162.796
        Log likelihood = -63.90246                     (Akaike's) AIC   =  135.805

        --------------------------------------------------------------------------
             Factor  |   Eigenvalue   Difference        Proportion   Cumulative
        -------------+------------------------------------------------------------
            Factor1  |      1.20010            .            1.0000       1.0000
        --------------------------------------------------------------------------
        LR test: independent vs. saturated:  chi2(6)  = 2727.68 Prob>chi2 = 0.0000
        LR test:    1 factor vs. saturated:  chi2(2)  =  127.75 Prob>chi2 = 0.0000

    Factor loadings (pattern matrix) and unique variances

        ---------------------------------------
            Variable |  Factor1 |   Uniqueness
        -------------+----------+--------------
                  v1 |   0.4583 |      0.7900
                  v2 |   0.3732 |      0.8607
                  v3 |   0.7583 |      0.4250
                  v4 |   0.5252 |      0.7242
        ---------------------------------------

The chi-square tests are rather silly at this sample size; ignore them. The loadings and uniquenesses are almost the same as for IPF (interestingly enough; that's not always true). It won't run anything higher dimensional, but from looking at that tetrachoric correlation matrix I doubt you'd find anything.

>>Finally, I have tried to add one or two further indicators to improve the analysis. However, I had some theoretical doubts about including these variables, and the factor analysis with tetrachoric correlations gave me loadings for these variables much lower than 0.1, so I was convinced to use only 4 variables.<<

Are the tetrachoric correlations for the other two variables markedly lower or still meaningful? You might have an oblique two-factor solution.

Jay

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

**References**: **st: RE: R: RE: RE: Factor Analysis: which explained variance?** *From:* "Verkuilen, Jay" <JVerkuilen@gc.cuny.edu>
