
From: "Verkuilen, Jay" <JVerkuilen@gc.cuny.edu>
To: "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject: st: RE: R: RE: RE: Factor Analysis: which explained variance?
Date: Wed, 23 Dec 2009 16:39:31 -0500

Francesco Burchi wrote:

>>@ Jay The theoretical reason for this aggregation is that the different variables indicate different types of health knowledge.<<

OK, then it makes much sense to generate a sum score from this.

>>The following are the results of tetrachoric correlation:

            Var1         Var2         Var3         Var4
    Var1    1
    Var2    .1819233     1
    Var3    .3699331     .25242738    1
    Var4    .18371493    .27407531    .40299934    1
<<

Thanks. Eyeballing this you have a positive manifold and some differences between different items. A one factor model is likely to be appropriate.

>>I was specifically asked whether I could justify my choice of one single factor on the basis of the variance explained. Following your reasoning, I could argue that with more than 1 factor it would be unidentified. Just to be sure about the procedure I am following, I have tried to get results keeping the 4 factors:

    factormat R, n(6926) ipf factor(4)

    Factor analysis/correlation                  Number of obs    =     6926
        Method: iterated principal factors       Retained factors =        3
        Rotation: (unrotated)                    Number of params =        6

    --------------------------------------------------------------------------
         Factor  |   Eigenvalue   Difference   Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      1.28200      1.06199       0.8049       0.8049
        Factor2  |      0.22001      0.12912       0.1381       0.9431
        Factor3  |      0.09089      0.09108       0.0571       1.0001
        Factor4  |     -0.00019            .      -0.0001       1.0000
    --------------------------------------------------------------------------

Could I state that the first factor explains 80% of the common variance?<<

Yes, it's pretty clearly one dimensional, with the rest being junk that happens with item-level factor analysis. The uniquenesses associated with the loadings are totally in line with .

I also ran the ML factor analysis using:

    . factormat R, n(6296) ml factors(1) names(v1 v2 v3 v4)
    (obs=6296)
    Iteration 0:   log likelihood = -216.46349
    Iteration 1:   log likelihood = -65.941751
    Iteration 2:   log likelihood = -63.980616
    Iteration 3:   log likelihood = -63.905495
    Iteration 4:   log likelihood =  -63.90257
    Iteration 5:   log likelihood = -63.902458

    Factor analysis/correlation                  Number of obs    =     6296
        Method: maximum likelihood               Retained factors =        1
        Rotation: (unrotated)                    Number of params =        4
                                                 Schwarz's BIC    =  162.796
        Log likelihood = -63.90246               (Akaike's) AIC   =  135.805

    --------------------------------------------------------------------------
         Factor  |   Eigenvalue   Difference   Proportion   Cumulative
    -------------+------------------------------------------------------------
        Factor1  |      1.20010            .       1.0000       1.0000
    --------------------------------------------------------------------------
    LR test: independent vs. saturated:  chi2(6)  = 2727.68  Prob>chi2 = 0.0000
    LR test:    1 factor vs. saturated:  chi2(2)  =  127.75  Prob>chi2 = 0.0000

    Factor loadings (pattern matrix) and unique variances

    ---------------------------------------
        Variable |  Factor1 |   Uniqueness
    -------------+----------+--------------
              v1 |   0.4583 |      0.7900
              v2 |   0.3732 |      0.8607
              v3 |   0.7583 |      0.4250
              v4 |   0.5252 |      0.7242
    ---------------------------------------

The chi square tests for this sample size are rather silly, ignore them. The loadings and uniquenesses are almost the same as for IPF (interestingly enough---that's not always true). It won't run anything higher dimensional but I doubt from looking at that tetrachoric correlation matrix you'd find anything.

>>Finally, I have tried to add one or two further indicators to improve the analysis. However, I had some theoretical doubts on the inclusion of these variables, and the factor analysis with tetrachoric correlations gave me loadings for these variables much lower than 0.1, thus I was convinced to use only 4 variables.<<

Are the tetrachoric correlations for the other two variables markedly lower or still meaningful? You might have an oblique two-factor solution.

Jay
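
For anyone wanting to retrace the numbers in this thread, here is a minimal do-file sketch, not a definitive recipe. It assumes the tetrachoric correlations quoted above, rounded to four decimals, and uses the n(6926) from the quoted IPF run (the ML run above used 6296). The scalar names eigsum, prop1 and u3 are made up for illustration. Because the matrix is rounded and only one factor is requested, the refit should be close to, but need not exactly match, the output shown in the thread.

```stata
* Rebuild the 4x4 tetrachoric correlation matrix quoted in the thread
* (values rounded to 4 decimals; the full symmetric matrix is entered).
matrix R = (1,      0.1819, 0.3699, 0.1837 \ ///
            0.1819, 1,      0.2524, 0.2741 \ ///
            0.3699, 0.2524, 1,      0.4030 \ ///
            0.1837, 0.2741, 0.4030, 1)

* Refit a single factor by iterated principal factors.
* n(6926) is taken from the quoted IPF run; it is an assumption here.
factormat R, n(6926) ipf factors(1) names(v1 v2 v3 v4)

* "Proportion" for Factor1 in the IPF table is its eigenvalue divided by
* the sum of all four eigenvalues, i.e. a share of the common variance.
scalar eigsum = 1.28200 + 0.22001 + 0.09089 - 0.00019
scalar prop1  = 1.28200 / eigsum
display %6.4f prop1

* With one factor, uniqueness = 1 - loading^2. Check v3 from the ML table.
scalar u3 = 1 - 0.7583^2
display %6.4f u3
```

The two display lines should reproduce the 0.8049 proportion for Factor1 and the 0.4250 uniqueness for v3 reported above. Note that the 80% is a share of the common variance only: the first eigenvalue of about 1.28 is roughly a third of the total variance of four standardized items.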



© Copyright 1996–2016 StataCorp LP