From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: RE: st: RE: Aren't distinct factors from factor analysis or PCA orthogonal to each other?
Date: Tue, 18 Aug 2009 17:18:53 +0100

I guess Cameron does not mean quite what he says, which is that factor analysis can only be used on psychometric measures. In principle I can readily imagine fruitful applications on quite different kinds of data. But I largely agree with the spirit of his comment, which I take to be -- my words not his -- that expecting factor analysis to see structure in a mess independently of some understanding is likely to be expecting far too much. However, my impression is that that is exactly what almost all users of factor analysis seem to expect!

I've found occasional use of PCA in the following way.

1. Plot the data.

2. Calculate correlations, etc.

3. Look at the results: get some ideas.

4. Calculate PCs.

5. Use PCs to help structure understanding of #1 and #2 in terms of variables that go together, variables that are singletons, etc. Sometimes, results of #1 and #2 now make more sense in their own terms. (For example, a reordering of a scatter plot matrix or correlation matrix makes it easier to see what is going on.) Often it is useful here to look at a table of correlations between original variables and new PCs. -cpcorr- from SSC helps with that.

6. Now discard PC results and proceed with modelling.

As in some fields every minor variation on a technique is blessed with a name, I'll dub this disposable principal component analysis.

Nick
n.j.cox@durham.ac.uk

Cameron McIntosh

Adrian, I think it would be a complete travesty to just feed that whole dataset into a factor analysis. Sure, it'll lump together variables with high correlations, but most of the time this doesn't reflect what's going on underneath the data (e.g., a web of direct and indirect causal relations that generated the observed associations/covariance matrix), and this type of situation is what tends to give factor analysis a "bad name" among statisticians.
Factor analysis is typically only appropriate for reflective psychometric measures written specifically to assess an underlying trait (e.g., self-esteem, anxiety), not datasets like yours. I think there are probably complex causal relations among your variables that you should think hard about (using your theoretical knowledge about these variables) and maybe come up with a path-analytic model or growth curve model (say, GDP trajectory and its predictors). You could also compare models across countries.

From: kokootchke@hotmail.com

> Thank you to Cameron, Bob and everybody else for the references.
>
> I have a response to Jay and a couple more questions for everybody, if you can still help me...
>
> Jay wrote:
>> Before you go any further I think you have a big problem to consider: 100 variables on, say 200 countries means you have WAY more covariances (or correlations) than you have countries. This means your correlation matrix is singular.
>
> I don't think I have that problem because I don't have 200 countries. I only have about 30+ countries.
>
> However, even if I had 200 countries, I don't understand exactly what the problem would be because I have all 100 variables for country i and all 100 variables for country j stacked on one another. So, I have:
>
> country     year   GDP   inflation   reserves
> Argentina   1990   2.3   6.4         100
> Argentina   1991   2.8   7.4         250
> Argentina   1992   2.6   7.0         200
> ...
> Argentina   2006   3.2   8.0         400
> Brazil      1990   1.7   5.4         120
> Brazil      1991   2.1   6.3         140
> Brazil      1992   2.5   7.0         180
> ...
>
> So the variables I enter into my factor analysis are GDP, inflation, and reserves... and so the -factor- command in Stata knows nothing about the panel/time-series structure of my data. I can see why it should be relevant to account for the underlying panel structure of the data -- for instance, that jump in GDP/inflation/reserves and any other variables between Argentina in 2006 and Brazil in 1990 may be a bit strange to account for.
>
> So, the first question is: do I need to take this panel structure into account? And if so, how?
>
> The other question is, do units matter? For instance, I know that factor analysis or PCA are all based on a variance-covariance matrix... but if I have two variables, x and y, and I take the covariance between the two of them, that'll be different than if I take the covariance of, say 2x and y:
>
> cov(x,y) <> cov(2x,y)
>
> and so what would happen if I express my GDP in dollars for all countries or in local-currency units?? Or in millions or in billions???
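
On the units question just above, here is a minimal sketch in Python (not Stata, and not part of the original exchange). It illustrates that multiplying a variable by a constant multiplies its covariances by that constant but leaves its correlations unchanged, so a PCA of the covariance matrix depends on units while a PCA of the correlation matrix does not. The four GDP and inflation values are the Argentina rows shown in the post; the factor of 1000 (say, billions to millions) and the helper names `cov`, `corr` and `first_pc_share` are assumptions made purely for illustration.

```python
import math


def mean(values):
    return sum(values) / len(values)


def cov(x, y):
    """Sample covariance with divisor n - 1."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def corr(x, y):
    """Pearson correlation."""
    return cov(x, y) / math.sqrt(cov(x, x) * cov(y, y))


def first_pc_share(a, b, c):
    """Share of total variance carried by the first principal component
    of the symmetric 2x2 matrix [[a, b], [b, c]] (closed-form eigenvalue)."""
    half_trace = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    return (half_trace + radius) / (a + c)


# The four Argentina rows quoted in the post (illustrative only).
gdp = [2.3, 2.8, 2.6, 3.2]
inflation = [6.4, 7.4, 7.0, 8.0]

# Hypothetical change of units: the same GDP expressed in units 1000 times smaller.
gdp_rescaled = [1000 * g for g in gdp]

# Covariance scales with the units; correlation does not.
print("cov :", cov(gdp, inflation), cov(gdp_rescaled, inflation))
print("corr:", corr(gdp, inflation), corr(gdp_rescaled, inflation))

# PCA on the covariance matrix: the first PC's share of variance changes with units.
share_cov_raw = first_pc_share(cov(gdp, gdp), cov(gdp, inflation),
                               cov(inflation, inflation))
share_cov_rescaled = first_pc_share(cov(gdp_rescaled, gdp_rescaled),
                                    cov(gdp_rescaled, inflation),
                                    cov(inflation, inflation))

# PCA on the correlation matrix: the share is the same in either set of units.
share_corr_raw = first_pc_share(1.0, corr(gdp, inflation), 1.0)
share_corr_rescaled = first_pc_share(1.0, corr(gdp_rescaled, inflation), 1.0)

print("covariance-based share :", share_cov_raw, share_cov_rescaled)
print("correlation-based share:", share_corr_raw, share_corr_rescaled)
```

In Stata terms, -pca- analyzes the correlation matrix by default and offers a covariance option; as far as I know -factor- also works from correlations, in which case millions versus billions should not change its results. Note that this covers only a single rescaling of a whole variable. Converting each country's GDP at its own exchange rate is not one rescaling of the pooled variable, so correlations computed on the stacked data can change.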
