
From: "Verkuilen, Jay" <JVerkuilen@gc.cuny.edu>
To: "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject: RE: Re: st: Principal Components Analysis with count data
Date: Fri, 14 Aug 2009 12:19:33 -0400

Nick Cox wrote:

> There are various unstated assumptions and criteria that need to be
> spelled out for a fruitful discussion.
>
> 1. Continuous versus discrete. I don't know any reason why PCA might not be as helpful, or as useless, on discrete data (e.g. counts) as compared with continuous data.

Agreed. The main thing is that discrete variables tend to be quite skewed and thus have strongly attenuated correlations. Much of the dimensionality you find is created by this issue. The temptation is to assume that dimension = substantively interesting variation, but sadly this is often wrong. Instead, dimension = systematic variation, but that's far from the same thing.

> I wouldn't think it useful for categorical variables, which I take to be a quite different issue.

Well correspondence analysis is, essentially, principal components for categorical variables in the sense that CA depends on the singular value decomposition of the indicator matrix for categorical data in essentially the same way that PCA (or biplotting) uses the SVD of the data matrix for continuous variables. There's a large literature on it and, indeed, Stata has some nice procedures for it already built in. See -mca- and then expect to do some reading.

> 2. Skewed versus symmetric. In principle, PCA might work very well even if some of the variables were highly skewed. In practice, skewness quite often goes together with nonlinearities, and a transformation might help in either case.

Yup.

JV
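The point about skewed counts attenuating correlations can be checked with a small simulation. What follows is only a sketch in Python/NumPy rather than Stata, and every setting in it is an assumption made for illustration, not something from the thread: six variables driven by one shared latent factor, a pairwise latent correlation of 0.8, and counts generated as Poisson draws with a log link. It compares the average pairwise correlation and the share of variance on the first principal component for the latent variables and for the counts built from them.

```python
import numpy as np

# Illustrative settings (assumptions, not taken from the thread).
rng = np.random.default_rng(0)
n, p, rho = 20_000, 6, 0.8

# One shared latent factor, so every pair of latent variables correlates at about rho.
factor = rng.standard_normal((n, 1))
noise = rng.standard_normal((n, p))
latent = np.sqrt(rho) * factor + np.sqrt(1 - rho) * noise

# Skewed counts driven by the same latent variables (Poisson with a log link).
counts = rng.poisson(np.exp(latent))


def mean_offdiag_corr(x):
    """Average Pearson correlation over all distinct pairs of columns."""
    r = np.corrcoef(x, rowvar=False)
    return r[~np.eye(r.shape[0], dtype=bool)].mean()


def first_pc_share(x):
    """Share of total variance on the first principal component (correlation-matrix PCA)."""
    eigvals = np.linalg.eigvalsh(np.corrcoef(x, rowvar=False))  # ascending order
    return eigvals[-1] / eigvals.sum()


for name, data in [("latent", latent), ("counts", counts)]:
    print(f"{name}: mean r = {mean_offdiag_corr(data):.2f}, "
          f"first PC share = {first_pc_share(data):.2f}")
```

Under these assumptions the counts share exactly the same one-dimensional structure as the latent variables, yet their correlations are weaker and the first component accounts for less of the variance, so the remaining components look more important than the underlying structure warrants.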

