
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: RE: Re: st: Principal Components Analysis with count data
Date: Thu, 13 Aug 2009 17:53:20 +0100

I think that's a good approximation, at least informally -- which is presumably why Evans Jadotte suggested that technique earlier in the thread (see below).

But I'm distinguishing, as others may not be, between count data and categorical data. The number of Statalist postings in a unit time is a count variable; gender is a categorical variable. Naturally, I'm aware that every count corresponds to a category and vice versa. But there is a variation in what models are most appropriate.

Nick
n.j.cox@durham.ac.uk

Lachenbruch, Peter

A short question on PCA for categorical variables: wouldn't correspondence analysis be useful here? Or is my interpretation of CA as the categorical analog of PCA way off base?

Nick Cox

There are various unstated assumptions and criteria that need to be spelled out for a fruitful discussion.

1. Continuous versus discrete. I don't know any reason why PCA might not be as helpful, or as useless, on discrete data (e.g. counts) as compared with continuous data. I wouldn't think it useful for categorical variables, which I take to be a quite different issue.

2. Skewed versus symmetric. In principle, PCA might work very well even if some of the variables were highly skewed. In practice, skewness quite often goes together with nonlinearities, and a transformation might help in either case.

3. Whether PCA will work well does depend on what you expect it to do ideally, which is not clear in the question.

Evans Jadotte <evans.jadotte@uab.es>

I think a straightforward way to deal with this issue is to apply a Multiple Correspondence Analysis (MCA) to your data. See Asselin (2002) for an application, and also reference therein.

Cameron McIntosh

> You should also check out chapters 8 and 9 of:
>
> Basilevsky, A. (1994). Statistical Factor Analysis and Related Methods: Theory and Applications. New York: Wiley.

kokootchke@hotmail.com

>> I don't know much about this but a while ago I was looking for something similar and I came across this paper which helped me:
>>
>> http://cosco.hiit.fi/search/MPCA/buntineDPCA.pdf
>>
>> If that's not useful to you, it has a bunch of references in the back. Maybe those can help.

Jason Ferris

>>> As PCA is appropriate for continuous data. I am wondering if it is
>>> appropriate for count data (i.e., highly skewed)? Can someone provide
>>> advice, guidance or a resource in using PCA with count data?
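Point 2 above (skewness and transformation) can be illustrated with a small simulation. This is only a sketch and was not posted in the thread: it uses Python with NumPy rather than Stata, the data are simulated, and the names pca_correlation, latent, rates and counts are hypothetical. It runs PCA on the correlation matrix of raw counts and of square-root-transformed counts, so that the two sets of eigenvalues can be compared.

```python
import numpy as np


def pca_correlation(X):
    """PCA via eigendecomposition of the correlation matrix.

    Returns eigenvalues (largest first), the matching eigenvectors
    (as columns), and each component's share of total variance.
    """
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]      # reorder to descending
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    return eigvals, eigvecs, eigvals / eigvals.sum()


# Simulated data (an assumption for illustration only): one latent
# intensity drives three right-skewed Poisson count variables.
rng = np.random.default_rng(2009)
n = 500
latent = rng.normal(size=n)
rates = np.exp(np.column_stack([
    1.0 + 0.8 * latent,
    0.5 + 0.6 * latent,
    1.5 + 0.4 * latent,
]))
counts = rng.poisson(rates)

# PCA on the raw counts and on a square-root transform of them.
raw_vals, raw_vecs, raw_share = pca_correlation(counts)
root_vals, root_vecs, root_share = pca_correlation(np.sqrt(counts))

print("share of variance, raw counts: ", np.round(raw_share, 3))
print("share of variance, sqrt counts:", np.round(root_share, 3))
```

The square root is just one common choice for counts; a logarithm of the count plus a constant would be another. Whether either transformation helps depends on the data and on what the PCA is expected to do, which is point 3 above.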

