Statalist



RE: st: Principal Components Analysis with count data


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Principal Components Analysis with count data
Date   Thu, 13 Aug 2009 19:15:34 +0100

I am puzzled by this continuation. The reference you give is a sales
pitch for a way to do structural equation modelling -- which personally
I found repellent in tone, but that detail is incidental. I can't see
what it has to do with PCA, except tenuously that some people who use
PCA might also be interested in doing SEM. 

Similarly, no one has mentioned mixed mode before in this thread. So, if
I understand your implication, my little list should read 

5. Single mode (all continuous or all discrete) versus mixed mode (some
of each). In principle, mixing continuous and discrete variables is not
necessarily fatal, but in practice it can prove problematic. 

Nick 
n.j.cox@durham.ac.uk 

Cameron McIntosh

Hi Nick et al.,
I agree that the decision to use PCA (and any other statistical
procedure) rests on the question you want to answer. Anyway, those
wondering about the merits of PCA (and other exploratory procedures)
with mixed mode observed variables might be interested in this:
http://www.neusrel.com/you_have_issues.html
True, it's not implemented in Stata (yet), but SEMNET has been buzzing a
bit about this lately and I thought I'd share.

Nick Cox 

> There are various unstated assumptions and criteria that need to be
> spelled out for a fruitful discussion.
>
> 1. Continuous versus discrete. I don't know any reason why PCA might
> not be as helpful, or as useless, on discrete data (e.g. counts) as
> compared with continuous data. I wouldn't think it useful for
> categorical variables, which I take to be a quite different issue.
>
> 2. Skewed versus symmetric. In principle, PCA might work very well
> even if some of the variables were highly skewed. In practice,
> skewness quite often goes together with nonlinearities, and a
> transformation might help in either case.
>
> 3. Whether PCA will work well does depend on what you expect it to do
> ideally, which is not clear in the question.
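
A rough way to explore point 2 is to compare PCA on raw counts with PCA
on transformed counts. The sketch below is not from the thread and is not
Stata code: it is plain Python on simulated data, and every name in it
(poisson_draw, correlation_matrix, first_pc_share, the four scales, the
lognormal intensity) is a hypothetical choice made only for illustration.
It draws four skewed count variables that share one latent intensity,
then reports the share of total variance taken by the first principal
component of the correlation matrix, before and after a square-root
transformation. It shows how one might check, not what the answer should
be for any real data set.

```python
import math
import random


def poisson_draw(rng, rate):
    """Draw one Poisson count by Knuth's multiplication method (modest rates only)."""
    limit = math.exp(-rate)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equal-length columns."""
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    centered = [[x - m for x in col] for col, m in zip(columns, means)]
    norms = [math.sqrt(sum(x * x for x in col)) for col in centered]
    p = len(columns)
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j]))
            / (norms[i] * norms[j])
            for j in range(p)
        ]
        for i in range(p)
    ]


def first_pc_share(corr, iterations=500):
    """Share of total variance on the first PC, via power iteration.

    The trace of a p x p correlation matrix is p, so the share is the
    leading eigenvalue divided by p.
    """
    p = len(corr)
    vector = [1.0 / math.sqrt(p)] * p
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(corr[i][j] * vector[j] for j in range(p)) for i in range(p)]
        eigenvalue = math.sqrt(sum(x * x for x in product))
        vector = [x / eigenvalue for x in product]
    return eigenvalue / p


# Simulated data (an assumption, not data from the thread): 500 observations,
# four counts driven by one lognormal intensity, so each count is right-skewed.
rng = random.Random(2009)
scales = [2.0, 3.0, 4.0, 6.0]
raw = [[] for _ in scales]
for _ in range(500):
    intensity = math.exp(rng.gauss(0.0, 0.8))
    for column, scale in zip(raw, scales):
        column.append(poisson_draw(rng, scale * intensity))

# Square root is one common choice of transformation for counts.
rooted = [[math.sqrt(x) for x in column] for column in raw]

raw_share = first_pc_share(correlation_matrix(raw))
sqrt_share = first_pc_share(correlation_matrix(rooted))
print("first PC share, raw counts: ", round(raw_share, 3))
print("first PC share, sqrt counts:", round(sqrt_share, 3))
```

Working from the correlation matrix puts all variables on a common scale
before the comparison. Whether a transformation helps still depends on
what the PCA is meant to do, as point 3 says.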

Evans Jadotte 

> I think a straightforward way to deal with this issue is to apply a
> Multiple Correspondence Analysis (MCA) to your data. See Asselin
> (2002) for an application, and also the references therein.

Cameron McIntosh

>> You should also check out chapters 8 and 9 of:
>>
>> Basilevsky, A. (1994). Statistical Factor Analysis and Related
>> Methods: Theory and Applications. New York: Wiley.

kokootchke@hotmail.com

>>> I don't know much about this but a while ago I was looking for
>>> something similar and I came across this paper which helped me:
>>>
>>> http://cosco.hiit.fi/search/MPCA/buntineDPCA.pdf
>>>
>>> If that's not useful to you, it has a bunch of references in the
>>> back. Maybe those can help.

Jason Ferris

>>>> PCA is appropriate for continuous data. I am wondering whether it
>>>> is appropriate for count data (i.e., highly skewed data). Can
>>>> someone provide advice, guidance or a resource on using PCA with
>>>> count data?
