Statalist



RE: st: Principal Components Analysis with count data


From   Cameron McIntosh <cnm100@hotmail.com>
To   STATA LIST <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Principal Components Analysis with count data
Date   Thu, 13 Aug 2009 15:25:49 -0400

Yes, the sales pitch was fairly nauseating, wasn't it? However, from what I gather, the tool being sold is scientifically and statistically sound. 
 
I gave the reference because the method described creates principal components from observed variables (with a wide variety of response types), and goes beyond that to do a specification search on possible pathways among the components. I thought those doing standard PCA might be interested in this innovation.

As for problems with mixing discrete and continuous variables, -gllamm- (and Mplus) seem to be able to handle this pretty well with numerical integration, albeit slowly in many cases.
 
Cam
----------------------------------------
> Subject: RE: st: Principal Components Analysis with count data
> Date: Thu, 13 Aug 2009 19:15:34 +0100
> From: n.j.cox@durham.ac.uk
> To: statalist@hsphsun2.harvard.edu
>
> I am puzzled by this continuation. The reference you give is a sales
> pitch for a way to do structural equation modelling -- which personally
> I found repellent in tone, but that detail is incidental. I can't see
> what it has to do with PCA, except tenuously that some people who use
> PCA might also be interested in doing SEM.
>
> Similarly, no one has mentioned mixed mode before in this thread. So, if
> I understand your implication my little list should read
>
> 5. Single mode (all continuous or all discrete) versus mixed mode (some
> of each). In principle, mixing continuous and discrete variables is not
> necessarily fatal, but in practice it can prove problematic.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Cameron McIntosh
>
> Hi Nick et al.,
> I agree that the decision to use PCA (and any other statistical
> procedure) rests on the question you want to answer. Anyway, those
> wondering about the merits of PCA (and other exploratory procedures)
> with mixed mode observed variables might be interested in this:
> http://www.neusrel.com/you_have_issues.html
> True, it's not implemented in Stata (yet), but SEMNET has been buzzing a
> bit about this lately and I thought I'd share.
>
> Nick Cox
>
>> There are various unstated assumptions and criteria that need to be
>> spelled out for a fruitful discussion.
>>
>> 1. Continuous versus discrete. I don't know any reason why PCA might not
>> be as helpful, or as useless, on discrete data (e.g. counts) as compared
>> with continuous data. I wouldn't think it useful for categorical
>> variables, which I take to be a quite different issue.
>>
>> 2. Skewed versus symmetric. In principle, PCA might work very well even
>> if some of the variables were highly skewed. In practice, skewness quite
>> often goes together with nonlinearities, and a transformation might help
>> in either case.
>>
>> 3. Whether PCA will work well does depend on what you expect it to do
>> ideally, which is not clear in the question.
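
Point 2 in the list above (skewness often travels with nonlinearity, and a transformation may help with both) can be illustrated with a small sketch. This is not from the thread: the data are made up, the helper names (pearson, skewness, pc1_share) are hypothetical, and Python's standard library is used only so the example is self-contained. It relies on the fact that for two variables, PCA on the correlation matrix has eigenvalues 1 + |r| and 1 - |r|, so the first component carries (1 + |r|) / 2 of the total variance.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def skewness(x):
    """Moment-based skewness, m3 / m2**1.5."""
    mx = statistics.fmean(x)
    m2 = statistics.fmean([(a - mx) ** 2 for a in x])
    m3 = statistics.fmean([(a - mx) ** 3 for a in x])
    return m3 / m2 ** 1.5


def pc1_share(x, y):
    """Share of total variance on the first principal component of the
    2x2 correlation matrix, whose eigenvalues are 1 + |r| and 1 - |r|."""
    return (1 + abs(pearson(x, y))) / 2


# Made-up, idealized counts: x is strongly right-skewed and y grows as
# the square of x, so the raw relationship is monotone but not linear.
x = [2 ** k for k in range(10)]
y = [a * a for a in x]

# Log transform. All counts here are positive; with zero counts one
# would need something else, such as log(1 + count) or a square root.
log_x = [math.log(a) for a in x]
log_y = [math.log(b) for b in y]

print("skewness of x, raw:   ", skewness(x))
print("skewness of x, logged:", skewness(log_x))
print("PC1 share, raw:       ", pc1_share(x, y))
print("PC1 share, logged:    ", pc1_share(log_x, log_y))
```

In this deliberately clean example the log transform removes the skewness and makes the relationship exactly linear, so the first component accounts for essentially all of the variance after transformation but not before. Real count data will be noisier, and whether a transformation helps depends on the data and on what PCA is expected to do.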
>
> Evans Jadotte
>
>> I think a straightforward way to deal with this issue is to apply a
>> Multiple Correspondence Analysis (MCA) to your data. See Asselin (2002)
>> for an application, and also the references therein.
>
> Cameron McIntosh
>
>>> You should also check out chapters 8 and 9 of:
>>>
>>> Basilevsky, A. (1994). Statistical Factor Analysis and Related
>>> Methods: Theory and Applications. New York: Wiley.
>
> kokootchke@hotmail.com
>
>>>> I don't know much about this but a while ago I was looking for
>>>> something similar and I came across this paper which helped me:
>>>>
>>>> http://cosco.hiit.fi/search/MPCA/buntineDPCA.pdf
>>>>
>>>> If that's not useful to you, it has a bunch of references in the
>>>> back. Maybe those can help.
>
> Jason Ferris
>
>>>>> As PCA is appropriate for continuous data, I am wondering if it is
>>>>> appropriate for count data (i.e., highly skewed). Can someone provide
>>>>> advice, guidance or a resource on using PCA with count data?
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/


