



Re: st: polychoric for huge data sets


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: polychoric for huge data sets
Date   Wed, 5 Sep 2012 09:59:41 +0100

Stas Kolenikov's -polychoric- package promises only principal
component analysis. Depending on how you were brought up, that is
distinct from factor analysis, or a limiting case of factor analysis,
or a subset of factor analysis.

The problem you report as "just can't handle it", with no details,
appears to be one of speed rather than of refusal or inability to
perform.
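One plausible reason the run is slow, sketched under an assumption not stated in the post: if each off-diagonal polychoric correlation is estimated separately from the two-way table of that pair of variables (the usual pairwise approach), the number of separate estimations grows with the square of the number of variables. The helper name n_pairs below is hypothetical and only counts those pairs.

```python
def n_pairs(p):
    """Distinct off-diagonal entries in a p x p correlation matrix: p(p-1)/2."""
    return p * (p - 1) // 2

# Compare a trimmed variable list with the 40 variables in the question.
for p in (10, 20, 40):
    print(p, n_pairs(p))
# 10 45
# 20 190
# 40 780
```

Under that assumption, 40 variables mean 780 bivariate fits, each over all 7000+ cases, against 45 fits for 10 variables.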

That aside, what is "appropriate" is difficult to answer. A recent
thread indicated that many on this list are queasy about means or
t-tests for ordinal data, which would presumably put factor analysis
or PCA of ordinal data beyond the pale. Nevertheless, it remains
popular.

You presumably have the option of taking a random sample from your
data and subjecting it to both (a) PCA of _ranked_ data (which is
equivalent to PCA based on Spearman correlation) and (b) polychoric
PCA. It would be good news for you if the substantive or scientific
conclusions were the same; if not, the difference is something you
need to think about. The random sample should be large enough to be
substantial, but small enough to get results in reasonable time.
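A minimal sketch of option (a) on a random subsample, written in Python only to make the logic explicit (in Stata the built-in commands would do this directly). It assumes the data are held as a list of rows of numeric ordinal codes; the function names, the subsample size of 1000 and the simulated data are all hypothetical. Ranking each variable (tied values get the average of their ranks) and taking Pearson correlations of those ranks gives the Spearman correlation matrix; power iteration then extracts the first principal component of that matrix.

```python
import random

def midranks(values):
    """Rank values 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the whole run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError for a constant column."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman_matrix(columns):
    """Spearman matrix = Pearson correlations of the ranked columns."""
    ranked = [midranks(c) for c in columns]
    p = len(ranked)
    return [[pearson(ranked[i], ranked[j]) for j in range(p)] for i in range(p)]

def leading_component(R, iters=500):
    """Power iteration for the largest eigenvalue of R and its unit eigenvector."""
    p = len(R)
    v = [1.0] * p  # assumes this start is not orthogonal to the leading eigenvector
    lam = 0.0
    for _ in range(iters):
        w = [sum(R[i][j] * v[j] for j in range(p)) for i in range(p)]
        lam = sum(x * x for x in w) ** 0.5  # ||Rv|| converges to the top eigenvalue
        v = [x / lam for x in w]
    return lam, v

# Hypothetical data: 7000 cases, three items coded 1-5.
rng = random.Random(2012)
data = []
for _ in range(7000):
    base = rng.randint(1, 5)
    related = min(5, max(1, base + rng.choice([-1, 0, 1])))
    data.append((base, related, rng.randint(1, 5)))

sample = rng.sample(data, 1000)  # random subsample of cases
columns = [list(col) for col in zip(*sample)]
R = spearman_matrix(columns)
eigenvalue, loadings = leading_component(R)
print(round(eigenvalue, 3), [round(x, 3) for x in loadings])
```

Power iteration returns only the leading component; a full PCA would also extract the remaining eigenvectors (for example by deflation). Option (b) would replace spearman_matrix with a polychoric matrix and leave the rest unchanged.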

Alternatively, you could be ruthless about which of your variables are
most interesting or important. A preliminary correlation analysis
would show which variables could be excluded because they are poorly
correlated with everything else, and which could be excluded because
they are very highly correlated with some other variable. Even if you
can get one, a PCA based on 40+ variables is often unwieldy to handle
and even more difficult to interpret than one based on, say, 10 or so
variables.
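A minimal sketch of such a preliminary screen, in Python for illustration only. The function name screen_variables and the cut-offs 0.2 and 0.9 are hypothetical choices, not recommendations; for ordinal items one could pass ranked columns instead of raw codes. Each variable is judged by its strongest absolute correlation with any other variable.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def screen_variables(columns, names, low=0.2, high=0.9):
    """Flag variables by their strongest |r| with any other variable.

    low and high are hypothetical cut-offs. Strongest |r| below low: weakly
    related to everything. Above high: nearly duplicates some other variable.
    Both members of a near-duplicate pair are flagged; which to keep is a
    substantive judgment left to the analyst.
    """
    weak, redundant = [], []
    for i, col in enumerate(columns):
        strongest = max(abs(pearson(col, other))
                        for j, other in enumerate(columns) if j != i)
        if strongest < low:
            weak.append(names[i])
        elif strongest > high:
            redundant.append(names[i])
    return weak, redundant

# Made-up example: x2 copies x1, x3 is uncorrelated with the rest,
# x4 correlates about 0.71 with x1 and x2.
cols = [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [5, 2, 1, 2, 5], [0, 4, 3, 2, 6]]
print(screen_variables(cols, ["x1", "x2", "x3", "x4"]))
# (['x3'], ['x1', 'x2'])
```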

Nick

On Wed, Sep 5, 2012 at 3:37 AM, Timea Partos
<Timea.Partos@cancervic.org.au> wrote:

> I need to run a factor analysis on ordinal data.  My dataset is huge (7000+ cases with 40+ variables) so I can't run the polychoric.do program written by Stas Kolenikov, because it just can't handle it.
>
> Does anyone know of a fast way to obtain the polychoric correlation matrix for very large data sets?
>
> Alternatively, I was thinking of running the factor analysis using the Spearman rho (rank-order correlations) matrix instead.  Would this be appropriate?

