


Re: st: polychoric for huge data sets

From   Nick Cox
Subject   Re: st: polychoric for huge data sets
Date   Wed, 5 Sep 2012 14:54:55 +0100

Experiment supports intuition in suggesting that the number of
variables is a bigger deal for -polychoric- than the number of
observations, and also that you can get results for 8000 obs and 40
variables in several minutes on a mundane computer. That's tedious
interactively but doesn't support Timea's claim that the program
"just can't handle it". Best just to write a do-file and let it run
while you are doing something else.
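
A minimal sketch of such a do-file, under stated assumptions: the dataset name mydata and the variable names q1-q40 are placeholders not given in the thread, and -polychoric- is assumed to be installed and to leave the correlation matrix in r(R) and the number of observations used in r(sum_w). The timer calls are only there to record how long the run took.

```stata
* Hypothetical do-file: mydata and q1-q40 are placeholder names
version 12
use mydata, clear

* Time the polychoric correlation step
timer clear 1
timer on 1
polychoric q1-q40
timer off 1
timer list 1

* Keep the results before another command overwrites r()
local N = r(sum_w)
matrix R = r(R)

* Principal components from the stored polychoric matrix
pcamat R, n(`N')
```

Storing the matrix once means the slow step need not be repeated if you later want a different number of components.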


On Wed, Sep 5, 2012 at 9:59 AM, Nick Cox wrote:
> Stas Kolenikov's -polychoric- package promises only principal
> component analysis. Depending on how you were brought up, that is
> distinct from factor analysis, or a limiting case of factor analysis,
> or a subset of factor analysis.
> The problem you report as "just can't handle it" with no details
> appears to be one of speed, rather than refusal or inability to
> perform.
> That aside, what is "appropriate" is difficult to answer.  A recent
> thread indicated that many on this list are queasy about means or
> t-tests for ordinal data, so that would presumably put factor analysis
> or PCA of ordinal data beyond the pale. Nevertheless it remains
> popular.
> You presumably have the option of taking a random sample from your
> data and subjecting that to both (a) PCA of _ranked_ data (which is
> equivalent to PCA based on Spearman correlation) and (b) polychoric
> PCA. Then it would be good news for you if the substantive or
> scientific conclusions were the same, and a difference you need to
> think about otherwise. Here the random sample should be large enough
> to be substantial, but small enough to get results in reasonable time.
> Alternatively, you could be ruthless about which of your variables are
> most interesting or important. A preliminary correlation analysis
> would show which variables could be excluded because they are poorly
> correlated with anything else, and which could be excluded because
> they are very highly correlated with something else. Even if you can
> get it, a PCA based on 40+ variables is often unwieldy to handle and
> even more difficult to interpret than one based on say 10 or so
> variables.
> Nick
> On Wed, Sep 5, 2012 at 3:37 AM, Timea Partos wrote:
>> I need to run a factor analysis on ordinal data.  My dataset is huge (7000+ cases with 40+ variables) so I can't run the program written by Stas Kolenikov, because it just can't handle it.
>> Does anyone know of a fast way to obtain the polychoric correlation matrix for very large data sets?
>> Alternatively, I was thinking of running the factor analysis using the Spearman rho (rank-order correlations) matrix instead.  Would this be appropriate?
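
The comparison suggested earlier in the thread, (a) PCA of ranked data against (b) polychoric PCA on a random sample, might be sketched as follows. Everything specific here is an assumption for illustration: the dataset name mydata, the variable names q1-q40, the sample size of 2000, the seed, and the choice of 3 components. The sketch also assumes no missing values, so that ranking each variable separately matches what a Spearman correlation would use.

```stata
* Hypothetical sketch: names, sample size and seed are placeholders
version 12
use mydata, clear
set seed 12345
sample 2000, count

* (a) PCA of ranked data (average ranks for ties, as in Spearman)
foreach v of varlist q1-q40 {
    egen r_`v' = rank(`v')
}
pca r_q*, components(3)

* (b) PCA of the polychoric correlation matrix
polychoric q1-q40
local N = r(sum_w)
matrix R = r(R)
pcamat R, n(`N') components(3)
```

If the loadings from (a) and (b) tell the same substantive story, the faster rank-based analysis may be adequate for the full data set.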