

From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: polychoric for huge data sets
Date: Wed, 5 Sep 2012 09:59:41 +0100

Stas Kolenikov's -polychoric- package promises only principal component analysis. Depending on how you were brought up, that is distinct from factor analysis, or a limiting case of factor analysis, or a subset of factor analysis.

The problem you report as "just can't handle it" with no details appears to be one of speed, rather than refusal or inability to perform.

That aside, what is "appropriate" is difficult to answer. A recent thread indicated that many on this list are queasy about means or t-tests for ordinal data, so that would presumably put factor analysis or PCA of ordinal data beyond the pale. Nevertheless it remains popular.

You presumably have the option of taking a random sample from your data and subjecting that to both (a) PCA of _ranked_ data (which is equivalent to PCA based on Spearman correlation) and (b) polychoric PCA. Then it would be good news for you if the substantive or scientific conclusions were the same, and a difference you need to think about otherwise. Here the random sample should be large enough to be substantial, but small enough to get results in reasonable time.

Alternatively, you could be ruthless about which of your variables are most interesting or important. A preliminary correlation analysis would show which variables could be excluded because they are poorly correlated with anything else, and which could be excluded because they are very highly correlated with anything else.

Even if you can get it, a PCA based on 40+ variables is often unwieldy to handle and even more difficult to interpret than one based on, say, 10 or so variables.

Nick

On Wed, Sep 5, 2012 at 3:37 AM, Timea Partos <Timea.Partos@cancervic.org.au> wrote:

> I need to run a factor analysis on ordinal data. My dataset is huge (7000+ cases with 40+ variables) so I can't run the polychoric.do program written by Stas Kolenikov, because it just can't handle it.
>
> Does anyone know of a fast way to obtain the polychoric correlation matrix for very large data sets?
>
> Alternatively, I was thinking of running the factor analysis using the Spearman rho (rank-order correlations) matrix instead. Would this be appropriate?
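
The random subsample and the preliminary correlation screen suggested in the reply can be sketched roughly as follows. This is an illustration only, written in plain Python rather than Stata, and it is not part of the original thread. The function names (`average_ranks`, `spearman_matrix`, `screen`, `random_subsample`) and the cut-offs `low=0.2` and `high=0.9` are assumptions chosen for the example, not recommendations from the posters. Spearman correlation is computed here as the Pearson correlation of average ranks, and every variable is assumed to take at least two distinct values in the sample.

```python
import math
import random


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes non-constant input)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_matrix(columns):
    """columns maps variable name -> list of ordinal codes; returns nested dict of rho."""
    ranked = {name: average_ranks(vals) for name, vals in columns.items()}
    names = list(ranked)
    return {a: {b: pearson(ranked[a], ranked[b]) for b in names} for a in names}


def screen(corr, low=0.2, high=0.9):
    """Flag variables whose strongest |rho| with any other variable is extreme.

    Thresholds are illustrative assumptions only.
    """
    weak, redundant = [], []
    for a, row in corr.items():
        strongest = max(abs(r) for b, r in row.items() if b != a)
        if strongest < low:
            weak.append(a)
        elif strongest > high:
            redundant.append(a)
    return weak, redundant


def random_subsample(columns, size, seed=12345):
    """Draw the same random rows from every column, without replacement."""
    rng = random.Random(seed)
    n = len(next(iter(columns.values())))
    keep = rng.sample(range(n), size)
    return {name: [vals[i] for i in keep] for name, vals in columns.items()}


if __name__ == "__main__":
    # Tiny made-up data set, purely for demonstration.
    data = {"a": [1, 2, 3, 4, 5], "b": [1, 2, 3, 4, 5], "c": [2, 5, 3, 1, 4]}
    weak, redundant = screen(spearman_matrix(data))
    print("weakly related to everything:", weak)
    print("nearly duplicated by another variable:", redundant)
```

Note that when two variables are nearly duplicates, both are flagged as redundant; in practice one would drop only one of the pair. The same `spearman_matrix` output could be passed to any PCA routine that accepts a correlation matrix, and run on a `random_subsample` of the data, to compare against a polychoric PCA on the same subsample.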

