

From: Stas Kolenikov <skolenik@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: wealth score using principal component analysis (PCA)
Date: Thu, 27 Sep 2012 09:11:20 -0500

The way I would put this would be, "in almost any case when you know how you are going to use the components later". PCA is a fabulous technique for looking at data that somebody has just brought to you (although I would probably use -biplot- rather than straight -pca- to get a better look at the data). Once you know that your wealth scores, or what have you, will go into a regression model, you are better off with -sem- for a number of reasons: (i) it controls for the unavoidable measurement error; (ii) it lets your data speak to the model rather than to one another, since the combination that maximizes the variance is not necessarily the combination that works best in the regression model; (iii) you get a goodness-of-fit test to see whether the whole argument makes sense.

There are, of course, exceptions, such as when you want to run a principal components regression for multicollinearity control or shrinkage purposes, and/or you want to stop at the principal components and just present a plot of them. But I believe that for most social science applications, a move from PCA to SEM is more naturally expected than a move in the opposite direction.

I did publish a paper on how to do PCA with asset data (http://dx.doi.org/10.1111/j.1475-4991.2008.00309.x), which is what you have in, say, DHS, but my intent was to say, "If you are determined to do PCA, this is how you would want to do it". I also gave references to better approaches, including the fabulous explanation by Bollen, Stecklov and Glanville of how to build SEM models that incorporate asset variables into the latent socio-economic status variable.
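A minimal Stata sketch of the two routes contrasted above, on simulated data. Everything in it is an assumption made for illustration: the variable names (asset1-asset5, y, the latent name Wealth), the single-factor data-generating process, and the use of continuous indicators. Real asset indicators such as those in DHS are typically binary or ordinal, which is the complication the paper linked above addresses, so this is not a recipe for such data.

```stata
* Simulated stand-in for asset data: five continuous indicators driven
* by one latent factor. All names and coefficients are hypothetical.
clear
set seed 20120927
set obs 1000
generate double f = rnormal()
forvalues j = 1/5 {
    generate double asset`j' = 0.7*f + rnormal()
}
generate double y = 0.5*f + rnormal()
drop f

* Route 1: PCA on the correlation matrix (the -pca- default),
* then use the first-component score as a regressor.
pca asset1 asset2 asset3 asset4 asset5
screeplot, yline(1)     // eigenvalues, with the "> 1" rule of thumb marked
predict pc1, score      // scores on the first component
regress y pc1

* Route 2: one latent variable measured by the same indicators,
* estimated jointly with the regression of y on that latent variable.
sem (Wealth -> asset1 asset2 asset3 asset4 asset5) (y <- Wealth)
estat gof, stats(all)   // overall goodness-of-fit statistics
```

In route 1 the score is built without reference to y and then treated as error-free in the regression; in route 2 the measurement part and the regression are fitted together, and -estat gof- reports fit for the whole model.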
--
-- Stas Kolenikov, PhD, PStat (SSC) :: http://stas.kolenikov.name
-- Senior Survey Statistician, Abt SRBI :: work email kolenikovs at srbi dot com
-- Opinions stated in this email are mine only, and do not reflect the position of my employer

On Thu, Sep 27, 2012 at 3:26 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> I can't give an answer to this question that is likely to satisfy you.
> PCA and SEM are very different methods. PCA is in my view primarily a
> multivariate transformation technique. SEM is, more obviously, a
> family of modelling techniques. Even in this thread the use of PCA
> appears to be part of a wider model-based strategy and that is likely
> to be typical of most projects in which it appears. I don't think "use
> PCA" is ever likely to be the core of the answer to "what should I
> do?" but "use SEM" might be, sometimes.
>
> Stas [sic] can speak for himself, but I suspect his position would be
> close to mine on this.
>
> Nick
>
> On Thu, Sep 27, 2012 at 8:06 AM, 汪哲仁 <chejen.wang@gmail.com> wrote:
>
>> Dear Nick and Stat,
>>
>> May I ask a question? In which circumstances, the PCA is a better
>> choice than SEM?
>
> 2012/9/27 Nick Cox <njcoxstata@gmail.com>
>
>>> You are confusing two different questions. Throughout I focus on the
>>> case you are looking at where PCA is based on the correlation matrix.
>>>
>>> If the aim is to use the most important PC, then that is labelled 1,
>>> but even if it weren't we could identify it by its having the largest
>>> eigenvalue attached and no extra considerations arise.
>>>
>>> If the aim is to identify which PCs are "important" or "worthy of use"
>>> (typically one or more) and should be used in later analyses, then
>>> this is necessarily a looser, more open question and the best art is a
>>> darker matter. There can't be an answer independent of what you are
>>> trying to do. Some people do stress a rule of thumb such as
>>> eigenvalues > 1 and some people look for a break in the eigenvalues
>>> using a scree plot. In some projects PCs that are used later are good
>>> if interpretable as having high correlations with particular
>>> variables; in other projects the PCs are just composite variables with
>>> the properties assigned to them and interpretability is less material.
>>>
>>> Every book I know on PCA stresses this open aspect of the method. The
>>> books by Jolliffe and Jackson referenced in the -pca- documentation
>>> certainly do.
>>>
>>> It's not clear exactly why you feel committed in advance to using PCA
>>> like this. I sympathise with the advice given earlier by Stas
>>> Kolenikov to consider something more like an SEM.
>>>
>>> Nick
>>>
>>> On Wed, Sep 26, 2012 at 9:33 PM, Shikha Sinha <shikha.sinha414@gmail.com> wrote:
>>> > Ok, I got it now that if I want to use one score, then PC1 is the most
>>> > relevant one, and then for further distinction between financial vs
>>> > social, we need to look at factor loadings in each PC2, PC3, to
>>> > figure out if PC2 is better than PC1 if the focus is on social or
>>> > financial autonomy.
>>> >
>>> > Then I am struggling to understand the use of selecting components
>>> > based on eigenvalues. What is the use of selecting PC based either on
>>> > eigenvalues or screeplot, if we are always (most of the time) going to
>>> > use the 1st component. An example on the importance of eigenvalues in
>>> > selecting components would be very helpful (or any ref.)
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/

