
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org, is already up and running.



Re: st: wealth score using principal component analysis (PCA)


From   Stas Kolenikov <skolenik@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: wealth score using principal component analysis (PCA)
Date   Thu, 27 Sep 2012 09:11:20 -0500

The way I would put this would be: "in almost any case when you know
how you are going to use the components later". PCA is a fabulous
technique for looking at data that somebody has just brought to you
(although I would probably use -biplot- rather than plain -pca- to
get a better look at the data). Once you know that your wealth scores,
or whatever you have, will go into a regression model, you are better
off with -sem- for a number of reasons: (i) it accounts for the
unavoidable measurement error; (ii) it lets your data speak to the
model rather than the indicators speaking only to one another, and
the combination that maximizes the variance is not necessarily the
combination that works best in the regression model; (iii) you get a
goodness-of-fit test to see whether the whole argument makes sense.
There are, of course, exceptions, such as when you want to run a
principal components regression for multicollinearity control or
shrinkage, or when you want to stop at the principal components and
just present a plot of them. But I believe that for most social
science applications, a move from PCA to SEM is more natural than a
move in the opposite direction.
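Point (ii) can be illustrated with a small simulation. This is a minimal sketch in Python with made-up data; the variables x1, x2, y and the seed are hypothetical. Two predictors share a strong common factor, so the first principal component of their correlation matrix captures most of their variance. The outcome, however, depends only on their difference, and that difference lives in the low-variance second component. This is not SEM. It only shows why the maximum-variance combination need not be the one that works best in a regression.

```python
import math
import random

def standardize(xs):
    """Center and scale a list to mean 0, SD 1 (dividing by n)."""
    n = len(xs)
    m = sum(xs) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in xs) / n)
    return [(x - m) / sd for x in xs]

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    za, zb = standardize(a), standardize(b)
    return sum(x * y for x, y in zip(za, zb)) / len(za)

# Simulated (hypothetical) data: x1 and x2 share a strong common factor u,
# but the outcome y depends only on their difference.
rng = random.Random(2012)
n = 500
u = [rng.gauss(0, 1) for _ in range(n)]
x1 = [ui + 0.5 * rng.gauss(0, 1) for ui in u]
x2 = [ui + 0.5 * rng.gauss(0, 1) for ui in u]
y = [a - b + 0.3 * rng.gauss(0, 1) for a, b in zip(x1, x2)]

# For two standardized variables with correlation r > 0, the correlation
# matrix [[1, r], [r, 1]] has eigenvectors (1, 1)/sqrt(2) with eigenvalue
# 1 + r (PC1) and (1, -1)/sqrt(2) with eigenvalue 1 - r (PC2).
r12 = corr(x1, x2)
z1, z2 = standardize(x1), standardize(x2)
pc1 = [(a + b) / math.sqrt(2) for a, b in zip(z1, z2)]
pc2 = [(a - b) / math.sqrt(2) for a, b in zip(z1, z2)]
lam1, lam2 = 1 + r12, 1 - r12

# R-squared of a simple regression of y on one component = squared correlation.
r2_pc1 = corr(y, pc1) ** 2
r2_pc2 = corr(y, pc2) ** 2
print(f"r(x1, x2) = {r12:.2f}")
print(f"PC1: eigenvalue {lam1:.2f}, R^2 for y = {r2_pc1:.3f}")
print(f"PC2: eigenvalue {lam2:.2f}, R^2 for y = {r2_pc2:.3f}")
```

PC1 carries most of the variance of x1 and x2, but in this setup it explains almost none of y. The much smaller PC2 does nearly all the work.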

I did publish a paper on how to do PCA with asset data
(http://dx.doi.org/10.1111/j.1475-4991.2008.00309.x), the kind of
data you have in, say, the DHS, but my intent was to say, "If you are
determined to do PCA, this is how you would want to do it". And I gave
references to better approaches, including the fabulous explanation by
Bollen, Stecklov and Glanville of how to build SEM models that
incorporate asset variables into the latent socio-economic status
variable.
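For readers who do go the PCA route, the following is a minimal sketch in Python of the common recipe for an asset-based wealth score. It standardizes 0/1 asset indicators, takes the first principal component of their Pearson correlation matrix, and uses that component's weights to score households. Several parts are assumptions for illustration. The asset names and household data are hypothetical, and the power-iteration helper `first_pc` was written for this example. The sketch also does not implement the refinements for discrete indicators discussed in the paper cited above.

```python
import math

def standardize(col):
    """Center and scale to mean 0, SD 1 (population SD, dividing by n)."""
    n = len(col)
    m = sum(col) / n
    sd = math.sqrt(sum((x - m) ** 2 for x in col) / n)
    return [(x - m) / sd for x in col]

def first_pc(r, iters=1000, tol=1e-12):
    """Leading eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    p = len(r)
    w = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        rw = [sum(r[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in rw))
        new_w = [x / norm for x in rw]
        converged = max(abs(a - b) for a, b in zip(new_w, w)) < tol
        w = new_w
        if converged:
            break
    # Rayleigh quotient w'Rw gives the eigenvalue for the final unit vector
    lam = sum(w[i] * r[i][j] * w[j] for i in range(p) for j in range(p))
    return lam, w

# Hypothetical 0/1 asset indicators for 8 households, for illustration only.
assets = ["radio", "tv", "fridge", "car"]
households = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
n, p = len(households), len(assets)
z = [standardize(list(col)) for col in zip(*households)]  # one standardized list per asset
R = [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(p)] for i in range(p)]

lam1, w = first_pc(R)
if sum(w) < 0:  # eigenvector sign is arbitrary: orient so more assets means a higher score
    w = [-x for x in w]
scores = [sum(w[j] * z[j][i] for j in range(p)) for i in range(n)]

print(f"PC1 eigenvalue {lam1:.3f}, share of variance {lam1 / p:.3f}")
for name, weight in zip(assets, w):
    print(f"  weight on {name}: {weight:.3f}")
for i, s in enumerate(scores, start=1):
    print(f"household {i}: wealth score {s:+.3f}")
```

The variance of the resulting score equals the first eigenvalue, because the score is the standardized data projected onto a unit eigenvector of their correlation matrix.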

-- 
-- Stas Kolenikov, PhD, PStat (SSC)  ::  http://stas.kolenikov.name
-- Senior Survey Statistician, Abt SRBI  ::  work email kolenikovs at srbi dot com
-- Opinions stated in this email are mine only, and do not reflect the position of my employer

On Thu, Sep 27, 2012 at 3:26 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> I can't give an answer to this question that is likely to satisfy you.
> PCA and SEM are very different methods. PCA is in my view primarily a
> multivariate transformation technique. SEM is, more obviously, a
> family of modelling techniques. Even in this thread the use of PCA
> appears to be part of a wider model-based strategy and that is likely
> to be typical of most projects in which it appears. I don't think "use
> PCA" is ever likely to be the core of the answer to "what should I
> do?" but "use SEM" might be, sometimes.
>
> Stas [sic] can speak for himself, but I suspect his position would be
> close to mine on this.
>
> Nick
>
> On Thu, Sep 27, 2012 at 8:06 AM, 汪哲仁 <chejen.wang@gmail.com> wrote:
>
>> Dear Nick and Stat,
>>
>> May I ask a question? In which circumstances is PCA a better
>> choice than SEM?
>
>  2012/9/27 Nick Cox <njcoxstata@gmail.com>
>
>>> You are confusing two different questions. Throughout I focus on the
>>> case you are looking at where PCA is based on the correlation matrix.
>>>
>>> If the aim is to use the most important PC, then that is labelled 1,
>>> but even if it weren't we could identify it by its having the largest
>>> eigenvalue attached and no extra considerations arise.
>>>
>>> If the aim is to identify which PCs are "important" or "worthy of use"
>>> (typically one or more) and should be used in later analyses, then
>>> this is necessarily a looser, more open question and the best art is a
>>> darker matter. There can't be an answer independent of what you are
>>> trying to do. Some people do stress a rule of thumb such as
>>> eigenvalues > 1 and some people look for a break in the eigenvalues
>>> using a scree plot. In some projects PCs that are used later are good
>>> if interpretable as having high correlations with particular
>>> variables; in other projects the PCs are just composite variables with
>>> the properties assigned to them and interpretability is less material.
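The eigenvalue rules of thumb can be made concrete with a minimal sketch in Python. It assumes a hypothetical 4 x 4 correlation matrix made of two pairs of related variables, not real data. A hand-written Jacobi routine finds the eigenvalues and eigenvectors; `jacobi_eigen` is illustrative and does not come from any library. The script then ranks the PCs by eigenvalue and applies the eigenvalue > 1 rule. It also prints a crude text scree plot and reports the correlation of each variable with PC1.

```python
import math

def jacobi_eigen(a, tol=1e-20, max_sweeps=100):
    """Eigen-decomposition of a symmetric matrix by cyclic Jacobi rotations.

    Returns (values, vectors) sorted by descending eigenvalue; vectors[k]
    is the unit eigenvector belonging to values[k]."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # rotation angle chosen so that the (p, q) entry becomes zero
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda k: a[k][k], reverse=True)
    values = [a[k][k] for k in order]
    vectors = [[v[i][k] for i in range(n)] for k in order]
    return values, vectors

# Hypothetical correlation matrix: variables 1-2 correlate strongly with each
# other, as do variables 3-4; the two pairs are only weakly related.
R = [[1.0, 0.6, 0.1, 0.1],
     [0.6, 1.0, 0.1, 0.1],
     [0.1, 0.1, 1.0, 0.5],
     [0.1, 0.1, 0.5, 1.0]]

values, vectors = jacobi_eigen(R)
total = sum(values)  # equals the number of variables for a correlation matrix
kaiser = [lam for lam in values if lam > 1.0]  # "eigenvalue > 1" rule of thumb

cumulative = 0.0
for k, lam in enumerate(values, start=1):  # PC1 is the one with the largest eigenvalue
    cumulative += lam / total
    bar = "#" * round(10 * lam)  # crude text scree plot
    print(f"PC{k}: eigenvalue {lam:.3f}  cum. prop. {cumulative:.3f}  {bar}")
print("PCs retained by eigenvalue > 1:", len(kaiser))

# Correlation between each variable and PC1 is loading * sqrt(eigenvalue);
# the overall sign of an eigenvector is arbitrary.
pc1_corr = [w * math.sqrt(values[0]) for w in vectors[0]]
print("corr(variable, PC1):", [round(x, 3) for x in pc1_corr])
```

In this made-up matrix, the eigenvalue > 1 rule keeps two components, one for each pair of variables. With real data, this rule, a break in the scree plot, and interpretability need not agree.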
>>>
>>> Every book I know on PCA stresses this open aspect of the method. The
>>> books by Jolliffe and Jackson referenced in the -pca- documentation
>>> certainly do.
>>>
>>> It's not clear exactly why you feel committed in advance to using PCA
>>> like this. I sympathise with the advice given earlier by Stas
>>> Kolenikov to consider something more like an SEM.
>>>
>>> Nick
>>>
>>> On Wed, Sep 26, 2012 at 9:33 PM, Shikha Sinha <shikha.sinha414@gmail.com> wrote:
>>> > OK, I understand now that if I want to use one score, then PC1 is
>>> > the most relevant one, and that to distinguish further between
>>> > financial and social autonomy, we need to look at the loadings on
>>> > PC2, PC3, and so on, to figure out whether PC2 is better than PC1
>>> > when the focus is on social or financial autonomy.
>>> >
>>> > Then I am struggling to understand the point of selecting components
>>> > based on eigenvalues. What is the use of selecting PCs based on
>>> > either the eigenvalues or a scree plot if we are always (or most of
>>> > the time) going to use the first component? An example showing the
>>> > importance of eigenvalues in selecting components (or any
>>> > reference) would be very helpful.
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP