Re: st: wealth score using principal component analysis (PCA)


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: wealth score using principal component analysis (PCA)
Date   Wed, 26 Sep 2012 00:34:03 +0100

If you want just one index, you can't improve on the first PC if you
are using the criteria of PCA. That's a central idea of PCA.

Nick
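
The points made in this thread can be sketched in Stata. This is a minimal illustration only: it assumes the auto dataset shipped with Stata as a stand-in for the poster's variables, and the names p1, p777 and pc1-pc3 are illustrative, not taken from anyone's actual data. It shows that one new variable name after -predict- always yields the first PC, that several names yield the leading PCs, and that the first PC carries the largest variance (its eigenvalue).

```stata
* Illustration only: the shipped auto data stand in for the poster's variables
sysuse auto, clear
pca price mpg weight length

* One new variable name requests scores for the first PC, whatever the name
predict p1, score
predict p777, score

* Three names request scores for the first three PCs
predict pc1 pc2 pc3, score

* p1, p777 and pc1 are copies of the first PC, so they correlate perfectly
correlate p1 p777 pc1

* Scores on different PCs are uncorrelated with each other
correlate pc1 pc2 pc3

* The variance of each score equals its eigenvalue; the first is the largest
matrix list e(Ev)
tabstat pc1 pc2 pc3, statistics(variance)
```

If a single index is wanted, pc1 here is the candidate; the later components are worth inspecting to see what that index leaves out.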

On Wed, Sep 26, 2012 at 12:22 AM, Shikha Sinha
<shikha.sinha414@gmail.com> wrote:
> Thanks for your response Nick and stat!
>
> I think I am struggling with how to create one score from two
> components. Let me pose my question again.
>
> Suppose I want to create one index out of six variables. For example,
> I want to create a "women's autonomy index". The index would be one
> number for every household. The Demographic and Health Survey (DHS)
> asks 10 different questions related to women's autonomy and, instead of
> using the information in all 10 questions, I just want to use an
> index that summarizes the information in all 10
> questions/variables. I can use -pca- to create the index. Once I run
> -pca x1-x10-, I can choose the number of principal components (PCs) to
> retain based on eigenvalues or a scree plot. Let's assume that there are
> three PCs that have eigenvalues > 1 and I want to retain all of these
> components, though the first component explains the most variation.
>
> Now, I want to create a "women's autonomy index" based on these three
> PCs. How can I do that? If I use -predict p1 p2 p3, score-, it gives
> three different scores, all unrelated. However, I want just one index.
> Kindly suggest how to do this.
>
> Thanks,
> Shikha
>
>
>
> On Tue, Sep 25, 2012 at 9:05 AM, Stas Kolenikov <skolenik@gmail.com> wrote:
>> Regarding (c), you would be best off with a structural equation model
>> (the -sem- command), and forgo the PCA altogether.
>>
>> --
>> -- Stas Kolenikov, PhD, PStat (SSC)  ::  http://stas.kolenikov.name
>> -- Senior Survey Statistician, Abt SRBI  ::  work email kolenikovs at
>> srbi dot com
>> -- Opinions stated in this email are mine only, and do not reflect the
>> position of my employer
>>
>>
>>
>> On Mon, Sep 24, 2012 at 7:07 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>>> You seem to be misunderstanding both PCA and the syntax of -predict-
>>> after -pca-.
>>>
>>> To take the second first, -predict- just gives you as many components
>>> as you ask for. Ask for one by giving one variable name and you get
>>> scores for the first PC, regardless of what name you give. Stata's
>>> indifferent to what name you give (so long as it is new and legal) and
>>> indeed
>>>
>>> predict p3
>>> predict p777
>>>
>>> would give you further identical copies of the first PC.
>>>
>>> predict P1 P2
>>>
>>> would give you scores for the first two PCs.
>>>
>>> As for PCA, there are potentially as many PCs as variables. Although
>>> the -components()- option puts a self-defined limit on how many you
>>> can calculate, the main purpose of this option appears to be to let
>>> -pca- behave more like -factor-.
>>>
>>> Even if your purpose is to use just one PC, it usually makes sense to
>>> look at several and the relationships of those PCs to your original
>>> variables. Sometimes the second, third, ... PCs pick up important parts
>>> of the variation and it is a good idea to look at those too to see
>>> what the first PC is missing. In the case of wealth variables it might
>>> be a good idea to think about using PCA on logarithmic transformations
>>> of the variables too (assuming all values are strictly positive).
>>>
>>> Note that the audience of Statalist is very international and
>>> interdisciplinary, so that assuming that "DHS" is self-evident is
>>> likely to be wrong in many cases.
>>>
>>> Your last question (c) is unanswerable. Many people do it, but how far
>>> it is "OK" in your project depends on your goals and your data, which
>>> we can't see.
>>>
>>> Nick
>>>
>>> On Mon, Sep 24, 2012 at 9:20 PM, Shikha Sinha <shikha.sinha414@gmail.com> wrote:
>>>
>>>> I am trying to create a wealth score using the ownership of different
>>>> assets in the DHS survey. I am using -pca-, but I am not sure how to
>>>> estimate the score, as I want to use the wealth score as one of the
>>>> independent variables.
>>>>
>>>> pca x1-x4
>>>> predict p1, score
>>>>
>>>> but -predict- only generates the score from the first component.
>>>>
>>>> I also tried the following:
>>>>
>>>> pca x1-x4, components(2)
>>>> predict p2, score
>>>>
>>>> However, p1 and p2 are the same.
>>>>
>>>> My questions are: (a) Why is there no difference between p1 and p2?
>>>> (b) How can I generate a score using only the first 2 components?
>>>> (c) Is it OK to use a continuous PCA score as an independent variable?
>>>>
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/statalist/faq
>>> *   http://www.ats.ucla.edu/stat/stata/

