

From: Nick Cox <njcoxstata@gmail.com>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: wealth score using principal component analysis (PCA)

Date: Wed, 26 Sep 2012 00:34:03 +0100

If you want just one index, you can't improve on the first PC if you
are using the criteria of PCA. That's a central idea of PCA.

Nick

On Wed, Sep 26, 2012 at 12:22 AM, Shikha Sinha <shikha.sinha414@gmail.com> wrote:
> Thanks for your response, Nick and Stas!
>
> I think I am struggling with how to create one score from two
> components. Let me pose my question again.
>
> Suppose I want to create one index out of six variables. For example,
> I want to create a "women autonomy index". The index would be one
> number for every household. The Demographic and Health Survey (DHS)
> asks 10 different questions related to women autonomy and instead of
> using the information in all the 10 questions, I just want to use an
> index that contains the summary information of all the 10
> questions/variables. I can use -pca- to create the index. Once I use
> -pca x1-x10-, I can choose the number of principal components (PCs) to
> retain based on eigenvalues or a screeplot. Let us assume that there
> are three PCs that have eigenvalues > 1 and I want to retain all these
> components, though the first component has the highest variation.
>
> Now, I want to create a "women autonomy index" based on these three
> PCs. How can I do that? If I use -predict p1 p2 p3, score-, it gives
> three different scores, all unrelated. However, I want just one index;
> kindly suggest how to do this.
>
> Thanks,
> Shikha
>
> On Tue, Sep 25, 2012 at 9:05 AM, Stas Kolenikov <skolenik@gmail.com> wrote:
>> Regarding (c), you would be best off with a structural equations model
>> (-sem- module), and forgo the PCA altogether.
>>
>> --
>> -- Stas Kolenikov, PhD, PStat (SSC) :: http://stas.kolenikov.name
>> -- Senior Survey Statistician, Abt SRBI :: work email kolenikovs at srbi dot com
>> -- Opinions stated in this email are mine only, and do not reflect the position of my employer
>>
>> On Mon, Sep 24, 2012 at 7:07 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>>> You seem to be misunderstanding both PCA and the syntax of -predict-
>>> after -pca-.
>>>
>>> To take the second first, -predict- just gives you as many components
>>> as you ask for. Ask for one by giving one variable name and you get
>>> scores for the first PC, regardless of what name you give. Stata's
>>> indifferent to what name you give (so long as it is new and legal) and
>>> indeed
>>>
>>>     predict p3
>>>     predict p777
>>>
>>> would give you further identical copies of the first PC.
>>>
>>>     predict P1 P2
>>>
>>> would give you scores for the first two PCs.
>>>
>>> As for PCA, there are potentially as many PCs as variables: although
>>> the -components()- option puts a self-defined limit on how many you
>>> can calculate, the main purpose of this option appears to be to let
>>> -pca- behave more like -factor-.
>>>
>>> Even if your purpose is to use just one PC, it usually makes sense to
>>> look at several and the relationships of those PCs to your original
>>> variables. Sometimes the second, third, ... PCs pick up important parts
>>> of the variation and it is a good idea to look at those too to see
>>> what the first PC is missing. In the case of wealth variables it might
>>> be a good idea to think about using PCA on logarithmic transformations
>>> of the variables too (assuming all values are strictly positive).
>>>
>>> Note that the audience of Statalist is very international and
>>> interdisciplinary, so that assuming that "DHS" is self-evident is
>>> likely to be wrong in many cases.
>>>
>>> Your last question (c) is unanswerable. Many people do it, but how far
>>> it is "OK" in your project depends on your goals and your data, which
>>> we can't see.
>>>
>>> Nick
>>>
>>> On Mon, Sep 24, 2012 at 9:20 PM, Shikha Sinha <shikha.sinha414@gmail.com> wrote:
>>>
>>>> I am trying to create a wealth score using the ownership of different
>>>> assets in the DHS survey. I am using -pca- but I am not sure how to
>>>> estimate the score as I want to use the wealth score as one of the
>>>> independent variables.
>>>>
>>>>     pca x1-x4
>>>>     predict p1, score
>>>>
>>>> but -predict- only generates the score from the first component.
>>>>
>>>> I also tried the following,
>>>>
>>>>     pca x1-x4, components(2)
>>>>     predict p2, score
>>>>
>>>> However, p1 and p2 are the same.
>>>>
>>>> My questions are, (a) why is there no difference between p1 and p2?
>>>> (b) How can I generate a score by using the first 2 components only?
>>>> (c) Is it OK to use a continuous PCA score as an independent variable?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
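
The behaviour of -predict- after -pca- discussed in this thread can be checked with a short sketch. It assumes Stata's bundled auto dataset, with four of its variables standing in for x1-x4 (they are not the poster's DHS data), and the new variable names are arbitrary.

```stata
* Sketch only: auto.dta and these four variables are stand-ins
* for the poster's x1-x4, which we cannot see.
sysuse auto, clear
pca price mpg weight length

* One new variable name: scores for the first PC, whatever the name.
predict p1, score
predict p777, score
compare p1 p777          // reports how the two variables differ, if at all

* Two new names: scores for the first two PCs.
predict P1 P2, score
correlate P1 P2          // PC scores are uncorrelated by construction

* How each PC relates to the original variables.
estat loadings
```

Here p1 and p777 should be copies of the same first-PC scores, which is why p1 and p2 matched in the original question. By the criteria of PCA, p1 is then the single index; P2 is there to show what the first PC leaves out.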

**Follow-Ups**:

- **Re: st: wealth score using principal component analysis (PCA)** *From:* Stas Kolenikov <skolenik@gmail.com>

**References**:

- **st: wealth score using principal component analysis (PCA)** *From:* Shikha Sinha <shikha.sinha414@gmail.com>
- **Re: st: wealth score using principal component analysis (PCA)** *From:* Nick Cox <njcoxstata@gmail.com>
- **Re: st: wealth score using principal component analysis (PCA)** *From:* Stas Kolenikov <skolenik@gmail.com>
- **Re: st: wealth score using principal component analysis (PCA)** *From:* Shikha Sinha <shikha.sinha414@gmail.com>
