



Re: st: wealth score using principal component analysis (PCA)


From   Stas Kolenikov <skolenik@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: wealth score using principal component analysis (PCA)
Date   Tue, 25 Sep 2012 11:05:04 -0500

Regarding (c), you would be best off with a structural equation model
(-sem- module), and forgo the PCA altogether.
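
A minimal sketch of what that could look like, assuming x1-x4 are the
asset indicators from the original question. The outcome y, the
covariate z, and the latent variable name Wealth are hypothetical names
introduced here for illustration only. Note that -sem- fits linear
models, so it treats the indicators as continuous.

```stata
* Measurement part: a latent variable Wealth measured by x1-x4.
* Structural part: hypothetical outcome y regressed on Wealth and
* a hypothetical observed covariate z.
sem (Wealth -> x1 x2 x3 x4) (y <- Wealth z)

* Optional: predicted scores for the latent variable.
predict wealth_hat, latent(Wealth)
```

One common argument for this route is that the measurement and
structural parts are fitted jointly, so the wealth score is not treated
as if it had been observed without error.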

-- 
-- Stas Kolenikov, PhD, PStat (SSC)  ::  http://stas.kolenikov.name
-- Senior Survey Statistician, Abt SRBI  ::  work email kolenikovs at
srbi dot com
-- Opinions stated in this email are mine only, and do not reflect the
position of my employer



On Mon, Sep 24, 2012 at 7:07 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> You seem to be misunderstanding both PCA and the syntax of -predict-
> after -pca-.
>
> To take the second first, -predict- just gives you as many components
> as you ask for. Ask for one by giving one variable name and you get
> scores for the first PC, regardless of what name you give. Stata's
> indifferent to what name you give (so long as it is new and legal) and
> indeed
>
> predict p3
> predict p777
>
> would give you further identical copies of the first PC.
>
> predict P1 P2
>
> would give you scores for the first two PCs.
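>
> A short sketch of that behaviour, assuming Stata's shipped auto
> dataset and an arbitrary choice of four variables as stand-ins for
> the asset variables (these are not the poster's DHS data):
>
> ```stata
> * Illustrative data only: auto.dta ships with Stata.
> sysuse auto, clear
> pca price mpg weight length
>
> * One new name each time: both hold scores for the first PC.
> predict p3, score
> predict p777, score
> correlate p3 p777
>
> * Two new names: scores for the first and second PCs.
> predict P1 P2, score
> ```
>
> The number of new variable names, not their spelling, determines how
> many components are scored.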
>
> As for PCA, there are potentially as many PCs as variables. Although
> the -components()- option puts a self-defined limit on how many you
> can calculate, the main purpose of this option appears to be to let
> -pca- behave more like -factor-.
>
> Even if your purpose is to use just one PC, it usually makes sense to
> look at several and the relationships of those PCs to your original
> variables. Sometimes the second, third, ... PCs pick up important parts
> of the variation and it is a good idea to look at those too to see
> what the first PC is missing. In the case of wealth variables it might
> be a good idea to think about using PCA on logarithmic transformations
> of the variables too (assuming all values are strictly positive).
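>
> As a sketch of both suggestions, again assuming Stata's shipped auto
> dataset as a stand-in (the variables are illustrative and happen to
> be strictly positive, which 0/1 ownership indicators would not be):
>
> ```stata
> sysuse auto, clear
> pca price mpg weight length
>
> * How much variation each component picks up.
> screeplot
>
> * How each component relates to the original variables.
> estat loadings
>
> * PCA on logarithms; ln() is only defined for strictly positive values.
> foreach v of varlist price mpg weight length {
>     generate ln_`v' = ln(`v')
> }
> pca ln_price ln_mpg ln_weight ln_length
> ```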
>
> Note that the audience of Statalist is very international and
> interdisciplinary, so that assuming that "DHS" is self-evident is
> likely to be wrong in many cases.
>
> Your last question (c) is unanswerable. Many people do it, but how far
> it is "OK" in your project depends on your goals and your data, which
> we can't see.
>
> Nick
>
> On Mon, Sep 24, 2012 at 9:20 PM, Shikha Sinha <shikha.sinha414@gmail.com> wrote:
>
>> I am trying to create a wealth score using the ownership of different
>> assets in the DHS survey.  I am using -pca- but I am not sure how to
>> estimate the score as I want to use the wealth score as one of the
>> independent variables.
>>
>> pca x1-x4
>> predict p1, score
>>
>> but -predict- only generates scores from the first component.
>>
>> I also tried the following,
>>
>> pca x1-x4, components(2)
>> predict p2, score
>>
>> However, p1 and p2 are the same.
>>
>> My questions are: (a) Why is there no difference between p1 and p2?
>> (b) How can I generate a score using the first 2 components only?
>> (c) Is it OK to use a continuous PCA score as an independent variable?
>>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

