
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: RE: principal component analysis-creating linear combinations


From   James Wu <[email protected]>
To   [email protected]
Subject   Re: st: RE: principal component analysis-creating linear combinations
Date   Thu, 10 Mar 2011 10:26:16 -0500

Nick and Maarten,

Now it works. Thanks a lot for the tips.

James

On Thu, Mar 10, 2011 at 10:12 AM, Nick Cox <[email protected]> wrote:
> Not so: there is an explicit example of exactly what you need in the help.
>
> Individual scores for the components are obtained via predict
>        . predict f1
>        . predict f1 f2
>
> That is, for 2, 3, ... components, specify as many names as you need.
>
> I am looking at the Stata 11 documentation; if you are using an earlier version, you should state that, as requested in the Statalist FAQ.
>
> Nick
> [email protected]
>
> James Wu [mailto:[email protected]]
>
> Nick, thank you very much.
>
> But how can I obtain the second component scores (which would
> correspond to what I called Y2 earlier) by using predict?
> I read the manual entry on pca postestimation, but it gives no indication
> of this (only the first component scores).
>
> On Thu, Mar 10, 2011 at 9:56 AM, Nick Cox <[email protected]> wrote:
>
>> The easiest and best way to create the principal components themselves is to use -predict- after -pca-. There is no need for you to do the calculation by typing out coefficients in a linear equation; even at best, that is problematic in terms of keeping precision.
>>
>> The default of -pca- is to use the correlation matrix; that is entirely equivalent to using standardised variables, so there is absolutely no need to standardise the variables yourself, except possibly as an exercise.
>>
>> I wouldn't call the eigenvectors the PCs myself, although there are varying habits on this.
>
> James Wu
>
>> Suppose we ran pca on four variables, x1, x2, x3, x4 as follows:
>
>> . pca x1 x2 x3 x4, components(3)
>>
>> Principal components/correlation                 Number of obs    =        659
>>                                                  Number of comp.  =          3
>>                                                  Trace            =          4
>>     Rotation: (unrotated = principal)            Rho              =     0.9550
>>
>>     --------------------------------------------------------------------------
>>        Component |   Eigenvalue   Difference         Proportion   Cumulative
>>     -------------+------------------------------------------------------------
>>            Comp1 |      2.42894      1.67142             0.6072       0.6072
>>            Comp2 |      .757515      .124084             0.1894       0.7966
>>            Comp3 |      .633431      .453314             0.1584       0.9550
>>            Comp4 |      .180117            .             0.0450       1.0000
>>     --------------------------------------------------------------------------
>>
>> Principal components (eigenvectors)
>>
>>     ----------------------------------------------------------
>>         Variable |    Comp1     Comp2     Comp3 | Unexplained
>>     -------------+------------------------------+-------------
>>               x1 |   0.3894    0.8726   -0.2945 |   .00004265
>>               x2 |   0.4517    0.0966    0.8858 |    .0003491
>>               x3 |   0.5733   -0.3179   -0.2218 |      .09384
>>               x4 |   0.5619   -0.3580   -0.2817 |      .08588
>>     ----------------------------------------------------------
>>
>>
>> Now, suppose that you decide to retain the first two principal
>> components, and then you want to create two variables that are linear
>> combinations of the original four variables.
>>
>> Question 1: Would it simply be a matter of multiplying the principal
>> components (eigenvectors, i.e., the columns) by the original variables, say,
>> Y1 = 0.3894*x1 + 0.4517*x2 + 0.5733*x3 + 0.5619*x4 and
>> Y2 = 0.8726*x1 + 0.0966*x2 - 0.3179*x3 - 0.3580*x4?
>>
>> Question 2: Assuming that I am right that the new variables are created
>> simply by multiplying the principal components (eigenvectors) by the
>> original variables (Question 1): if these four original variables are
>> in different units of measurement, should we standardize them (so that
>> each standardized variable has mean 0 and standard deviation 1) before
>> computing the products in Question 1?
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
>>
>
>
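
A minimal sketch of the approach Nick describes, written as a Stata do-file. It assumes the shipped auto.dta dataset, with trunk, weight, length and turn standing in for James's x1-x4; the names pc1, pc2, z1-z4, y1, y2 and L are hypothetical. A single -predict- call with two new names gives the first two component scores. The rest rebuilds the same scores by hand from variables standardized with -egen, std()- and the eigenvectors stored in e(L), which, assuming the default correlation-matrix -pca-, should agree with the -predict- results up to storage precision.

```stata
* Sketch only: auto.dta's trunk, weight, length and turn stand in for
* x1-x4 (an assumption); all new variable names are hypothetical.
sysuse auto, clear
pca trunk weight length turn, components(3)

* Keep the eigenvectors (rows = variables, columns = retained components)
matrix L = e(L)

* Scores for the first two components: one new name per component
predict pc1 pc2

* Optional check (Nick's "exercise"): rebuild the scores by hand.
* The default pca uses the correlation matrix, so standardize first...
local i = 0
foreach v of varlist trunk weight length turn {
    local ++i
    egen double z`i' = std(`v')
}

* ...then weight the standardized variables by each eigenvector column
generate double y1 = L[1,1]*z1 + L[2,1]*z2 + L[3,1]*z3 + L[4,1]*z4
generate double y2 = L[1,2]*z1 + L[2,2]*z2 + L[3,2]*z3 + L[4,2]*z4

* y1 and y2 should match pc1 and pc2 up to float storage precision
summarize pc1 y1 pc2 y2
```

In practice only the -pca- and -predict- lines are needed; the hand calculation is there only to show why standardizing first (Question 2) is what the correlation-matrix default already does.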



© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index