Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: principal component analysis-creating linear combinations


From   James Wu <[email protected]>
To   [email protected]
Subject   Re: st: principal component analysis-creating linear combinations
Date   Thu, 10 Mar 2011 10:03:04 -0500

Thank you, but "-predict-" generates only the first component scores.

(1) By the way, would it be wrong to construct the linear combinations
as I described earlier, for example
Y1=0.3894*x1+0.4517*x2+0.5733*x3+0.5619*x4?

Here is the comparison:

. predict pc1
(output omitted)

. sum pc1

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         pc1 |       659    2.97e-09    1.558505   -3.00555   6.801751


. gen Y1=0.3894*x1+0.4517*x2+0.5733*x3+0.5619*x4

. pwcorr  pc1 Y1

             |      pc1       Y1
-------------+------------------
         pc1 |   1.0000
          Y1 |   0.9724   1.0000
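
One plausible reading of the 0.9724 above: the output header says
"Principal components/correlation", and for a PCA of a correlation
matrix -predict- bases its scores on the standardized variables, whereas
Y1 was built from the raw x1-x4. A minimal sketch of the comparison with
standardized inputs follows. It assumes x1-x4 and pc1 are in memory as in
this thread and that no observations have missing values; the names z_x1
to z_x4 and Y1z are hypothetical, introduced only for this illustration.

```stata
* Standardize each original variable to mean 0 and standard deviation 1.
* (z_* and Y1z are hypothetical names, not part of the original post.)
foreach v of varlist x1 x2 x3 x4 {
    egen double z_`v' = std(`v')
}

* Same Comp1 loadings as in the thread, applied to the standardized variables.
generate double Y1z = 0.3894*z_x1 + 0.4517*z_x2 + 0.5733*z_x3 + 0.5619*z_x4

* Compare with the -predict- score.
pwcorr pc1 Y1z
summarize pc1 Y1z
```

Because the loadings are typed to four decimals, any remaining gap
between pc1 and Y1z would be expected to come from rounding only.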


(2) As one can see from the original PCA, the second component has
positive loadings on x1 and x2, so I want to create the second
component scores as well.

How can I obtain them, other than by computing
Y2=0.8726*x1+0.0966*x2-0.3179*x3-0.3580*x4 by hand?
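
For what it is worth, -predict- after -pca- accepts a list of new
variable names (or a stub such as pc*), and it creates one score per name
in component order, so supplying a single name yields only the first
component. A short sketch, assuming the -pca- results from this thread
are still the active estimates; score1 and score2 are hypothetical names,
chosen so as not to clash with the existing pc1.

```stata
* Two new variable names give the scores for Comp1 and Comp2.
* (score1 and score2 are hypothetical names.)
predict score1 score2, score

* Inspect the two sets of scores.
summarize score1 score2
```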

Thank you very much again,

James





On Thu, Mar 10, 2011 at 9:46 AM, Maarten buis <[email protected]> wrote:
> --- On Thu, 10/3/11, James Wu wrote:
>> Suppose we ran pca on four variables, x1, x2, x3, x4 as
>> follows:
>> Now, suppose that you decide to retain the first two
>> principal components, and then you want to create two
>> variables that are linear combinations of the original
>> four variables.
>
> Then you need to use -predict-, see help -pca postestimation-.
>
> Hope this helps,
> Maarten
>
> --------------------------
> Maarten L. Buis
> Institut fuer Soziologie
> Universitaet Tuebingen
> Wilhelmstrasse 36
> 72074 Tuebingen
> Germany
>
> http://www.maartenbuis.nl
> --------------------------
>
>


Hi, I am hoping to get some answers to my questions below:

Suppose we ran pca on four variables, x1, x2, x3, x4 as follows:

. pca  x1 x2 x3 x4, components (3)

Principal components/correlation                  Number of obs    =       659
                                                  Number of comp.  =         3
                                                  Trace            =         4
    Rotation: (unrotated = principal)             Rho              =    0.9550

    --------------------------------------------------------------------------
       Component |   Eigenvalue   Difference         Proportion   Cumulative
    -------------+------------------------------------------------------------
           Comp1 |      2.42894      1.67142             0.6072       0.6072
           Comp2 |      .757515      .124084             0.1894       0.7966
           Comp3 |      .633431      .453314             0.1584       0.9550
           Comp4 |      .180117            .             0.0450       1.0000
    --------------------------------------------------------------------------

Principal components (eigenvectors)

    ----------------------------------------------------------
        Variable |    Comp1     Comp2     Comp3 | Unexplained
    -------------+------------------------------+-------------
              x1 |   0.3894    0.8726   -0.2945 |   .00004265
              x2 |   0.4517    0.0966    0.8858 |    .0003491
              x3 |   0.5733   -0.3179   -0.2218 |      .09384
              x4 |   0.5619   -0.3580   -0.2817 |      .08588
    ----------------------------------------------------------


Now, suppose that you decide to retain the first two principal
components, and then you want to create two variables that are linear
combinations of the original four variables.

Question 1: Would it be correct simply to multiply the principal
component loadings (the eigenvector columns) by the original variables,
say,
Y1=0.3894*x1+0.4517*x2+0.5733*x3+0.5619*x4 and
Y2=0.8726*x1+0.0966*x2-0.3179*x3-0.3580*x4?

Question 2: Assuming that I am correct in creating new variables by
simply multiplying the principal components (eigenvectors) by the
original variables (Question 1), if these four original variables are
in different units of measurement, should we standardize them (so that
each standardized variable has mean 0 and standard deviation 1) before
computing the products in Question 1?
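
Both questions can be checked empirically. The sketch below is a
self-contained illustration on the auto dataset shipped with Stata, not
on the data from this post: it builds the linear combinations by hand
from standardized variables and the full-precision loadings stored in
e(L), then compares them with the scores from -predict-. The choice of
variables and the names z_*, Y1 and Y2 are assumptions made for the
example.

```stata
* Illustration on Stata's shipped auto data (not the poster's data).
sysuse auto, clear
pca price mpg weight length, components(2)

* Scores as computed by -predict- (stored as doubles for the comparison).
predict double pc1 pc2, score

* Loadings (eigenvectors): variables in rows, components in columns.
matrix L = e(L)

* Standardize each variable to mean 0 and standard deviation 1.
foreach v of varlist price mpg weight length {
    egen double z_`v' = std(`v')
}

* Hand-built linear combinations from the first two columns of L.
generate double Y1 = L[1,1]*z_price + L[2,1]*z_mpg + L[3,1]*z_weight + L[4,1]*z_length
generate double Y2 = L[1,2]*z_price + L[2,2]*z_mpg + L[3,2]*z_weight + L[4,2]*z_length

* Each pair should agree up to numerical precision.
correlate pc1 Y1
correlate pc2 Y2
```

If the hand-built variables were computed from the raw variables
instead of the z_* versions, the correlations would generally fall below
1 whenever the variables have different standard deviations.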

Thank you so much in advance.

James