## Principal components

Stata’s **pca** allows you to estimate parameters of principal-component models.

**. webuse auto**
(1978 Automobile Data)

**. pca price mpg rep78 headroom weight length displacement foreign**

Principal components/correlation Number of obs = 69
Number of comp. = 8
Trace = 8
Rotation: (unrotated = principal) Rho = 1.0000

| Component | Eigenvalue | Difference | Proportion | Cumulative |
|-----------|------------|------------|------------|------------|
| Comp1 | 4.7823 | 3.51481 | 0.5978 | 0.5978 |
| Comp2 | 1.2675 | .429638 | 0.1584 | 0.7562 |
| Comp3 | .837857 | .398188 | 0.1047 | 0.8610 |
| Comp4 | .439668 | .0670301 | 0.0550 | 0.9159 |
| Comp5 | .372638 | .210794 | 0.0466 | 0.9625 |
| Comp6 | .161844 | .0521133 | 0.0202 | 0.9827 |
| Comp7 | .109731 | .081265 | 0.0137 | 0.9964 |
| Comp8 | .0284659 | . | 0.0036 | 1.0000 |

Principal components (eigenvectors)

| Variable | Comp1 | Comp2 | Comp3 | Comp4 | Comp5 | Comp6 | Comp7 | Comp8 | Unexplained |
|----------|-------|-------|-------|-------|-------|-------|-------|-------|-------------|
| price | 0.2324 | 0.6397 | -0.3334 | -0.2099 | 0.4974 | -0.2815 | 0.2165 | -0.0891 | 0 |
| mpg | -0.3897 | -0.1065 | 0.0824 | 0.2568 | 0.6975 | 0.5011 | 0.1625 | 0.0115 | 0 |
| rep78 | -0.2368 | 0.5697 | 0.3960 | 0.6256 | -0.1650 | -0.1928 | -0.0813 | 0.0065 | 0 |
| headroom | 0.2560 | -0.0315 | 0.8439 | -0.3750 | 0.2560 | -0.1184 | 0.0226 | 0.0252 | 0 |
| weight | 0.4435 | 0.0979 | -0.0325 | 0.1792 | -0.0296 | 0.2657 | 0.1104 | 0.8228 | 0 |
| length | 0.4298 | 0.0687 | 0.0864 | 0.1845 | -0.2438 | 0.4144 | 0.5437 | -0.4921 | 0 |
| displacement | 0.4304 | 0.0851 | -0.0445 | 0.1524 | 0.1782 | 0.2907 | -0.7733 | -0.2608 | 0 |
| foreign | -0.3254 | 0.4820 | 0.0498 | -0.5183 | -0.2850 | 0.5401 | -0.1173 | 0.0639 | 0 |

We typed **pca price mpg** ... **foreign**. Most Stata commands share
the same syntax: the names of the variables (the dependent variable first,
where there is one, and then the independent variables) follow the command's
name, and they are, optionally, followed by a comma and any options.
**pca** has no dependent variable, so we simply listed the variables to
analyze. In this case, we did not specify any options.
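
As a sketch of the syntax just described, the same command can be followed by a comma and options. The two options shown, **components()** and **covariance**, are taken from Stata's **pca** documentation rather than from this example, and keeping two components is only an illustrative choice.

```stata
* Load the example data used on this page
webuse auto, clear

* Same variable list, now followed by a comma and an option:
* retain only the first two components
pca price mpg rep78 headroom weight length displacement foreign, components(2)

* Analyze the covariance matrix instead of the default correlation matrix
* (variables on large scales, such as price, will then dominate)
pca price mpg rep78 headroom weight length displacement foreign, covariance
```

The rest of this page uses the default, correlation-based analysis with all eight components.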

Having estimated the principal components, we can at any time type
**pca** by itself to redisplay the principal-component output. We can
also type **screeplot** to obtain a scree plot of the eigenvalues, and we
can use the **predict** command to obtain the components themselves.

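**screeplot**, typed by itself, graphs the eigenvalues against the
component number. Because this analysis is based on the correlation matrix
of 8 variables, each eigenvalue divided by 8 is the proportion of variance
explained by that component:

**. screeplot**

Typing **screeplot, yline(1) ci(het)** adds a horizontal line at an
eigenvalue of 1 and adds heteroskedastic bootstrap confidence intervals:

**. screeplot, yline(1) ci(het)**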
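
For a correlation-based analysis, the eigenvalues sum to the number of variables (the trace, 8 here), so the average eigenvalue is 1, which is where the reference line is drawn. The arithmetic linking eigenvalues to the Proportion column can be checked by hand. This is a minimal sketch that copies the first two eigenvalues from the **pca** output; the `e(Ev)` line assumes the **pca** results are still in memory.

```stata
* Eigenvalue divided by the trace (8) gives the proportion of variance
display %6.4f 4.7823/8
display %6.4f 1.2675/8

* List the stored eigenvalues (assumes pca was the last estimation command)
matrix list e(Ev)
```

The two displayed values should agree with the Proportion column for Comp1 and Comp2.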

We can obtain the first two components by typing

**. predict pc1 pc2, score**
(6 components skipped)

Scoring coefficients
sum of squares(column-loading) = 1

| Variable | Comp1 | Comp2 | Comp3 | Comp4 | Comp5 | Comp6 | Comp7 | Comp8 |
|----------|-------|-------|-------|-------|-------|-------|-------|-------|
| price | 0.2324 | 0.6397 | -0.3334 | -0.2099 | 0.4974 | -0.2815 | 0.2165 | -0.0891 |
| mpg | -0.3897 | -0.1065 | 0.0824 | 0.2568 | 0.6975 | 0.5011 | 0.1625 | 0.0115 |
| rep78 | -0.2368 | 0.5697 | 0.3960 | 0.6256 | -0.1650 | -0.1928 | -0.0813 | 0.0065 |
| headroom | 0.2560 | -0.0315 | 0.8439 | -0.3750 | 0.2560 | -0.1184 | 0.0226 | 0.0252 |
| weight | 0.4435 | 0.0979 | -0.0325 | 0.1792 | -0.0296 | 0.2657 | 0.1104 | 0.8228 |
| length | 0.4298 | 0.0687 | 0.0864 | 0.1845 | -0.2438 | 0.4144 | 0.5437 | -0.4921 |
| displacement | 0.4304 | 0.0851 | -0.0445 | 0.1524 | 0.1782 | 0.2907 | -0.7733 | -0.2608 |
| foreign | -0.3254 | 0.4820 | 0.0498 | -0.5183 | -0.2850 | 0.5401 | -0.1173 | 0.0639 |

The **score** option tells Stata's **predict** command to compute the
scores of the components, and **pc1** and **pc2** are the names we
have chosen for the two new variables. We could have obtained the first
three components by typing, for example, **predict pc1 pc2 pc3, score**.

An important feature of Stata is that it does not have modes or modules. We
typed **pca** to estimate the principal components. We then typed
**screeplot** to see a graph of the eigenvalues — we did not have
to save the data and change modules. Similarly, we typed **predict pc1
pc2, score** to obtain the first two components. The new variables,
**pc1** and **pc2**, are now part of our data and are ready for use;
we could now use **regress** to fit a regression model.
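
A minimal sketch of that last step follows. It assumes **pca** and **predict pc1 pc2, score** have already been run as described; the outcome variable **turn** is in the auto data but was not part of the analysis, and it is chosen here purely for illustration.

```stata
* pc1 and pc2 are ordinary variables in the dataset at this point
describe pc1 pc2

* Regress an outcome that was not used in the pca on the two component scores
* (turn is an arbitrary, illustrative choice of outcome)
regress turn pc1 pc2
```

No data were saved or reloaded between the steps; each command works on the dataset currently in memory.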

The two components should be uncorrelated. The **correlate** command,
like every other Stata command, is always available for use, so to verify
that the correlation between **pc1** and **pc2** is zero, we type

**. correlate pc1 pc2**
(obs=69)

| | pc1 | pc2 |
|-----|--------|--------|
| pc1 | 1.0000 | |
| pc2 | 0.0000 | 1.0000 |
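
A related check, offered as a sketch rather than as part of the original example: because the scoring coefficients have unit sum of squares, the variance of each score should be close to the corresponding eigenvalue. The eigenvalues below are copied from the **pca** output, and the code assumes **pc1** and **pc2** already exist.

```stata
* Standard deviations of the scores on the 69 observations used by pca
summarize pc1 pc2

* Square roots of the first two eigenvalues, for comparison with the
* standard deviations reported by summarize
display sqrt(4.7823)
display sqrt(1.2675)
```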

See **New in Stata 14** for more about what was added in Stata 14.