The Stata listserver

st: RE: -factor pcf- vs -pca- (was factor score postestimation)

From   "Nick Cox" <>
To   <>
Subject   st: RE: -factor pcf- vs -pca- (was factor score postestimation)
Date   Sun, 11 Sep 2005 17:05:46 +0100

You are asking me to describe a minefield. 

Many people regard PCA as a transformation 
procedure, as no error term and thus no 
model is involved. Given the choice of 
either correlation or covariance matrix, 
results are eigenvectors, eigenvalues 
and other properties of that matrix, 
with (in a sense) no statistical arguments 
being used at all. 
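
As a rough sketch of that point, the following compares -pca- with
a direct eigen decomposition of the correlation matrix. The auto
dataset, the variable list and the matrix names C, V and lambda are
illustrative assumptions, not anything from the original question.
Eigenvectors are determined only up to sign, so columns of V may be
flipped relative to the -pca- output.

```stata
* Illustrative sketch: dataset and variables are arbitrary choices
sysuse auto, clear

* PCA of the correlation matrix (the default for -pca-)
pca price mpg weight length

* The same decomposition done by hand on the correlation matrix
quietly correlate price mpg weight length
matrix C = r(C)
matrix symeigen V lambda = C

* lambda holds the eigenvalues, V the eigenvectors (one per column)
matrix list lambda
matrix list V
```

Adding the -covariance- option to -pca- would instead work from the
covariance matrix, which is the other choice mentioned above.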

Conversely, FA is most usually regarded
as a modelling technique. Its invocation 
of latent variables is regarded as its worst 
and its best feature, depending on tribal 

In many fields, one is regarded as wonderful
or at least useful, and the other is regarded as
misguided if not pernicious. 

But there is a large literature on this. Standard
texts include those by Jolliffe and Jackson. 
In my opinion, any text that does _not_ explain 
that the choice between PCA and FA is controversial 
is likely to be too elementary to be worth your time. 

Originally in Stata, meaning from version 2.1, 
PCA was just obtainable through 
-factor- as a special case. The bifurcation of -factor-
into -factor- and -pca- in version 8 was partly based
on a recognition that many people want principal 
components without any of the latent modelling excrescences. 
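
A minimal sketch of the two routes, assuming the auto dataset and
made-up variable names pc1 and f1 for the predicted scores. With a
single component retained, the two sets of scores would be expected
to differ only in scale (and possibly sign), so their correlation
should be 1 or -1 up to rounding.

```stata
* Illustrative sketch: dataset, variables and new names are assumptions
sysuse auto, clear

* Route 1: principal components as a transformation of the data
pca price mpg weight length, components(1)
predict pc1, score

* Route 2: the principal-component factor method within -factor-
factor price mpg weight length, pcf factors(1)
predict f1

* Compare the two sets of scores
correlate pc1 f1
```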

When I use PCA, it is often to help choose 
predictors for a regression. The PCA is then just a means 
to an end, and not necessarily mentioned in the full report. 
Pretty much the same information 
is given in a correlation or scatter plot matrix, which 
can be much more transparent. 
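
For instance, a correlation matrix and a scatter plot matrix for a
set of candidate predictors could be obtained along these lines. The
auto dataset and the variable list are again only illustrative
assumptions.

```stata
* Illustrative sketch: dataset and variables are arbitrary choices
sysuse auto, clear

* Correlation matrix of the candidate predictors
correlate price mpg weight length

* Scatter plot matrix, showing only the lower triangle
graph matrix price mpg weight length, half
```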


Garrard, Wendy M.
> Thanks very much. The "predict" is just what I needed. Also, I
> appreciate your suggestion about using pca instead of factor since I am
> using regression. I had noticed Stata has two commands that do principal
> components; pca, and the pcf option within factor. I generally use the
> pcf factor option, since I usually want to reduce several predictor
> variables to a single factor for purposes of regression.
> I am a bit confused about the difference Stata is making with --pca--
> and --factor, pcf--, and should undoubtedly become familiar with this.
> Would you mind pointing out the gist, and perhaps a reference for more
> detail?
