
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Factors correlated after -predict-... What is going wrong?

From	Trevor Zink <[email protected]>
To	<[email protected]>
Subject	Re: st: Factors correlated after -predict-... What is going wrong?
Date	Thu, 12 Dec 2013 12:44:46 -0800

This is a great explanation--it makes much more sense now!
Thank you!

Trevor


On 12/12/2013 10:13 AM, Red Owl wrote:

Trevor,

I'm going to take some major liberties in this response. With great
trepidation, I will offer an heuristic (not technically accurate)
explanation that will probably be painful to the statisticians here.
Hopefully, however, this heuristic explanation may help explain why no
rotation method is likely to produce perfectly uncorrelated factor
scores with real world data.

First, think of factor scores and factors as separate entities.

Think of the factor scores for any factor as equivalent to the estimates
or predictions from a regression model.  If those predictions were
perfect (i.e., error free with all residuals = 0), then the predicted
values would all fall exactly on the regression line implied by the
model.  That is not likely to happen, however, so the predictions likely
will not perfectly reproduce the regression line but, rather, will be
clustered around that regression line with some degree of error.

Think of the factors as equivalent to regression lines (which under
varimax or any other form of orthogonal rotation are kept 90 degrees
apart).  (The technical term for the factor lines is eigenvectors, and
varimax maintains orthogonal eigenvectors.)

Essentially, the same issue exists with factors and their factor scores
as we see with regression models and their estimates.

You can view factor scores as estimates of the factors, but those
estimates almost surely include error (i.e., residuals not equal to 0).
So, the factor scores will not usually fall exactly on the factor lines
(i.e., the eigenvectors).  Varimax will produce _factors_ (eigenvectors)
that are orthogonal, but the predicted _factor scores_ will only be
perfectly orthogonal in the rare case of perfect predictions of the
factors with all residuals = 0.

So, it is entirely possible to have orthogonal factors but factor scores
that are correlated to some degree.
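
To make the point above concrete, here is a minimal simulation sketch. It is
an illustration only: the variable names (F1, F2, x1-x6, s1, s2), the seed,
the sample size, and the loadings are all assumptions chosen for the example
and are not taken from Trevor's data. The two "true" factors are generated
independently, yet the estimated scores from -predict- need not come out
exactly uncorrelated.

** Begin Example
clear
set seed 12345
set obs 500
* two independent "true" factors (hypothetical)
generate F1 = rnormal()
generate F2 = rnormal()
* six indicators, each loading mainly on one factor, plus noise
forvalues j = 1/3 {
    generate x`j' = 0.8*F1 + 0.2*F2 + rnormal(0, 0.6)
}
forvalues j = 4/6 {
    generate x`j' = 0.2*F1 + 0.8*F2 + rnormal(0, 0.6)
}
factor x1-x6, factors(2)
rotate, varimax
* estimated factor scores from the rotated solution
predict s1 s2
* compare the true factors with the estimated scores
corr F1 F2 s1 s2
** End Example

Changing the noise standard deviation in rnormal(0, 0.6) is one way to see
how the amount of error in the indicators affects the correlation between
s1 and s2.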

It seems, then, that you were basically expecting a factor rotation
algorithm and a factor scores prediction algorithm to transform your
data into factor scores that can be measured without error and fall
exactly on the lines (eigenvectors) -- which would be like spinning
gold out of straw.

If you want to minimize the correlations between estimated factor
scores, varimax is as good a choice for rotation as any, but you should
not expect perfect orthogonality in the estimated factor scores.  If you
are lucky, the correlations between factor scores will not be
statistically significant, and you can treat the factor scores as
possibly uncorrelated.
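
As a hedged sketch of how one might check this in practice, the commands
below use the sp2 example dataset from elsewhere in this thread and compare
the two scoring methods that -predict- offers after -factor- (regression,
the default, and bartlett). The new variable names r1-r3 and b1-b3 are
hypothetical, and nothing here implies that either method will give smaller
correlations in a particular dataset; that is something to inspect
empirically.

** Begin Example
use http://www.stata-press.com/data/r13/sp2, clear
factor ghp31-ghp05, fac(3)
rotate, varimax
* regression-method scores (the default for -predict- after -factor-)
predict r1-r3, regression
* Bartlett-method scores
predict b1-b3, bartlett
* correlations among each set of scores, with significance levels
pwcorr r1-r3 if e(sample), sig
pwcorr b1-b3 if e(sample), sig
** End Example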

Hopefully, one of the statisticians on this list will "fix" my attempted
heuristic explanation if I have missed the mark with my regression analogy.

Red Owl
[email protected]

Red and William,

Thanks for the replies. I initially also expected it was an estimation
sample issue, but I tried adjusting for that, and as Red's example
shows, it doesn't fix the issue. Thanks for the insight on varimax--I
was indeed under the impression that varimax would always produce
perfectly orthogonal factors. Interesting to know this is not the case.
Is there another method I should consider that produces less correlated
factor scores?


Thanks again,
Trevor

On 12/12/2013 4:26 AM, Red Owl wrote:

I doubt Trevor's concern is due exclusively to a failure to
maintain the e(sample) in estimating the factor score correlations.  I
believe the problem is that he was expecting that varimax rotation would
always produce perfectly uncorrelated factor scores and that their
correlation matrix should match the identity matrix presented after
-estat common-.

See the following example, which demonstrates that (a) -estat common-
simply produces an identity matrix after varimax rotation, as the mv.pdf
documentation indicates, (b) the estimated factor scores in this case
are not perfectly orthogonal even after varimax rotation, and (c) the
correlation matrix of factor scores calculated with -if e(sample)- does
not reproduce the identity matrix with either pairwise or
listwise/casewise deletion of cases with missing values.

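** Begin Example
use http://www.stata-press.com/data/r13/sp2, clear
factor ghp31-ghp05, fac(3)
rotate, varimax
* (a) correlation matrix of the common factors
estat common
* (b) estimated factor scores from the rotated solution
predict f1-f3
* (c) score correlations: pairwise deletion, then listwise deletion
pwcorr f1-f3 if e(sample), sig
corr f1-f3 if e(sample)
** End Example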

Red Owl
[email protected]

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


--
Trevor Zink, MBA, MA
Ph.D. Candidate
UC Regents Special Fellow
Bren School of Environmental Science and Management
University of California, Santa Barbara
[email protected]
