# st: RE: Factor analysis(?) question - missing data

From: "Nick Cox" | Subject: st: RE: Factor analysis(?) question - missing data | Date: Tue, 22 Apr 2008 19:42:39 +0100

```
To state the obvious, missing data are always problematic, and your case
seems worse than most, in that the optimal way to impute depends on the
structure of relationships that a factor analysis is (presumably)
intended to discover -- or to test. (Apologies; factor analysis isn't my
kind of religion and I may not get the wording right.) The risk of
circular argument seems very great.

Others will no doubt suggest currently standard solutions, but in this
case perhaps there is scope for a tailored iterative approach.

Factor analysis on complete observations may suggest weights for
imputing the variable with the fewest missing values.

Factor analysis on the (ideally now larger) set of complete observations
may then suggest weights for imputing the next least problematic variable.

And so on. In general, keeping track of the weights as they change from
step to step will highlight which characteristics are stable and which
are not.

That doesn't rule out just averaging what you have, as a stark comparison.

More generally, looking for an optimal solution to this kind of problem
seems less appropriate than trying two or three different solutions and
seeing what agreement you get.

Nick
n.j.cox@durham.ac.uk

Glenn Hoetker

This is perhaps more of a statistical question than a Stata
question. My situation is this. I have a large dataset in which
there are 5-6 indicators each for a bunch of latent variables. Let me
take as an example having 5 measures for innovative output, x1-x5.
The problem is that very few observations have all 5 measures; some
are missing x1, some x2, etc. Almost every observation has at least 3
measures and most 4.

Is there any way to optimally combine these indicators to measure the
underlying construct of innovative output that would use all available
measures for a given observation, i.e., x1-x4 for one observation,
[x1-x3, x5] for another, etc.? If I thought these were equally weighted, I
could just average over the available variables in each, setting aside
issues of measurement error. However, I'm not convinced they are
equally weighted and would like to do this in a more rigorous fashion.

```
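Below is a minimal sketch of the iterative scheme Nick describes. It is written in Python/NumPy rather than Stata so that it is self-contained. It rests on assumptions that neither message states. It assumes a single common factor, principal-axis factoring on standardized variables, and imputation of each missing value by the one-factor model's linear prediction. That prediction is the variable's loading times a regression-method factor score computed from whatever indicators the row has. The helper names (`one_factor_loadings`, `factor_score`, `standardize`, `iterative_impute`) and the simulated data are hypothetical. Imputed values are fed back in as if they were observed, which is one concrete form of the circularity Nick warns about. Treat the output as a comparison tool rather than an answer.

```python
import numpy as np


def one_factor_loadings(Z, n_iter=100):
    """One-factor principal-axis factoring on complete rows of Z (hypothetical helper).

    Returns loadings and uniquenesses for the standardized variables.
    """
    R = np.corrcoef(Z, rowvar=False)
    h2 = 1.0 - 1.0 / np.diag(np.linalg.inv(R))  # start from squared multiple correlations
    for _ in range(n_iter):
        Rr = R.copy()
        np.fill_diagonal(Rr, h2)                  # reduced correlation matrix
        vals, vecs = np.linalg.eigh(Rr)           # eigenvalues in ascending order
        lam = vecs[:, -1] * np.sqrt(max(vals[-1], 0.0))
        h2 = np.clip(lam ** 2, 1e-6, 0.999)       # keep communalities below 1 (Heywood guard)
    if lam.sum() < 0:                             # eigenvector sign is arbitrary
        lam = -lam
    return lam, 1.0 - h2


def factor_score(z_row, lam, psi):
    """Regression-method factor score using only the non-missing entries of z_row."""
    s = ~np.isnan(z_row)
    sigma = np.outer(lam[s], lam[s]) + np.diag(psi[s])  # model-implied correlations
    return lam[s] @ np.linalg.solve(sigma, z_row[s])


def standardize(X):
    return (X - np.nanmean(X, axis=0)) / np.nanstd(X, axis=0)


def iterative_impute(X):
    """Impute one variable at a time, least-missing first, refitting the factor each step."""
    Z = standardize(X)
    history = []                                  # (variable index, loadings used)
    for j in np.argsort(np.isnan(Z).sum(axis=0), kind="stable"):
        miss = np.isnan(Z[:, j])
        if not miss.any():
            continue
        complete = ~np.isnan(Z).any(axis=1)       # observed or already imputed
        lam, psi = one_factor_loadings(Z[complete])
        history.append((j, lam))
        for i in np.flatnonzero(miss):
            # model-implied prediction: E[z_j | available z] = lam_j * factor score
            Z[i, j] = lam[j] * factor_score(Z[i], lam, psi)
    return Z, history


# Simulated data (hypothetical): 5 indicators of one latent trait, 0-2 missing per row
rng = np.random.default_rng(0)
n, true_lam = 500, np.array([0.9, 0.8, 0.7, 0.6, 0.5])
f = rng.standard_normal(n)
X = np.outer(f, true_lam) + rng.standard_normal((n, 5)) * np.sqrt(1 - true_lam ** 2)
for i, k in enumerate(rng.integers(0, 3, size=n)):
    X[i, rng.choice(5, size=k, replace=False)] = np.nan

Z_imp, history = iterative_impute(X)
for j, lam in history:                            # watch how the weights move
    print(f"imputing x{j + 1}: loadings {np.round(lam, 2)}")

# Final scores use each row's observed indicators only, weighted by the model
lam, psi = one_factor_loadings(Z_imp)
weighted = np.array([factor_score(z, lam, psi) for z in standardize(X)])
plain = np.nanmean(standardize(X), axis=1)        # the plain available-item average
print("agreement (r):", round(np.corrcoef(weighted, plain)[0, 1], 3))
```

The printed loadings show how the weights move as each variable is filled in. The final correlation is a small version of the "try two or three solutions" check. It compares a factor-weighted score built from each row's observed indicators (x1-x4 for one row, x1-x3 and x5 for another) against the plain available-item average. A high correlation would suggest that, for these data, the choice of weights matters little.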