
st: RE: Factor analysis(?) question - missing data

From   "Nick Cox" <>
To   <>
Subject   st: RE: Factor analysis(?) question - missing data
Date   Tue, 22 Apr 2008 19:42:39 +0100

To state the obvious, missing data are always problematic and your case
seems worse than most in that the optimal way to impute depends on the
structure of relationships that a factor analysis is (presumably)
intended to discover -- or to test. (Apologies; factor analysis isn't my
kind of religion and I may not get the wording right.) The risk of
circular argument seems very great.

Others will no doubt suggest currently standard solutions, but in this
case perhaps there is scope for a tailored iterative approach.

Factor analysis on complete observations may suggest weights for
imputing the variable with the fewest missing values.

Factor analysis on the (ideally then larger) set of observations may
then suggest weights for imputing the next least problematic variable.

And so on. In general, keeping track of the weights as they change will
highlight stable and unstable characteristics.
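
As a rough illustration of that loop, here is a minimal Python sketch. It
is not a factor analysis: a one-predictor least-squares regression of the
target variable on the mean of the other available variables stands in
for the factor weights, purely to show the order of imputation and the
bookkeeping. The function names and the data are hypothetical, and the
example values are exactly proportional only so that the result is easy
to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has no variation")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def mean_present(values):
    """Mean of the non-missing values (None = missing), or None if all are missing."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None


def iterative_impute(data):
    """Impute one variable per round, fewest missing values first.

    Returns a filled copy of the data and a history of
    (column, intercept, slope), one entry per round, so that changes in
    the estimated weights can be inspected afterwards.
    """
    rows = [list(r) for r in data]
    k = len(rows[0])
    remaining = {j for j in range(k) if any(r[j] is None for r in rows)}
    history = []
    while remaining:
        # Pick the remaining variable with the fewest missing values.
        target = min(sorted(remaining),
                     key=lambda j: sum(r[j] is None for r in rows))
        remaining.discard(target)
        # Estimate weights from currently complete observations only.
        complete = [r for r in rows if None not in r]
        if len(complete) < 2:
            break  # too few complete observations to estimate anything
        xs = [mean_present(r[:target] + r[target + 1:]) for r in complete]
        ys = [r[target] for r in complete]
        a, b = fit_line(xs, ys)
        history.append((target, a, b))
        # Fill in the target where it is missing and a predictor exists.
        for r in rows:
            if r[target] is None:
                x = mean_present(r[:target] + r[target + 1:])
                if x is not None:
                    r[target] = a + b * x
    return rows, history


# Hypothetical data: three measures per observation, None = missing.
data = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [3.0, 6.0, 9.0],
    [4.0, 8.0, 12.0],
    [None, 10.0, 15.0],
    [5.0, 10.0, None],
    [6.0, 12.0, None],
]
filled, history = iterative_impute(data)
for column, intercept, slope in history:
    print(column, round(intercept, 3), round(slope, 3))
```

The history list is the part that matters for the argument above: if the
estimated weights move a lot from one round to the next, that is a sign
the imputations are driving the structure rather than revealing it.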

That doesn't rule out just averaging what you have as a stark comparison.
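
For concreteness, the equal-weights baseline amounts to a row mean over
whatever measures are present. In Stata, egen with rowmean() gives this
directly; the Python sketch below only spells out the arithmetic. The
function name and the data are hypothetical, and it assumes the measures
are already on comparable scales.

```python
from statistics import mean


def available_mean(row):
    """Average the non-missing measures in one observation (None = missing)."""
    present = [v for v in row if v is not None]
    return mean(present) if present else None


# Hypothetical data: five measures x1-x5 per observation, None = missing.
data = [
    [2.0, 4.0, 6.0, 8.0, None],
    [1.0, None, 3.0, None, 5.0],
    [None, None, None, None, None],
]
scores = [available_mean(row) for row in data]
print(scores)
```

An observation with no measures at all gets a missing score rather than
a zero.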

More generally, looking for an optimal solution to this kind of problem
seems less appropriate than trying two or three different solutions and
seeing what agreement you get. 


Glenn Hoetker

This is perhaps more of a statistical question than a Stata  
question.  My situation is this.  I have a large dataset in which  
there are 5-6 indicators each for a bunch of latent variables. Let me  
take as an example having 5 measures for innovative output, x1-x5.   
The problem is that very few observations have all 5 measures; some  
are missing x1, some x2, etc. Almost every observation has at least 3  
measures and most 4.

Is there any way to optimally combine these indicators to measure the  
underlying construct of innovative output that would use all available  
measures for a given observation, i.e., x1-x4 for one observation,  
[x1-x3, x5] for another, etc.?  If I thought these were equally weighted, I  
could just average over the available variables in each, setting aside  
issues of measurement error.  However, I'm not convinced they are  
equally weighted and would like to do this in a more rigorous fashion.
