Statalist



Re: st: Factor analysis(?) question - missing data


From   Phil Schumm <pschumm@uchicago.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Factor analysis(?) question - missing data
Date   Tue, 22 Apr 2008 13:36:00 -0500

On Apr 22, 2008, at 1:06 PM, Glenn Hoetker wrote:
> This is perhaps more of a statistical question than a Stata question. My situation is this: I have a large dataset in which there are 5-6 indicators for each of a bunch of latent variables. Let me take as an example 5 measures of innovative output, x1-x5. The problem is that very few observations have all 5 measures; some are missing x1, some x2, etc. Almost every observation has at least 3 measures, and most have 4.
>
> Is there any way to optimally combine these indicators to measure the underlying construct of innovative output, using all available measures for a given observation, i.e., x1-x4 for one observation, [x1-x3,x5] for another, etc.? If I thought these were equally weighted, I could just average over the available variables for each observation, setting aside issues of measurement error. However, I'm not convinced they are equally weighted and would like to do this in a more rigorous fashion.
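For reference, the equal-weight approach mentioned in the question amounts to a row mean that skips missing values. A minimal Python/NumPy sketch follows; the function name is hypothetical, and standardizing each indicator first is an assumption added here so that measures on different scales contribute comparably.

```python
import numpy as np


def available_mean(X):
    """Row-wise mean over the non-missing indicators (NaN marks a missing measure).

    Hypothetical helper: each column is standardized using only its observed
    values, then every observed indicator in a row gets the same weight.
    """
    X = np.asarray(X, dtype=float)
    # Standardize columns with nan-aware mean/sd so missing entries are ignored.
    Z = (X - np.nanmean(X, axis=0)) / np.nanstd(X, axis=0)
    # np.nanmean skips NaNs, so each row averages only the measures it has.
    return np.nanmean(Z, axis=1)


X = np.array([[0.0, 0.0],
              [2.0, np.nan],
              [np.nan, 2.0]])
print(available_mean(X))
```

Every available indicator gets the same weight here, which is precisely the assumption the question wants to relax.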

How you approach this will depend critically on whether the missing data are missing at random (MAR), or, more precisely, on whether you are willing to assume that this is so. It is often difficult, if not impossible, to investigate this rigorously.

If you are willing to assume MAR, then you have at least 3 options. First, you can fit a factor analytic (or other similar) model directly using an algorithm that can accommodate missing data (e.g., the EM algorithm or, better yet, the ECME algorithm; see, for example, Liu and Rubin, Statistica Sinica 8 (1998), 729-747). I once programmed this (EM) in Stata to handle multiple regression with missing data; perhaps others have done more. Second, you can fit the model using -gllamm-, which will accommodate missing data under the MAR assumption. Finally, you could use multiple imputation, as implemented, for example, in Royston's excellent -ice- package (try -ssc describe ice-). In all cases, you could then use empirical Bayes estimates of the latent factors in subsequent analyses, or go on to fit a full structural model.
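To make the first option concrete, here is a minimal sketch of plain EM (not ECME, and not the Stata code mentioned above) in Python/NumPy. It assumes a single normal factor, independent normal errors with indicator-specific variances (diagonal uniquenesses), and MAR missingness. Because the indicators are conditionally independent given the factor, a missing x_ij simply drops out of row i's likelihood, so both steps use only the observed entries. The posterior means returned at the end play the role of the empirical Bayes factor scores: each row uses every indicator it has, weighted by lam_j/psi_j rather than equally. The function names and the simulated data are hypothetical.

```python
import numpy as np


def _posterior(Xf, obs, mu, lam, psi):
    """Posterior mean m_i and variance v_i of f_i given row i's observed x's."""
    w = obs * (lam / psi)                  # lam_j / psi_j, zeroed where x_ij is missing
    v = 1.0 / (1.0 + (w * lam).sum(axis=1))
    m = v * (w * (Xf - mu)).sum(axis=1)
    return m, v


def one_factor_em(X, n_iter=200):
    """Plain EM for x_ij = mu_j + lam_j*f_i + e_ij with f_i ~ N(0, 1) and
    e_ij ~ N(0, psi_j); NaN marks a missing entry (assumed MAR)."""
    X = np.asarray(X, dtype=float)
    obs = ~np.isnan(X)
    Xf = np.where(obs, X, 0.0)             # placeholder zeros, always masked out
    mu = np.nanmean(X, axis=0)
    sd = np.nanstd(X, axis=0)
    lam, psi = 0.5 * sd, 0.5 * sd**2       # crude starting values
    for _ in range(n_iter):
        m, v = _posterior(Xf, obs, mu, lam, psi)   # E-step
        Ef2 = m**2 + v                             # E[f_i^2 | observed x's]
        for j in range(X.shape[1]):                # M-step, one indicator at a time
            o = obs[:, j]
            x, mo = X[o, j], m[o]
            # Normal equations for regressing x_ij on (1, f_i), with f_i and
            # f_i^2 replaced by their posterior expectations.
            A = np.array([[o.sum(), mo.sum()], [mo.sum(), Ef2[o].sum()]])
            b = np.array([x.sum(), (mo * x).sum()])
            mu[j], lam[j] = np.linalg.solve(A, b)
            resid = x - mu[j] - lam[j] * mo
            psi[j] = max(np.mean(resid**2 + lam[j]**2 * v[o]), 1e-8)
    m, _ = _posterior(Xf, obs, mu, lam, psi)       # final E-step: EB factor scores
    return mu, lam, psi, m


def observed_loglik(X, mu, lam, psi):
    """Observed-data log-likelihood: x_io ~ N(mu_o, lam_o lam_o' + diag(psi_o))."""
    ll = 0.0
    for row in np.asarray(X, dtype=float):
        o = ~np.isnan(row)
        d = row[o] - mu[o]
        S = np.outer(lam[o], lam[o]) + np.diag(psi[o])
        _, logdet = np.linalg.slogdet(S)
        ll -= 0.5 * (o.sum() * np.log(2 * np.pi) + logdet + d @ np.linalg.solve(S, d))
    return ll


# Hypothetical usage: 5 indicators, each row missing 1 or 2 of them completely
# at random (a special case of MAR), so rows keep 3 or 4 measures.
rng = np.random.default_rng(0)
n, lam_true = 2000, np.array([0.9, 0.8, 0.7, 0.8, 0.6])
f = rng.standard_normal(n)
X = f[:, None] * lam_true + rng.standard_normal((n, 5)) * np.sqrt(1 - lam_true**2)
for i in range(n):
    X[i, rng.choice(5, size=rng.integers(1, 3), replace=False)] = np.nan
mu_hat, lam_hat, psi_hat, scores = one_factor_em(X)
print(np.round(lam_hat, 2), observed_loglik(X, mu_hat, lam_hat, psi_hat))
```

Since EM never decreases the observed-data log-likelihood, `observed_loglik` is a convenient convergence check. As with any factor model, the sign of the loadings and scores is arbitrary.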

I'm sure others will have more to say...


-- Phil
