
From: "Stanislav Kolenikov" <skolenik@yahoo.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: missingness in a large, complex sampling deign
Date: Mon, 16 Aug 2004 17:32:38 -0000

--- In statalist@yahoogroups.com, "Colleen Daly Martinez" <colleendalymartinez@c...> wrote:
> In my analysis of data from a large (over 5,000) nationally representative
> sample study, which used complex sampling, I'm finding that a number of
> variables I'm examining have many missings, as many as 1,500 or more (I
> believe that they are non-respondents).
>
> I'm wondering if anyone has suggestions, or can point me to references which
> address the issue of managing this issue and the implications for my
> analysis.

The answer depends directly on the analysis you want to perform on these data. Are you planning means and tabulations? Factor analysis? Regression? Depending on the type of analysis, you may or may not need to involve some of the heavy machinery described in Little & Rubin's or Schafer's books. One thing that you certainly need to know about your data, though, is the distinction between MCAR, MAR and NMAR (missing completely at random, missing at random, and not missing at random); pick those up as quickly as you can if you have not seen them before!

There are two main approaches to missing data currently on the market: imputation of some kind, and integrating the missing data out.

In the first approach, you try to come up with some reasonable number for the missing cell. In regression imputation, that is a linear prediction given other variables. In hot-deck imputation, it is a random pick from the observed values in the same stratum. In multiple imputation, it is a random draw from a joint distribution of the variables of interest, repeated several times, with the results from the completed data sets combined appropriately.

In the second approach, you write down the likelihood for the complete data, see what it looks like for the available data (you need to take the expectation conditional on the observed data, which is where the integration comes in), and see if this can be maximized reasonably easily.

It is not clear which of the two approaches makes the stronger assumptions (and is thus less robust).
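To make the three imputation flavours above a bit more concrete, here is a minimal sketch in Python (not Stata), using only the standard library. Everything in it is an assumption made for illustration: the function names are hypothetical, the eight-row data set is made up, and the sampling weights and design are ignored entirely, so this is not something to run on a real complex survey. The multiple imputation part is also simplified: it adds random residuals to the regression prediction but does not redraw the regression coefficients, which a proper procedure would do. The combining step follows Rubin's rules: the point estimate is the average over the completed data sets, and the total variance is the average within-imputation variance plus (1 + 1/m) times the between-imputation variance.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = (sum(r * r for r in resid) / (len(xs) - 2)) ** 0.5
    return a, b, sd


def regression_impute(x, y, rng=None):
    """Fill missing y (None) with the linear prediction from x.

    If rng is given, a random normal residual is added to each prediction
    (stochastic regression imputation); otherwise the fill is deterministic.
    """
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    a, b, sd = fit_line([p[0] for p in observed], [p[1] for p in observed])
    filled = []
    for xi, yi in zip(x, y):
        if yi is None:
            noise = rng.gauss(0.0, sd) if rng is not None else 0.0
            yi = a + b * xi + noise
        filled.append(yi)
    return filled


def hot_deck_impute(strata, y, rng):
    """Fill missing y with a randomly chosen observed value (donor)
    from the same stratum."""
    donors = {}
    for s, yi in zip(strata, y):
        if yi is not None:
            donors.setdefault(s, []).append(yi)
    return [yi if yi is not None else rng.choice(donors[s])
            for s, yi in zip(strata, y)]


def rubin_combine(estimates, variances):
    """Combine m completed-data estimates and their variances by Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)  # sample variance, divisor m - 1
    return q_bar, within + (1 + 1 / m) * between


def multiply_impute_mean(x, y, m, rng):
    """Estimate the mean of y from m stochastically imputed data sets."""
    estimates, variances = [], []
    for _ in range(m):
        filled = regression_impute(x, y, rng)
        estimates.append(statistics.fmean(filled))
        variances.append(statistics.variance(filled) / len(filled))
    return rubin_combine(estimates, variances)


# Made-up toy data: y is missing (None) for two of eight units.
rng = random.Random(2004)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
strata = ["a", "a", "a", "a", "b", "b", "b", "b"]
y = [2.1, None, 6.2, 7.9, 10.1, None, 13.8, 16.2]

print("regression imputation:", regression_impute(x, y))
print("hot-deck imputation:  ", hot_deck_impute(strata, y, rng))
print("MI mean and variance: ", multiply_impute_mean(x, y, 5, rng))
```

Note that the single-imputation variants return one filled-in data set and so understate the uncertainty if you analyze it as though it were fully observed; the between-imputation term in `rubin_combine` is what puts that uncertainty back.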
Note that Stata does not have a multiple imputation procedure, and lots of users have complained about this. The omission is not without a reason, though, as is often the case with Stata: if something obvious is not implemented, maybe it is not so obvious to begin with? Just as there is no single bootstrap procedure that will work in 100% of cases, and you need to think about things like the smoothness of your distribution functional, dependencies in your data and pivoting your statistic, you have to make a lot of substantive choices in multiple imputation before it starts giving sensible results.

Theoretically, you can incorporate all of the missing data in a single maximum likelihood procedure and feed that into Stata's great -ml- maximizer. I found this quite impractical in my simulation studies four years ago, when PCs were under 1 GHz: my logistic regression with missing covariates, 50 observations and 3 explanatory variables took about half an hour to converge. I would expect some gain in speed from the overall increase in computational power, plus a marginal gain from the continuous improvement of Stata's code, but together those will hardly add up to a factor of ten. (I think these days I might really benefit from Stata's plug-ins, but I have not tried them so far.)

Stas

