
From: "Renzo Comolli" <renzo.comolli@yale.edu>
To: <statalist@hsphsun2.harvard.edu>
Subject: Re: My apology--Re: st: Removing the limit to 31 variables from stata -impute- ado
Date: Fri, 14 Nov 2003 11:29:01 -0500

Blessed be the mistake: it alerted me to a few important things.

It turns out that Stata now has a full package for analyzing datasets after
valid imputation methods:

. net describe st0042, from(http://www.stata-journal.com/software/sj3-3)

I would never have thought of it without your post.

Unfortunately, I could not find any package that actually performs such valid
imputation. I resorted to the library to find the book cited by the authors of
the package above, to learn more:

J. L. Schafer (1997) Analysis of Incomplete Multivariate Data

Renzo

--------------------------------------------------------------------------------
/* From Weihua Guan <whgyu1@YAHOO.COM> */
To       statalist@hsphsun2.harvard.edu
Subject  My apology--Re: st: Removing the limit to 31 variables from stata -impute- ado
Date     Fri, 14 Nov 2003 04:16:37 -0800 (PST)
--------------------------------------------------------------------------------

Sorry listers,

This was supposed to be a private email, and was sent to the list by mistake.
Sorry!

Weihua Guan

--- Weihua Guan <whgyu1@yahoo.com> wrote:
> Hi Jeff,
>
> How's it going? I happened to read this post. It seems the
> method -impute- uses is out of date, and may give
> invalid inferences from the imputed data. Does Stata
> have a plan to implement multiple imputation?
>
> Weihua
>
> --- "Jeff Pitblado, StataCorp LP" <jpitblado@stata.com> wrote:
> >
> > Renzo Comolli <renzo.comolli@yale.edu> asks about the limit on the number of
> > variables allowed by -impute-:
> >
> > > I know this behavior is strictly "at my own risk". Anyway I (copied with a
> > > different name and) removed the limitation to 31 variables in the impute.ado
> > > It works with no waiting time at all even with 52 variables.
> > > I wonder whether StataCorp has been too risk averse when they now updated it
> > > from version 3.1 to version 8 of the ado.
> >
> > > Anybody had similar experiences of removing the limitation?
> > > From the explanation in the manual of what -impute- does, it is possible
> > > that I could get away with so many variables because almost all of them
> > > were dummies and therefore easy to order. (counting the categorical
> > > variables before the dummy expansion I am way below 15)
> >
> > The -impute- command runs regressions by best-subset regression, looking at
> > the pattern of missing values in the predictors. It is conceivable that
> > -impute- must run a regression for each combination of the predictor
> > variables, depending upon the pattern of missingness.
> >
> > In order to enumerate all best-subset combinations, -impute- looks at the 0's
> > and 1's in the binary representation of a long integer. In Stata, a long
> > integer contains 32 bits--one of which is used for the sign. Thus each of the
> > remaining bits is used to identify whether to include a predictor variable in
> > a given regression, and increasing this limit beyond 31 will not have a
> > desirable result (even though the modified -impute- will not exit with an
> > error).
> >
> > To illustrate how -impute- determines which variables to include in a
> > regression, suppose there are 3 predictors and that the pattern of missing
> > values among them requires a regression for each combination. In this--albeit
> > worst-case--scenario, there are 2^3 = 8 regressions to run. We can determine
> > which predictor to include in a regression by looking at the binary
> > representation of the regression index (starting from 0):
> >
> >     integer (base 10)    integer (binary)
> >             0                  000
> >             1                  001
> >             2                  010
> >             3                  011
> >             4                  100
> >             5                  101
> >             6                  110
> >             7                  111
> >
> > If the names of the predictor variables are x1 x2 and x3, we can interpret the
> > binary number like this
> >
> >        x3         x2         x1
> >     -------------------------------------
> >      <digit>    <digit>    <digit>
> >
> > Thus 001 means include x1, 011 means include x1 and x2, ...
> >
> > Given this implementation, there has to be a limit on how many predictors are
> > allowed by the -impute- command before the generated -long integer- variable
> > becomes automatically -recast- to a -float- or -double-, thus breaking the
> > implementation.
> >
> > By increasing the limit, all variables beyond the first 31 (possibly fewer)
> > will not be used in any of the regressions.
> >
> > One way to get around this limit would be to add an option to -impute-, say
> > -nomissings()-, that will take a varlist. These variables will be assumed
> > missing-value-free so that they could be present in all regressions.
> >
> > We will look into adding this as a future update.
> >
> > --Jeff
> > jpitblado@stata.com

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
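
For anyone who wants to play with the bit-pattern scheme Jeff describes in the quoted message, here is a minimal Python sketch. It is not the actual -impute- code (that is a Stata ado-file); it only mimics the idea of reading the regression index in binary, with the lowest bit standing for x1. The function name predictors_for_index and the constant MAX_PREDICTORS are made up for this illustration, and the value 31 is taken from Jeff's explanation (32 bits minus the sign bit).

```python
from typing import List

# Assumption taken from the post: a 32-bit long with one bit reserved
# for the sign leaves 31 bits, one per predictor.
MAX_PREDICTORS = 31


def predictors_for_index(index: int, names: List[str]) -> List[str]:
    """Return the predictors whose bit is set in `index`.

    Bit 0 (the rightmost binary digit) corresponds to names[0],
    bit 1 to names[1], and so on. Hypothetical helper, for illustration only.
    """
    if len(names) > MAX_PREDICTORS:
        raise ValueError("more predictors than usable bits")
    if not 0 <= index < 2 ** len(names):
        raise ValueError("index out of range for this many predictors")
    return [name for bit, name in enumerate(names) if (index >> bit) & 1]


if __name__ == "__main__":
    names = ["x1", "x2", "x3"]
    # Worst case from the post: one regression per combination, 2^3 = 8.
    for index in range(2 ** len(names)):
        binary = f"{index:0{len(names)}b}"
        print(index, binary, predictors_for_index(index, names))
```

Index 1 (binary 001) selects x1 only and index 3 (binary 011) selects x1 and x2, as in Jeff's table. The length check is only there to make the 31-predictor ceiling visible; Python integers do not overflow, so the sketch does not reproduce the silent -recast- problem Jeff mentions.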

