
From: Weihua Guan <whgyu1@yahoo.com>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: Removing the limit to 31 variables from stata -impute- ado

Date: Thu, 13 Nov 2003 20:12:03 -0800 (PST)

Hi Jeff,

How's it going? I happened to read this post. It seems the method -impute- uses is out of date and may give invalid inferences from the imputed data. Does Stata have a plan to implement multiple imputation?

Weihua

--- "Jeff Pitblado, StataCorp LP" <jpitblado@stata.com> wrote:

> Renzo Comolli <renzo.comolli@yale.edu> asks about the limit on the number of
> variables allowed by -impute-:
>
> > I know this behavior is strictly "at my own risk". Anyway I (copied with a
> > different name and) removed the limitation to 31 variables in the impute.ado
> > It works with no waiting time at all even with 52 variables.
> > I wonder whether StataCorp has been too risk averse when they now updated it
> > from version 3.1 to version 8 of the ado.
> >
> > Anybody had similar experiences of removing the limitation?
> > From the explanation in the manual of what -impute- does, it is possible
> > that I could get away with so many variables because almost all of them
> > were dummies and therefore easy to order. (counting the categorical
> > variables before the dummy expansion I am way below 15)
>
> The -impute- command runs regressions by best-subset regression, looking at
> the pattern of missing values in the predictors. It is conceivable that
> -impute- must run a regression for each combination of the predictor
> variables, depending upon the pattern of missingness.
>
> In order to enumerate all best-subset combinations, -impute- looks at the 0's
> and 1's in the binary representation of a long integer. In Stata, a long
> integer contains 32 bits--one of which is used for the sign. Thus each of the
> remaining bits is used to identify whether to include a predictor variable in
> a given regression, and increasing this limit beyond 31 will not have a
> desirable result (even though the modified -impute- will not exit with an
> error).
>
> To illustrate how -impute- determines which variables to include in a
> regression, suppose there are 3 predictors and that the pattern of missing
> values among them requires a regression for each combination. In this--albeit
> worst case scenario--there are 2^3 = 8 regressions to run. We can determine
> which predictor to include in a regression by looking at the binary
> representation of the regression index (starting from 0):
>
>     integer (base 10)    integer (binary)
>             0                  000
>             1                  001
>             2                  010
>             3                  011
>             4                  100
>             5                  101
>             6                  110
>             7                  111
>
> If the names of the predictor variables are x1 x2 and x3, we can interpret the
> binary number like this
>
>       x3        x2        x1
>     -------------------------------------
>     <digit>   <digit>   <digit>
>
> Thus 001 means include x1, 011 means include x1 and x2, ...
>
> Given this implementation, there has to be a limit on how many predictors are
> allowed by the -impute- command before the generated -long integer- variable
> becomes automatically -recast- to a -float- or -double-, thus breaking the
> implementation.
>
> By increasing the limit, all variables beyond the first 31 (possibly fewer)
> will not be used in any of the regressions.
>
> One way to get around this limit would be to add an option to -impute-, say
> -nomissings()-, that will take a varlist. These variables will be assumed
> missing-value-free so that they could be present in all regressions.
>
> We will look into adding this as a future update.
>
> --Jeff
> jpitblado@stata.com
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
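
For readers who want to see the bit-pattern idea from Jeff's explanation in runnable form, here is a minimal sketch in Python. It is not the code in impute.ado; the function names (included_predictors, all_subsets, as_float32) are made up for illustration. It decodes a regression index into the predictors it selects, and then shows, under the assumption that a float behaves like IEEE 754 single precision, why an index that no longer fits a 31-bit integer stops identifying predictors reliably.

```python
import struct


def included_predictors(index, names):
    """Return the predictors whose bit is set in index (bit 0 maps to names[0])."""
    return [name for bit, name in enumerate(names) if (index >> bit) & 1]


def all_subsets(names):
    """Enumerate every subset of names by counting from 0 to 2**k - 1."""
    return [included_predictors(i, names) for i in range(2 ** len(names))]


def as_float32(value):
    """Round-trip a number through single precision (assumed IEEE 754)."""
    return struct.unpack("f", struct.pack("f", float(value)))[0]


# Hypothetical predictor names, matching the x1 x2 x3 example in the post.
names = ["x1", "x2", "x3"]
for i, subset in enumerate(all_subsets(names)):
    print(f"{i}  {i:03b}  {subset}")

# A signed 32-bit integer has 31 value bits, so its largest value is 2**31 - 1.
# A 32nd predictor would need bit 31, i.e. the value 2**31, which does not fit.
largest_signed_32 = 2 ** 31 - 1
print(2 ** 31 > largest_signed_32)

# Single precision carries a 24-bit significand, so an odd index just above
# 2**24 loses its lowest bit when stored as a float.
print(as_float32(2 ** 24 + 1) == 2 ** 24 + 1)
```

The loss of low-order bits in single precision may be one reading of the "possibly fewer" remark in the quoted message: once the index is no longer stored as an exact integer, some bits stop mapping to predictors at all.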

**Follow-Ups**: **My apology--Re: st: Removing the limit to 31 variables from stata -impute- ado** *From:* Weihua Guan <whgyu1@YAHOO.COM>

**References**: **Re: st: Removing the limit to 31 variables from stata -impute- ado** *From:* jpitblado@stata.com (Jeff Pitblado, StataCorp LP)


© Copyright 1996–2014 StataCorp LP