
From: jpitblado@stata.com (Jeff Pitblado, StataCorp LP)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Removing the limit to 31 variables from stata -impute- ado
Date: Thu, 13 Nov 2003 13:26:57 -0600

Renzo Comolli <renzo.comolli@yale.edu> asks about the limit on the number of
variables allowed by -impute-:

> I know this behavior is strictly "at my own risk". Anyway I (copied with a
> different name and) removed the limitation to 31 variables in the impute.ado
> It works with no waiting time at all even with 52 variables.
> I wonder whether StataCorp has been too risk averse when they now updated it
> from version 3.1 to version 8 of the ado.
> Anybody had similar experiences of removing the limitation?
> From the explanation in the manual of what -impute- does, it is possible
> that I could get away with so many variables because almost all of them
> where dummies and therefore easy to order. (counting the categorical
> variables before the dummy expansion I am way below 15)

The -impute- command fits its regressions by best-subset regression, based on
the pattern of missing values in the predictors. It is conceivable that
-impute- must run a regression for each combination of the predictor
variables, depending upon the pattern of missingness.

In order to enumerate all best-subset combinations, -impute- looks at the 0's
and 1's in the binary representation of a long integer. In Stata, a long
integer contains 32 bits, one of which is used for the sign. Thus each of the
remaining 31 bits is used to identify whether to include a predictor variable
in a given regression, and increasing this limit beyond 31 will not have a
desirable result (even though the modified -impute- will not exit with an
error).

To illustrate how -impute- determines which variables to include in a
regression, suppose there are 3 predictors and that the pattern of missing
values among them requires a regression for each combination. In this
(admittedly worst-case) scenario, there are 2^3 = 8 regressions to run.
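The bit-mask idea described above can be sketched in a few lines. This is only
an illustration in Python, not the code in impute.ado: the names
predictors_for and USABLE_BITS are hypothetical, and the 31-bit cap simply
mirrors the limit described in this post.

```python
# Illustrative sketch only; not the actual -impute- implementation.
USABLE_BITS = 31  # assumption taken from the post: 32-bit long minus the sign bit


def predictors_for(index, names):
    """Return the predictors whose bit is set in `index`.

    Bit 0 (the rightmost binary digit) corresponds to names[0],
    bit 1 to names[1], and so on.
    """
    if len(names) > USABLE_BITS:
        raise ValueError("more predictors than usable bits in the index")
    return [name for bit, name in enumerate(names) if (index >> bit) & 1]


names = ["x1", "x2", "x3"]

# Worst case: one regression per subset of predictors, 2^3 = 8 in total.
subsets = [predictors_for(i, names) for i in range(2 ** len(names))]

for i, subset in enumerate(subsets):
    # Print the regression index, its binary form, and the chosen predictors.
    print(f"{i}  {i:0{len(names)}b}  {' '.join(subset) or '(none)'}")
```

With a 32nd predictor there would be no bit left in the index to represent it,
which is why the sketch refuses such a list rather than silently ignoring the
extra variable.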
We can determine which predictors to include in a regression by looking at the
binary representation of the regression index (starting from 0):

    integer (base 10)    integer (binary)
            0                  000
            1                  001
            2                  010
            3                  011
            4                  100
            5                  101
            6                  110
            7                  111

If the names of the predictor variables are x1, x2, and x3, we can interpret
the binary number like this:

       x3        x2        x1
    -------------------------------------
     <digit>   <digit>   <digit>

Thus 001 means include x1, 011 means include x1 and x2, ...

Given this implementation, -impute- has to limit the number of predictors it
allows; otherwise the generated -long integer- variable would be automatically
-recast- to a -float- or -double-, breaking the implementation. If the limit
is increased, all variables beyond the first 31 (possibly fewer) will not be
used in any of the regressions.

One way to get around this limit would be to add an option to -impute-, say
-nomissings()-, that takes a varlist. These variables would be assumed to be
free of missing values, so that they could be present in all regressions. We
will look into adding this as a future update.

--Jeff
jpitblado@stata.com
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

Follow-Ups: Re: st: Removing the limit to 31 variables from stata -impute- ado (From: Weihua Guan <whgyu1@yahoo.com>)
