The Stata listserver

Re: st: Removing the limit to 31 variables from stata -impute- ado


From   Weihua Guan <whgyu1@yahoo.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Removing the limit to 31 variables from stata -impute- ado
Date   Thu, 13 Nov 2003 20:12:03 -0800 (PST)

Hi Jeff,

How's it going?  I happened to read this post. It seems the
method -impute- uses is out of date and may give
invalid inferences from the imputed data.  Does Stata
have a plan to implement multiple imputation?

Weihua 

--- "Jeff Pitblado, StataCorp LP"
<jpitblado@stata.com> wrote:
> 
> Renzo Comolli <renzo.comolli@yale.edu> asks about the limit on the number of
> variables allowed by -impute-:
> 
> > I know this behavior is strictly "at my own risk". Anyway I (copied with a
> > different name and) removed the limitation to 31 variables in the impute.ado
> > It works with no waiting time at all even with 52 variables.
> > I wonder whether StataCorp has been too risk averse when they now updated it
> > from version 3.1 to version 8 of the ado.
> 
> > Anybody had similar experiences of removing the limitation?
> > From the explanation in the manual of what -impute- does, it is possible
> > that I could get away with so many variables because almost all of them
> > were dummies and therefore easy to order. (counting the categorical
> > variables before the dummy expansion I am way below 15)
> 
> The -impute- command runs regressions by best-subset regression, looking at
> the pattern of missing values in the predictors.  It is conceivable that
> -impute- must run a regression for each combination of the predictor
> variables, depending upon the pattern of missingness.
> 
> In order to enumerate all best-subset combinations, -impute- looks at the 0's
> and 1's in the binary representation of a long integer.  In Stata, a long
> integer contains 32 bits, one of which is used for the sign.  Thus each of the
> remaining bits is used to identify whether to include a predictor variable in
> a given regression, and increasing this limit beyond 31 will not have a
> desirable result (even though the modified -impute- will not exit with an
> error).
> 
> To illustrate how -impute- determines which variables to include in a
> regression, suppose there are 3 predictors and that the pattern of missing
> values among them requires a regression for each combination.  In this
> (albeit worst-case) scenario, there are 2^3 = 8 regressions to run.  We can
> determine which predictor to include in a regression by looking at the binary
> representation of the regression index (starting from 0):
> 
>                 integer (base 10)       integer (binary)
>                     0                        000
>                     1                        001
>                     2                        010
>                     3                        011
>                     4                        100
>                     5                        101
>                     6                        110
>                     7                        111
> 
> If the names of the predictor variables are x1 x2 and x3, we can interpret the
> binary number like this
> 
>                x3             x2             x1    
>                -------------------------------------
>                <digit>        <digit>        <digit>
> 
> Thus 001 means include x1, 011 means include x1 and x2, ...
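
The bit-to-predictor mapping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code in impute.ado; the function name predictors_for and the list of names are made up for the example.

```python
def predictors_for(index, names):
    """Return the predictor names whose bit is set in `index`.

    Bit 0 (the rightmost binary digit) maps to names[0], bit 1 to
    names[1], and so on.  Hypothetical helper, for illustration only.
    """
    return [name for bit, name in enumerate(names) if (index >> bit) & 1]


names = ["x1", "x2", "x3"]

# Walk through all 2^3 regression indices, as in the table above.
for index in range(2 ** len(names)):
    bits = format(index, "0{}b".format(len(names)))
    print(index, bits, predictors_for(index, names))
```

Each extra predictor needs one more bit in the index, which is why the width of the integer type caps the number of predictors.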
> 
> Given this implementation, there has to be a limit on how many predictors are
> allowed by the -impute- command before the generated -long integer- variable
> becomes automatically -recast- to a -float- or -double-, thus breaking the
> implementation.
> 
> By increasing the limit, all variables beyond the first 31 (possibly fewer)
> will not be used in any of the regressions.
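
As a rough illustration of why a -float- cannot serve as a bit mask, the Python sketch below rounds an integer index to 4-byte IEEE 754 single precision and checks which bits survive. It assumes the index would be stored in a single-precision float; it does not trace what the modified -impute- actually does, and the helper name as_float32 is made up for the example.

```python
import struct


def as_float32(value):
    """Round `value` to the nearest IEEE 754 single-precision number."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# An index with bit 0 and bit 24 set, i.e. the 1st and 25th predictors.
index = 2 ** 24 + 1

# Single precision carries only 24 significant bits, so the low bit is lost.
stored = int(as_float32(index))

print(format(index, "b"))
print(format(stored, "b"))
print("bit 0 survived:", bool(stored & 1))
```

Under that assumption the stored index no longer says that the first predictor belongs in the regression, even though no error is raised.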
> 
> One way to get around this limit would be to add an option to -impute-, say
> -nomissings()-, that will take a varlist.  These variables will be assumed
> missing-value-free so that they could be present in all regressions.
> 
> We will look into adding this as a future update.
> 
> --Jeff
> jpitblado@stata.com
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/




