The Stata listserver

Re: My apology--Re: st: Removing the limit to 31 variables from stata -impute- ado


From   "Renzo Comolli" <[email protected]>
To   <[email protected]>
Subject   Re: My apology--Re: st: Removing the limit to 31 variables from stata -impute- ado
Date   Fri, 14 Nov 2003 11:29:01 -0500

Blessed be the mistake; it alerted me to a few important things.
It turns out that Stata now has a full package for analyzing datasets after
valid imputation methods have been applied:
. net describe st0042, from(http://www.stata-journal.com/software/sj3-3)
I would never have thought to look for it without your post.

Unfortunately, I could not find any package that actually performs such valid
imputation methods. I resorted to the library to find the book cited by the
authors of the package above, to learn more: J. L. Schafer (1997), Analysis of
Incomplete Multivariate Data.

Renzo



----------------------------------------------------------------------------
From   Weihua Guan <[email protected]>
To   [email protected]
Subject   My apology--Re: st: Removing the limit to 31 variables from stata -impute- ado
Date   Fri, 14 Nov 2003 04:16:37 -0800 (PST)

----------------------------------------------------------------------------

Sorry listers,

This was supposed to be a private email, and it was sent to
the list by mistake. Sorry!

Weihua Guan 
--- Weihua Guan <[email protected]> wrote:
> Hi Jeff,
> 
> How's it going?  I happened to read this post. It seems the
> method -impute- uses is out of date and may give
> invalid inferences from the imputed data.  Does Stata
> have a plan to implement multiple imputation?
> 
> Weihua 
> 
> --- "Jeff Pitblado, StataCorp LP"
> <[email protected]> wrote:
> > 
> > Renzo Comolli <[email protected]> asks about
> > the limit on the number of
> > variables allowed by -impute-:
> > 
> > > I know this behavior is strictly "at my own risk".
> > > Anyway, I (copied with a different name and) removed the limitation
> > > to 31 variables in the impute.ado.
> > > It works with no waiting time at all, even with 52 variables.
> > > I wonder whether StataCorp was too risk averse when they updated it
> > > from version 3.1 to version 8 of the ado.
> > 
> > > Has anybody had similar experiences of removing the limitation?
> > > From the explanation in the manual of what -impute- does, it is possible
> > > that I could get away with so many variables because almost all of them
> > > were dummies and therefore easy to order (counting the categorical
> > > variables before the dummy expansion, I am way below 15).
> > 
> > The -impute- command runs regressions by best-subset regression, looking at
> > the pattern of missing values in the predictors.  It is conceivable that
> > -impute- must run a regression for each combination of the predictor
> > variables, depending upon the pattern of missingness.
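
A minimal Python sketch of what "pattern of missingness" could mean in practice. It is not taken from impute.ado. It assumes that one regression is needed per distinct set of observed predictors, which the post implies but does not spell out. The names ROWS, PREDICTORS, observed_mask and distinct_patterns are hypothetical.

```python
# Hypothetical toy data; None stands in for a Stata missing value.
PREDICTORS = ["x1", "x2", "x3"]
ROWS = [
    {"x1": 1.0, "x2": 2.0, "x3": None},
    {"x1": None, "x2": 0.5, "x3": 4.0},
    {"x1": 3.0, "x2": 1.5, "x3": None},
    {"x1": 2.0, "x2": 2.5, "x3": 1.0},
]


def observed_mask(row, predictors):
    """Return an int whose bit i is set when predictors[i] is observed."""
    mask = 0
    for i, name in enumerate(predictors):
        if row[name] is not None:
            mask |= 1 << i
    return mask


def distinct_patterns(rows, predictors):
    """Sorted list of the distinct observed-predictor masks in the data."""
    return sorted({observed_mask(row, predictors) for row in rows})


if __name__ == "__main__":
    # One line per distinct pattern; each would call for its own regression
    # under the assumption stated above.
    for mask in distinct_patterns(ROWS, PREDICTORS):
        print(format(mask, "03b"))
```

With these four toy rows only three distinct patterns occur, so far fewer than the worst case of 2^3 regressions would be needed.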
> > 
> > In order to enumerate all best-subset combinations, -impute- looks at the 0's
> > and 1's in the binary representation of a long integer.  In Stata, a long
> > integer contains 32 bits--one of which is used for the sign.  Thus each of the
> > remaining bits is used to identify whether to include a predictor variable in
> > a given regression, and increasing this limit beyond 31 will not have a
> > desirable result (even though the modified -impute- will not exit with an
> > error).
> > 
> > To illustrate how -impute- determines which variables to include in a
> > regression, suppose there are 3 predictors and that the pattern of missing
> > values among them requires a regression for each combination.  In this
> > (albeit worst-case) scenario, there are 2^3 = 8 regressions to run.  We can
> > determine which predictor to include in a regression by looking at the
> > binary representation of the regression index (starting from 0):
> > 
> >                 integer (base 10)       integer (binary)
> >                     0                        000
> >                     1                        001
> >                     2                        010
> >                     3                        011
> >                     4                        100
> >                     5                        101
> >                     6                        110
> >                     7                        111
> > 
> > If the names of the predictor variables are x1 x2 and x3, we can interpret the
> > binary number like this
> > 
> >                x3             x2             x1
> >                -------------------------------------
> >                <digit>        <digit>        <digit>
> > 
> > Thus 001 means include x1, 011 means include x1 and x2, ...
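
The decoding rule in the table above can be sketched in a few lines of Python. This only illustrates the bit-to-variable mapping as described in the post; it is not the code in impute.ado, and the function name included_predictors is made up for the example.

```python
def included_predictors(index, predictors):
    """List the predictors whose bit is set in the regression index.

    Bit 0 (the rightmost binary digit) corresponds to predictors[0],
    bit 1 to predictors[1], and so on, matching the x3 x2 x1 layout.
    """
    return [name for bit, name in enumerate(predictors) if (index >> bit) & 1]


if __name__ == "__main__":
    predictors = ["x1", "x2", "x3"]
    # Walk through all 2^3 regression indices, as in the table.
    for index in range(2 ** len(predictors)):
        print(index, format(index, "03b"), included_predictors(index, predictors))
```

Index 0 selects no predictors and index 7 selects all three. Each additional predictor needs one more bit, which is why the width of the integer caps the number of predictors.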
> > 
> > Given this implementation, there has to be a limit on how many predictors are
> > allowed by the -impute- command before the generated -long integer- variable
> > becomes automatically -recast- to a -float- or -double-, thus breaking the
> > implementation.
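
To see why a recast to a floating-point type can damage a bitmask, here is a small Python sketch covering the single-precision case only. It assumes Stata's -float- behaves like an IEEE 754 single-precision value with a 24-bit significand. When Stata actually promotes the variable is not shown in the post, so this demonstrates only the rounding effect. The helper names are hypothetical.

```python
import struct


def to_float32(value):
    """Round-trip a number through IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def survives_float32(n):
    """True when the integer n is stored exactly as a single-precision float."""
    return to_float32(float(n)) == n


if __name__ == "__main__":
    # 2^24 is the last point up to which every integer is exactly representable.
    print(survives_float32(2 ** 24))      # exactly representable
    print(survives_float32(2 ** 24 + 1))  # lowest bit is lost to rounding
```

Once the low-order bits are rounded away, the mask no longer says reliably whether the corresponding predictors should be included.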
> > 
> > By increasing the limit, all variables beyond the
> > first 31 (possibly fewer)
> > will not be used in any of the regressions.
> > 
> > One way to get around this limit would be to add an option to -impute-, say
> > -nomissings()-, that will take a varlist.  These variables will be assumed
> > missing-value-free so that they could be present in all regressions.
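
A rough Python sketch of the idea behind the proposed option. -nomissings()- is only a suggestion in this post, so nothing here reflects an existing Stata feature. The function regression_varlist, its arguments, and the 31-bit check are assumptions made for illustration.

```python
MAX_MASK_BITS = 31  # usable bits in a 32-bit signed integer, per the post


def regression_varlist(index, maskable, always_in=()):
    """Build the predictor list for one regression index.

    always_in: variables assumed to have no missing values; they are in
               every regression and consume no bits of the index.
    maskable:  variables that may be missing; bit i of index decides
               whether maskable[i] is included.
    """
    if len(maskable) > MAX_MASK_BITS:
        raise ValueError("too many maskable predictors for a 31-bit index")
    chosen = [name for bit, name in enumerate(maskable) if (index >> bit) & 1]
    return list(always_in) + chosen


if __name__ == "__main__":
    # Hypothetical split: two complete dummies plus three maskable predictors.
    print(regression_varlist(5, ["x1", "x2", "x3"], always_in=["d1", "d2"]))
```

Under this scheme only the variables that can be missing count toward the 31-bit budget, so a model with many complete dummy variables could stay within the limit.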
> > 
> > We will look into adding this as a future update.
> > 
> > --Jeff
> > [email protected]

