The Stata listserver

st: RE: RE: Imputing values for categorical data


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: RE: Imputing values for categorical data
Date   Thu, 15 Apr 2004 21:03:00 +0100

I'll impute the missing reference here. In this case, it 
happens to be a book I know about. 
(In other postings, giving just author surnames 
and a date can make the reference hard to find: 
list members please note.) 

Statistical Analysis With Missing Data, Second Edition
Roderick J. A. Little, Donald B. Rubin
ISBN: 0-471-18386-5 Wiley 
September 2002 

Nick 
[email protected] 

> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Dupont, William
> Sent: 15 April 2004 20:47
> To: [email protected]
> Subject: st: RE: Imputing values for categorical data
> 
> 
> Jennifer
> 
> In my opinion, imputation makes the most sense when we wish to adjust
> for confounding variables.  Suppose that I am primarily interested in
> the relationship between y and x, and I have complete data on these two
> variables in my data set.  I feel, however, that I should adjust my
> analysis for a number of other confounding covariates, and I know that
> missing values are scattered throughout these covariates.  If I just
> regress y against x and these other covariates, I get a complete case
> analysis: any record that is missing a value for any of these covariates
> is dropped from the analysis.  This can lead to a substantial loss of
> power and has the potential to induce bias if having complete data is
> related to the response of interest.  Suppose that one of my confounding
> variables is gender.  If I have a number of records where y and x are
> known but gender is not, it does not seem sensible to throw out this
> information just because I would like to adjust my estimates for gender.
> If, however, I impute gender, I can avoid losing these data.  As long as
> gender is only in the model as a confounder, I don't see that it does
> much harm to have an imputed value of, say, .2 for some patient, which
> means that, based on her other covariates, she is 4 times more likely
> to be of one gender than the other (.8 versus .2).
> 
> A tricky problem with imputation is that we often lack assurance that
> the missing values are missing at random.  However, even in this
> situation, it is not clear that a complete case analysis is superior to
> an imputed analysis in the situation described above.  Imputation
> becomes much more problematic when some variables of primary interest
> have missing values.
> 
> The imputation gurus do not like the single conditional imputation
> provided by Stata (see, for example, Little and Rubin 2002).  This is
> because this technique treats the imputed values as if they had been
> observed, so it underestimates the standard errors of the regression
> coefficients for covariates with imputed values and overestimates the
> degrees of freedom.  Multiple imputation methods get around this problem
> and are fine as long as you are confident that the missing values are
> missing at random.  If you are only using imputation for confounding
> variables, I'm not convinced that it makes much difference how you do
> the imputation.  However, multiple imputation is always theoretically
> preferable and can avoid hassles in the event that you come up against
> a referee who objects to all use of single conditional imputation.
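The contrast drawn in the message, between a complete case analysis and imputing a missing confounder such as gender, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reproduction of any Stata command. The toy data, the `age_group` variable used to condition the imputation, and the function names are all hypothetical. The imputation shown is a simple conditional-mean rule: a missing binary gender is replaced by the proportion coded 1 among observed records in the same age group, so the filled-in value is a probability such as .2 rather than 0 or 1.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy data: (y, x, age_group, female), where female is 1 or 0,
# or None when gender was not recorded. y and x are always observed.
Record = Tuple[float, float, str, Optional[int]]

RECORDS: List[Record] = [
    (2.1, 1.0, "old", 1),
    (2.9, 1.5, "old", 0),
    (3.2, 2.0, "old", 0),
    (3.8, 2.5, "old", 0),
    (4.1, 3.0, "old", 0),
    (1.7, 0.5, "young", 1),
    (2.2, 1.0, "young", 1),
    (2.8, 1.5, "young", 0),
    (3.5, 2.0, "old", None),
    (2.0, 0.8, "young", None),
]


def complete_cases(rows: List[Record]) -> List[Record]:
    """Keep only fully observed rows, as a complete case regression would."""
    return [r for r in rows if r[3] is not None]


def impute_female(rows: List[Record]) -> List[Tuple[float, float, str, float]]:
    """Single conditional-mean imputation of a binary covariate (a sketch).

    A missing value is replaced by the proportion female among observed
    rows in the same age group (assumes each group has an observed row),
    so the imputed value is a probability rather than 0 or 1.
    """
    observed: Dict[str, List[int]] = {}
    for _, _, group, female in rows:
        if female is not None:
            observed.setdefault(group, []).append(female)
    share = {g: sum(v) / len(v) for g, v in observed.items()}
    return [
        (y, x, g, float(f) if f is not None else share[g])
        for y, x, g, f in rows
    ]


if __name__ == "__main__":
    kept = complete_cases(RECORDS)
    print(f"complete case analysis keeps {len(kept)} of {len(RECORDS)} records")
    p = impute_female(RECORDS)[8][3]  # first record with gender missing
    print(f"imputed female = {p:.2f}; odds of the other gender = {(1 - p) / p:.1f} to 1")
```

In this toy data, one of the five observed "old" records is coded 1, so the "old" record with missing gender is imputed as .2, which makes the other gender four times as likely. The two incomplete records are kept rather than dropped. Because the single imputed value is then treated as if it had been observed, a regression on this one completed data set understates the uncertainty, which is the objection described in the message.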
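The point about standard errors and degrees of freedom can be made concrete with Rubin's rules, the usual way of combining results from m multiply imputed data sets (described in Little and Rubin 2002). The total variance adds a between-imputation term to the average within-imputation variance. A single imputation cannot estimate that term, so it effectively sets it to zero. The Python sketch below assumes that each completed-data analysis has already produced a coefficient estimate and its squared standard error. The function name and the example numbers are hypothetical, and the degrees-of-freedom formula is Rubin's original large-sample version. The Barnard-Rubin small-sample adjustment is not shown.

```python
import math
from statistics import mean, variance
from typing import List, Tuple


def rubin_combine(estimates: List[float], variances: List[float]) -> Tuple[float, float, float]:
    """Pool m completed-data estimates with Rubin's rules (a sketch).

    Returns (pooled estimate, pooled standard error, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)          # pooled point estimate
    w_bar = mean(variances)          # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor m - 1)
    t = w_bar + (1 + 1 / m) * b      # total variance
    if b == 0:
        df = math.inf                # no extra uncertainty from imputation
    else:
        r = (1 + 1 / m) * b / w_bar  # relative increase in variance due to missingness
        df = (m - 1) * (1 + 1 / r) ** 2
    return q_bar, math.sqrt(t), df


if __name__ == "__main__":
    # Hypothetical coefficient estimates from m = 5 imputed data sets,
    # each with a completed-data standard error of 0.1 (variance 0.01).
    est = [0.50, 0.58, 0.46, 0.54, 0.52]
    var = [0.01] * 5
    q, se, df = rubin_combine(est, var)
    print(f"pooled estimate {q:.3f}, SE {se:.3f}, df {df:.1f}")
```

With these made-up inputs, the pooled standard error is about 0.111. That is larger than the 0.1 that any single completed data set reports on its own, which is the understatement of uncertainty the message attributes to single imputation.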

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


