Stata The Stata listserver

st: RE: RE: Imputing values for categorical data

From   "Dupont, William" <[email protected]>
To   <[email protected]>
Subject   st: RE: RE: Imputing values for categorical data
Date   Thu, 15 Apr 2004 15:07:23 -0500


Your imputation is correct.  Thanks for the clarification.


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Nick Cox
Sent: Thursday, April 15, 2004 3:03 PM
To: [email protected]
Subject: st: RE: Imputing values for categorical data

impute the missing reference here. In this case, it 
happens to be a book I know about. 
(In other cases, in other postings, just giving 
author surnames and dates makes the reference search 
difficult: list members please note.) 

Statistical Analysis With Missing Data, Second Edition
Roderick J. A. Little, Donald B. Rubin
ISBN: 0-471-18386-5 Wiley 
September 2002 

[email protected] 

> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]]On Behalf Of Dupont, 
> William
> Sent: 15 April 2004 20:47
> To: [email protected]
> Subject: st: RE: Imputing values for categorical data
> Jennifer
>
> In my opinion, imputation makes the most sense when we wish to adjust
> for confounding variables.  Suppose that I am primarily interested in
> the relationship between y and x, and I have complete data on these
> two variables from my data set.  I feel, however, that I should adjust
> my analysis for a number of other confounding covariates and I know
> that missing values are scattered throughout these covariates.  If I
> just regress y against x and these other covariates I get a complete
> case analysis: any record that is missing any value of these
> covariates is dropped from the analysis.  This can lead to a
> substantial loss of power and has the potential to induce bias if
> having complete data is related to the response of interest.  Suppose
> that one of my confounding variables is gender.  If I have a number of
> records where y and x are known but gender is not, it does not seem
> sensible to throw out this information just because I would like to
> adjust my estimates for gender.
> If, however, I impute gender I can avoid losing these data.  As long
> as gender is only in the model as a confounder, I don't see that it
> does much harm to have an imputed value of say .2 for some patient,
> which means that, based on her other covariates, she is 4 times as
> likely to be of one gender as the other (odds of .8 to .2).
>
> A tricky problem with imputation is that we often lack assurance that 
> the missing values are missing at random.  However, even in this 
> situation, it is unclear that the complete case analysis is superior 
> to an imputed analysis for the situation described above.  Imputation
> becomes much more problematic when some variables of primary interest
> have missing values.
>
> The imputation gurus do not like the single conditional imputation
> provided by Stata (see for example Little and Rubin 2002).  This is
> because this technique underestimates the standard error of the
> regression coefficient for covariates with imputed values and
> overestimates the degrees of freedom.  Multiple imputation methods get
> around this problem and are fine as long as you are confident that the
> missing values are missing at random.  If you are only using
> imputation for confounding variables I'm not convinced that it makes
> much difference how you do the imputation.  However, multiple
> imputation is always theoretically preferable and can avoid hassles in
> the event that you come up against a referee who objects to all use of
> single conditional imputation.
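
The point above about single imputation understating standard errors
can be illustrated with Rubin's rules, the standard way of pooling
results across multiply imputed data sets.  What follows is a minimal
sketch in Python rather than Stata, not anything taken from the
thread: the function name pool_rubin is hypothetical and the input
numbers are made up purely for illustration.  It assumes you already
have one coefficient estimate and one squared standard error from each
of m imputed data sets.

```python
import math


def pool_rubin(estimates, variances):
    """Pool m completed-data estimates with Rubin's rules.

    estimates: coefficient estimate from each imputed data set
    variances: squared standard error from each imputed data set
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations, one variance per estimate")

    q_bar = sum(estimates) / m                  # pooled point estimate
    within = sum(variances) / m                 # average within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between      # total variance

    # Rubin's degrees of freedom; infinite if the estimates never vary.
    if between == 0:
        df = math.inf
    else:
        df = (m - 1) * (1 + within / ((1 + 1 / m) * between)) ** 2
    return q_bar, math.sqrt(total), df


# Made-up numbers: three imputations, each reporting SE = 0.2.
est, se, df = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.04, 0.04])
print(f"pooled estimate {est:.3f}, pooled SE {se:.3f}, df {df:.2f}")
```

With these illustrative inputs every imputed data set on its own
reports a standard error of 0.2, but the pooled standard error is
about 0.31, because the spread of the estimates across imputations is
added to the variance.  A single imputation has no between-imputation
term to add, which is one way to see why it tends to understate the
uncertainty.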

*   For searches and help try:

