
Re: st: mean, mode or median for missing values

From   Maarten Buis
Subject   Re: st: mean, mode or median for missing values
Date   Wed, 3 Feb 2010 09:09:46 +0000 (GMT)

--- On Wed, 3/2/10, Richard Williams wrote:
> If you are going to use it as an
> independent variable, do you plan on treating it as
> continuous?  If so, you wouldn't be the first person in
> the world to do so and plugging in a mean or an imputed
> regression estimate becomes more reasonable.

I'll have to disagree with that. Imputing the dependent 
variables with mean values will seriously distort the 
associations you are going to find, as can be seen in
the example below:

*---------- begin example ------------
// create some data
drop _all
set obs 200
gen x = rnormal()
gen y = x + .5*rnormal()

// create missing values 
gen y2 = y in 20/l

// impute with mean values
sum y2
replace y2 = r(mean) if y2==.

// display the consequences
scatter y2 x in 20/l ||            ///
scatter y2 x in 1/19,              ///
    legend(order(1 "observed"      ///
                 2 "imputed"))
*------------ end example ------------
( For more on how to use examples I sent to statalist see: )
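
The graph shows the distortion visually. As a rough numerical companion,
not part of the example above, the sketch below repeats the same setup
and compares the standard deviation of y2 and the slope of y2 on x
before and after mean imputation. The seed and the scalar names
(m_obs, sd_obs, b_obs, sd_imp, b_imp) are my own arbitrary choices.

```stata
*---------- begin sketch ------------
// same setup as the example above; the seed is an arbitrary assumption
drop _all
set seed 12345
set obs 200
gen x = rnormal()
gen y = x + .5*rnormal()

// create missing values in the first 19 observations
gen y2 = y in 20/l

// spread and slope using the observed cases only
// (regress drops the observations with missing y2)
sum y2
scalar m_obs  = r(mean)
scalar sd_obs = r(sd)
regress y2 x
scalar b_obs = _b[x]

// impute with the mean of the observed values, then repeat
replace y2 = m_obs if missing(y2)
sum y2
scalar sd_imp = r(sd)
regress y2 x
scalar b_imp = _b[x]

// compare
display "sd of y2, observed only: " sd_obs
display "sd of y2, mean imputed:  " sd_imp
display "slope,    observed only: " b_obs
display "slope,    mean imputed:  " b_imp
*------------ end sketch ------------
```

The imputed points all sit at the observed mean, so they add nothing to
the variation in y2 or to its covariance with x, while they do add to
the number of cases and to the variation in x. The standard deviation
and the absolute slope should therefore both come out smaller after
imputation.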

For a solution Jet should consider the answers to Jet's question of 

Hope this helps,

Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen

