Statalist The Stata Listserver



st: Re: simple way to create missing data that is "missing at random" from a small datset


From   Maarten buis <[email protected]>
To   Suzy <[email protected]>, [email protected]
Subject   st: Re: simple way to create missing data that is "missing at random" from a small datset
Date   Fri, 24 Feb 2006 22:49:56 +0000 (GMT)

Suzy:

No problem, but if you find my reply puzzling then chances are that someone else on Statalist
might find it puzzling too, so I have also sent my reply (and your full question underneath) to
Statalist.

The variable p is the probability of missingness, so the mean of p should be .1 if you want
approximately 10% missingness. Your mean is .99, so nearly everyone will be made missing. -invlogit-
transforms a linear function of "explanatory variables" (in your case .1*age) to lie between zero
and one according to 1/(1+exp{-xb}), so the values you plug in (in your case .1 for age and 0 for
the constant) are "logistic regression coefficients". I would play around with values of the
constant until you get a mean p of about .1 (the more negative the constant, the lower the
probability). For instance, look at the mean of p if you do -gen p = invlogit(-10 + .1*age)-.
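
A rough sketch of that search, assuming your original data are in memory. The name p_try and the
grid of constants are only illustrative choices. The first two lines reproduce the minimum and
maximum of your p at the youngest and oldest ages (28 and 82), which shows why a constant of 0
pushes everything close to 1.

```stata
* With a constant of 0, even the youngest person has p of about .94
display invlogit(.1*28)
display invlogit(.1*82)

* Try a few constants and report the mean of p for each one.
* p_try is a scratch variable (hypothetical name); it is dropped at the end.
foreach b0 in -10 -9 -8 -7 -6 {
    capture drop p_try
    quietly generate double p_try = invlogit(`b0' + .1*age)
    quietly summarize p_try
    display "constant = " %4.0f `b0' "   mean p = " %6.4f r(mean)
}
drop p_try
```

A starting guess for the constant is logit(.1) - .1*mean(age), which is about -2.2 - 5.2 = -7.4
here. The mean of invlogit(xb) is not the same as invlogit of the mean of xb, so treat that only
as a first value to refine.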

Afterwards I would check whether there is enough variation in the values of p. If p is
approximately constant, then the influence of age on the probability of missingness is probably not
strong enough to show up in your simulations, and you should increase the coefficient of age. This
might then shift the mean probability of missingness a bit, so it would be good to check again
that the mean of p is still close to .1.
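
Putting the checks together might look something like the sketch below. It assumes you reload the
original data first (bmi is now all missing) and that no variable p exists yet. The constant -7.4
is only a rough starting value, and the seed and the name miss are arbitrary choices of mine.

```stata
* Sketch only: tune the constant and the age coefficient for your own data.
set seed 12345                          // arbitrary seed, makes the draw reproducible
generate double p = invlogit(-7.4 + .1*age)
summarize p, detail                     // mean near .1? enough spread across ages?

* Draw the missingness indicator once, so it can be inspected afterwards.
generate byte miss = uniform() < p      // 1 = bmi will be set to missing
replace bmi = . if miss
tabulate miss                           // realized share of missing values
tabstat age, by(miss) statistics(mean n)    // mean age should be higher where miss == 1
```

In newer versions of Stata the function is called -runiform()-; -uniform()- is the older name for
the same thing. Because missingness depends only on age and a random draw, not on bmi itself, this
is in line with the MAR setup you describe.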

HTH,
Maarten

--- Suzy <[email protected]> wrote:

> Dear Maarten:
> 
> Hope you don't mind the direct e-mail. I tried your code based on my 
> dataset and what I thought I should do and all of my BMI observations 
> went missing rather than say 5-10%. I have obviously done something 
> wrong with it. I'm hoping you can help. I would like about 10% of the 
> BMI variable to be missing. I want the missingness to be associated with 
> older age, but not dependent on the value of BMI - thus hopefully 
> satisfying the MAR assumption.
> 
> I've included the summary stats of the variables, the code you provided 
> (I modified it somewhat) and the result...
> can you see what I did wrong??
> 
> summarize
> 
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>          sex |       332    .4849398    .5005275          0          1
>         race |       332    .3253012    .4691944          0          1
>          age |       332    52.06024     12.6857         28         82
>         fhdm |       332    .3373494    .4735189          0          1
>          bmi |       332    30.98795     6.18837         18         48
> -------------+--------------------------------------------------------
>        dmcat |       332    .2771084    .4482461          0          1
> 
> . gen p = invlogit(.1*age)
>  
> . sum p
> 
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>            p |       332    .9894261    .0121324   .9426758   .9997254
> 
> 
> . replace bmi = . if uniform() < p
> (332 real changes made, 332 to missing)
> 
> . summarize
> 
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>          sex |       332    .4849398    .5005275          0          1
>         race |       332    .3253012    .4691944          0          1
>          age |       332    52.06024     12.6857         28         82
>         fhdm |       332    .3373494    .4735189          0          1
>          bmi |         0
> -------------+--------------------------------------------------------
>        dmcat |       332    .2771084    .4482461          0          1
>            p |       332    .9894261    .0121324   .9426758   .9997254
> 
> 
> 
> 


-----------------------------------------
between 1/2/2006 and 31/3/2006 I will be
visiting the UCLA, during this time the
best way to reach me is by email

Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z214

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------


		
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


