Thanks Maarten for providing me more detail on your command. I worked
with the constant and now have the correct proportion of missingness,
although I'm not sure what the implications are of the std dev and the
max values of p (.549). Now that I better understand what the command is
doing, I will continue to work with the values and look at the outcomes.
I really appreciate your help!
. gen p = invlogit( -8 +.1*age )
. sum p
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           p |       332    .0999268     .113432   .0054863    .549834
. replace bmi = . if uniform() < p
(27 real changes made, 27 to missing)
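
On the question of what the Min and Max of p mean: they are simply the missingness probabilities for the youngest and the oldest person in the data. A minimal sketch to check this by hand, assuming the ages 28 and 82 taken from the summary of age further down in this thread:

```stata
* p for the youngest respondent (age 28) under invlogit(-8 + .1*age);
* this should reproduce the Min of p in the table above
display invlogit(-8 + .1*28)

* p for the oldest respondent (age 82);
* this should reproduce the Max of p in the table above
display invlogit(-8 + .1*82)
```

So a maximum of about .55 only says that the oldest person has roughly a 55% chance of having BMI set to missing. The overall proportion of missingness is governed by the mean of p, not by its maximum.
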
Maarten Buis wrote:
Suzy:
No problem, but if you find my reply puzzling then chances are that someone else on Statalist
might find it puzzling too, so I also sent my reply (and your full question underneath) to
Statalist.
The variable p is the probability of missingness, so the mean of p should be .1 if you want
approximately 10% missingness. Your mean is .99, so most people will be made missing. -invlogit-
transforms a linear function of "explanatory variables" (in your case .1*age) to lie between zero
and one according to 1/(1+exp(-xb)), so the values you plug in (in your case .1 for age and 0 for
the constant) are "logistic regression coefficients". I would play around with values of the
constant so that you get a mean p of about .1 (the more negative the constant, the lower the
probability). For instance, look at the mean of p if you do -gen p = invlogit(-10 + .1*age)-
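
The trial and error over the constant can be sketched as a small loop. This is only an illustration: the list of constants is an arbitrary choice, and p_try is a hypothetical scratch variable name that is assumed not to exist yet.

```stata
* Try several constants and report the mean and spread of p for each.
* Assumes the variable age is in memory.
foreach c in -10 -9 -8 -7 -6 -5 {
    quietly generate double p_try = invlogit(`c' + .1*age)
    quietly summarize p_try
    display "constant = `c'" _col(20) "mean p = " %6.4f r(mean) _col(42) "sd = " %6.4f r(sd)
    drop p_try
}
```

The constant whose mean p lies closest to .1 is the one to keep, or to refine further with non-integer values.
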
Afterwards I would check whether there is enough variation in the values of p. If p is
approximately constant, then the influence of age on the probability of missingness is probably not
strong enough to show up in your simulations, and you should increase the coefficient of age.
This might then shift the mean probability of missingness a bit, so it would be good to check
that the mean probability of missingness is still close to .1
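
One way to run both checks, and to see whether age really drives the realized missingness, is sketched below. The values -8 and .1 are the ones tried elsewhere in this thread; the seed and the names bmi_mar and miss are hypothetical, and runiform() is assumed to be available (older Stata versions would use uniform() as in the original code). The sketch also assumes p, bmi_mar and miss do not exist yet.

```stata
* Make the random draws reproducible (the seed value is arbitrary)
set seed 12345

* Probability of missingness as a logistic function of age
generate double p = invlogit(-8 + .1*age)
* Check that the mean is near .1 and that the std. dev. is not close to 0
summarize p

* Work on a copy so the original bmi is kept intact
generate bmi_mar = bmi
replace bmi_mar = . if runiform() < p

* Realized missingness and its association with age
generate byte miss = missing(bmi_mar)
summarize miss
logit miss age
```

With only a few dozen missing values the estimated age coefficient from -logit- will be noisy, so it should be read as a rough check rather than as a recovery of the .1 that was plugged in.
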
HTH,
Maarten
--- Suzy wrote:
Dear Maarten:
Hope you don't mind the direct e-mail. I tried your code based on my
dataset and what I thought I should do and all of my BMI observations
went missing rather than say 5-10%. I have obviously done something
wrong with it. I'm hoping you can help. I would like about 10% of the
BMI variable to be missing. I want the missingness to be associated with
older age, but not dependent on the value of BMI - thus hopefully
satisfying the MAR assumption.
I've included the summary stats of the variables, the code you provided
(I modified it somewhat) and the result.
Can you see what I did wrong?
. summarize
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         sex |       332    .4849398    .5005275          0          1
        race |       332    .3253012    .4691944          0          1
         age |       332    52.06024     12.6857         28         82
        fhdm |       332    .3373494    .4735189          0          1
         bmi |       332    30.98795     6.18837         18         48
-------------+--------------------------------------------------------
       dmcat |       332    .2771084    .4482461          0          1
. gen p = invlogit(.1*age)
. sum p
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
           p |       332    .9894261    .0121324   .9426758   .9997254
. replace bmi = . if uniform() < p
(332 real changes made, 332 to missing)
. summarize
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         sex |       332    .4849398    .5005275          0          1
        race |       332    .3253012    .4691944          0          1
         age |       332    52.06024     12.6857         28         82
        fhdm |       332    .3373494    .4735189          0          1
         bmi |         0
-------------+--------------------------------------------------------
       dmcat |       332    .2771084    .4482461          0          1
           p |       332    .9894261    .0121324   .9426758   .9997254
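
The output above shows why every BMI value was lost: without a constant, even the youngest person has a missingness probability above .94. A rough starting value for the constant can be backed out from the target probability. The sketch below assumes a target of .1 and uses the mean age 52.06024 from the summary table; because invlogit is nonlinear, the mean of p will not equal p at the mean age, so the result (about -7.4) is only a first guess to be tuned.

```stata
* Smallest and largest p when the constant is 0 (ages 28 and 82);
* these should reproduce the Min and Max of p in the table above
display invlogit(.1*28)
display invlogit(.1*82)

* First guess for the constant: solve invlogit(c + .1*mean_age) = .1 for c
display logit(.1) - .1*52.06024
```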
