



st: trouble with -mi predict- in Stata 12


From   Omar Badawi <obadawi@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: trouble with -mi predict- in Stata 12
Date   Mon, 13 Aug 2012 10:38:52 -0400

Hi All,

I have been trying to generate predicted probabilities from a logistic
regression model using -mi estimate- in Stata 12 and was hoping to get
some insight into why my results do not seem correct. Here is the
background:

I have a dataset with patients who have an initial observation and one
for each follow-up period, up to 4 total observations. I performed a
multiple imputation using -mi impute chained (regress)- to generate 5
imputed datasets. I then converted to wide format using -mi convert
wide- and performed some data preparation for a logistic regression
model.

My logistic regression model has the following format:

. mi estimate, saving(miestimates, replace): logit y x1 x2 etc....

I then applied the following commands as described in the Stata 12 user guide:
. mi predict xb_mi using miestimates
. qui mi xeq: generate phat = invlogit(xb_mi)

When I compare the actual and predicted numbers of events across
various categories, my results do not make sense. For example, with
the following command, where the variable 'category' has 5 levels,
my predictions are 2-3 fold higher than the actual number of events
in every category. The same is true for other categorical variables.
. mi estimate : total phat y, over(category)

I also tried the following with the same results:
. mi xeq 1: total phat y, over(category)
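
One way to sanity-check these totals is sketched below. It is only a
sketch, and it assumes that xb_mi, phat, y and category exist exactly
as named above and that the data are still -mi set-. It looks at the
original data (m=0) only, checks that phat is a valid probability, and
puts the sums next to the number of nonmissing observations behind
each one.

```stata
* Sketch only: variable names are taken from the commands above.
mi describe                               // report the mi style and number of imputations

* Inspect the linear predictor and the probability in the original data (m=0).
mi xeq 0: summarize xb_mi phat y

* A probability must lie in [0,1]; this stops with an error if it does not.
mi xeq 0: assert inrange(phat, 0, 1) if !missing(phat)

* Sums, means and nonmissing counts by category, one variable at a time.
mi xeq 0: tabstat phat y, by(category) statistics(sum mean n)
```

If the counts reported for phat and y differ within a category, the
two totals are not being taken over the same observations.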

I don't believe my model can really over-predict by 2-3 fold for
everyone, because I also tested calibration on each individual imputed
dataset by using -mi convert flongsep- and running a Hosmer-Lemeshow
GOF test on each of the 5 datasets. When I did that, I got excellent
calibration across the 10 deciles of risk. Also, when I fit the model
using complete case analysis instead of multiple imputation, the
coefficients are similar and the calibration is excellent, so I'm
fairly confident the issue is not dramatic over-prediction by the
model.

If anybody has any suggestions on where I might be going wrong or how
to troubleshoot, I would really appreciate it. Thanks in advance for
the help!

Sincerely,

Omar Badawi