



st: surprising (at least to me) behavior when using -predict- after -mim-


From   Jordan H <jihool3670@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: surprising (at least to me) behavior when using -predict- after -mim-
Date   Thu, 24 Feb 2011 15:43:55 -0500

Dear all,
Suppose I have a data set with missing data and as such, I have used
multiple imputation to create 2 imputed data sets.  As per the
documentation for -mim-, my data set is set up as follows:

_mj  _mi     y       x
  0    1   1.1   100.1
  0    2   9.2       .
  0    3   3.7       .
  1    1   1.1   100.1
  1    2   9.2   105.3
  1    3   3.7   110.9
  2    1   1.1   100.1
  2    2   9.2   104.8
  2    3   3.7   111.3

I have run -- mim: logit y x -- to fit a model and combine the
estimates across imputations.  When I subsequently run -- mim:
predict y_predicted --, Stata returns predicted probabilities even for
those observations that have missing data, i.e.:

_mj  _mi     y       x  y_predicted
  0    1     0   100.1         0.39
  0    2     0       .         0.25
  0    3     1       .         0.56
  1    1     0   100.1         0.39
  1    2     0   105.3         0.71
  1    3     1   110.9         0.87
  2    1     0   100.1         0.39
  2    2     0   104.8         0.73
  2    3     1   111.3         0.86

How is it producing predicted probabilities for observations where x
is missing?  Running -predict- after fitting a plain logit model
produces missing values (.) for observations with missing data, because
of casewise deletion.  I've gone through the documentation... what am
I missing?
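
The plain -logit- behavior described above can be reproduced with a minimal sketch. It does not use the data shown in this post and does not involve -mim- at all. As a stand-in it assumes Stata's shipped auto dataset, with foreign and mpg playing the roles of y and x, and with mpg blanked out in five arbitrarily chosen rows.

```stata
* Minimal sketch, NOT the data above: Stata's shipped auto dataset is a
* stand-in, and foreign/mpg play the roles of y/x (assumption).
sysuse auto, clear

* Blank out the predictor in the first five rows (arbitrary choice)
replace mpg = . in 1/5

* Plain logit drops the five incomplete rows from estimation
logit foreign mpg

* Default -predict- after -logit- returns Pr(foreign == 1)
predict p_plain

* Rows with missing mpg get a missing prediction
count if missing(p_plain)
list make foreign mpg p_plain in 1/8
```

Here -count- should report 5, one for each row where mpg was set to missing. That is the behavior the -mim: predict- output for the _mj == 0 rows departs from.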

Related question:  I also have a test dataset which is formatted
the same as above, i.e., with both the original, non-imputed data and
the imputed data in the same file.  Once I have fit a model on the
training dataset, I would like to assess its predictive performance
by predicting for the observations in the test dataset and looking at
things like sensitivity, specificity, etc.  My question here is, which
predicted probabilities should I be concerned with?  Should I be
concerned with how well the model predicts the un-imputed data?  Or
should I just worry about how well it predicts the data that have been
imputed?

Thanks so much for the consideration!
Jordan
