Statalist



Re: st: weird parameter estimate using -logit- with -mim-


From   "Michael I. Lichter" <[email protected]>
To   [email protected]
Subject   Re: st: weird parameter estimate using -logit- with -mim-
Date   Mon, 02 Mar 2009 15:34:56 -0500

[Sorry for reposting -- the first line of the original post got eaten by the listmonster]

Summary: When I run -logit- under -mim- on a set of imputed datasets created with -ice-, I get a parameter estimate of 0 and a standard error of 0 for a continuous independent variable. However, fitting the model in each imputed dataset individually (without -mim-) produces non-zero estimates that are similar to one another in magnitude. This suggests that either -mim- is telling me something I don't know how to interpret, something is wrong with -mim-, or there is some nearly invisible problem with the data.

Longer version: I am looking at the relationship between adoption of a particular kind of software system by physicians and a set of independent variables. One of my independent variables is years of experience, EXPER. EXPER is continuous and approximately normally distributed. Unfortunately, 25% of my cases are missing on EXPER, so I decided to try multiple imputation using the Galati, Carlin, and Royston -ice- and -mim- commands available from SSC. After -ice-, the distribution of EXPER in the imputed datasets looks fine (that is, similar in shape, mean, and variance to the original), and its relationship to the dependent variable HASEMR looks the same. If I use -logit- (without -mim-) to look at the relationship between the two variables in the original dataset (_mj==0) and in the first imputed dataset (_mj==1), I get nearly identical results. (I've added PCTPOVERTY, a continuous variable with no missing data, to show below that my problem is just with EXPER.)
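For reference, the distribution check is easy to do with standard commands, something along these lines (-ice- tags the original data as _mj==0 and the imputations as _mj==1 through _mj==10):

. tabstat exper, by(_mj) statistics(n mean sd min max)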

. logit hasemr exper pctPoverty if _mj==0

Iteration 0:   log likelihood =  -421.1634
Iteration 1:   log likelihood = -398.42368
Iteration 2:   log likelihood = -397.85135
Iteration 3:   log likelihood = -397.84996

Logistic regression                               Number of obs   =        699
                                                  LR chi2(2)      =      46.63
                                                  Prob > chi2     =     0.0000
Log likelihood = -397.84996                       Pseudo R2       =     0.0554

------------------------------------------------------------------------------
      hasemr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |  -.0505082   .0091315    -5.53   0.000    -.0684057   -.0326108
  pctPoverty |  -.0441709   .0135119    -3.27   0.001    -.0706537   -.0176881
       _cons |   1.021042   .3025942     3.37   0.001     .4279685    1.614116
------------------------------------------------------------------------------

. logit hasemr exper pctPoverty if _mj==1

Iteration 0:   log likelihood = -617.85901
Iteration 1:   log likelihood =  -588.0887
Iteration 2:   log likelihood = -587.54423
Iteration 3:   log likelihood =  -587.5435

Logistic regression                               Number of obs   =       1001
                                                  LR chi2(2)      =      60.63
                                                  Prob > chi2     =     0.0000
Log likelihood =  -587.5435                       Pseudo R2       =     0.0491

------------------------------------------------------------------------------
      hasemr |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       exper |  -.0492512   .0074785    -6.59   0.000    -.0639089   -.0345936
  pctPoverty |  -.0324184   .0106465    -3.04   0.002    -.0532852   -.0115517
       _cons |   .9076523   .2400196     3.78   0.000     .4372224    1.378082
------------------------------------------------------------------------------

However, this is what happens when I try to estimate the same model with -mim-:

. mim: logit hasemr exper pctPoverty

Multiple-imputation estimates (logit)                   Imputations =       10
Logistic regression                                     Minimum obs =     1001
                                                        Minimum dof =    103.2

------------------------------------------------------------------------------
      hasemr |     Coef.   Std. Err.      t    P>|t|    [95% Conf. Int.]   FMI
-------------+----------------------------------------------------------------
       exper |        -0          0   -5.52   0.000        -0        -0  0.285
  pctPoverty |  -.036964    .011038   -3.35   0.001  -.058637  -.015291  0.059
       _cons |   .937575    .270402    3.47   0.001   .403621   1.47153  0.217
------------------------------------------------------------------------------

Obviously something is wrong. It can't just be that the estimate is highly uncertain because of the high proportion of missing data, since that would show up as a large standard error, not a standard error of zero. Note that the imputation procedure does produce a small number of negative, and therefore nonsensical, values for years of experience (14 in total across the 10 imputed datasets), but the problem doesn't go away when I set those to 0. Also note that the dependent variable HASEMR has about 8% missing values in the original dataset.
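In case it helps, the per-imputation estimates I'm comparing can be reproduced with a simple loop (plain -logit-, nothing -mim--specific), along these lines:

* fit the model separately in each of the 10 imputed datasets and
* list the per-imputation coefficient and standard error for exper
forvalues i = 1/10 {
    quietly logit hasemr exper pctPoverty if _mj == `i'
    display "m = `i':  b[exper] = " %9.6f _b[exper] "   se = " %9.6f _se[exper]
}

Every one of those per-imputation coefficients is non-zero and of similar magnitude to the two shown above, which is why the pooled estimate of exactly 0 from -mim- looks so strange.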

Any idea what's wrong, or suggestions for diagnostics? Thanks.

--
Michael I. Lichter, Ph.D.
Research Assistant Professor & NRSA Fellow
UB Department of Family Medicine / Primary Care Research Institute
UB Clinical Center, 462 Grider Street, Buffalo, NY 14215
Office: CC 125 / Phone: 716-898-4751 / E-Mail: [email protected]
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


