Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Judy You <joodyu@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: glm with bin family and link probit VS. probit
Date: Wed, 1 Jun 2011 09:23:13 +0930
Dear Richard:

Thanks for replying to my question and for explaining why you trust
-probit- over -glm- in this case. I ran the two models to check whether
they give the same answer, as most reference books say they should. Now
we can see the difference. The difference is even bigger when marginal
effects are estimated after fitting the models. Could any other experts
explain whether -glm- could be improved?

Cheers

2011/5/31 Richard Williams <richardwilliams.ndu@gmail.com>:
> My guess is that -glm- lacks the specialized code that detects the "predicts
> failure perfectly" variables. So, instead of vars and observations getting
> dropped, f15 and m15 stay in but their estimated standard errors are really
> large. I would trust the results from -probit- more than the results from
> -glm- (why do you need both anyway?). You could try -exlogistic- but your
> sample may be too big for that to be feasible.
>
> At 11:13 PM 5/30/2011, Judy You wrote:
>>
>> Dear Stata Experts:
>>
>> I have a question regarding the comparison of two models: glm with
>> bin family and link probit VS. probit.
>>
>> The data are the numbers of people who died from infectious disease,
>> by age group, as follows.
>>
>>   agegp        0      1
>>       0   154573      2
>>      15    97581      0
>>      25   159888      9
>>      40   191122     35
>>      65    29329     20
>>
>> I got different results from glm (bin family, link probit) and from
>> probit. The main difference is that glm dropped only the last dummy
>> variable, "f65", while keeping all the other coefficients, even those
>> for groups with zero counts (e.g., f15 and m15). Probit dropped not
>> only f65 but also the zero-count dummies f15 and m15. The different
>> results lead to different estimated marginal effects after the two
>> models. Any ideas or advice on how to make the two models use the
>> same independent dummy variables?
>>
>> Your help will be much appreciated!
>>
>> . glm AB m0- f65 [fw= ABfreq], f(b) l(probit) iterate(10)
>> note: f65 omitted because of collinearity
>>
>> Iteration 0: log likelihood = -47450.19
>> Iteration 1: log likelihood = -825.64878
>> Iteration 2: log likelihood = -629.32298
>> Iteration 3: log likelihood = -622.2711
>> Iteration 4: log likelihood = -621.33495
>> Iteration 5: log likelihood = -621.31077
>> Iteration 6: log likelihood = -621.30724
>> Iteration 7: log likelihood = -621.30711
>> Iteration 8: log likelihood = -621.30711
>> Iteration 9: log likelihood = -621.30711
>> Iteration 10: log likelihood = -621.30711
>> convergence not achieved
>>
>> Generalized linear models                  No. of obs      =    632559
>> Optimization     : ML                      Residual df     =    632549
>>                                            Scale parameter =         1
>> Deviance         =  1242.614219            (1/df) Deviance =  .0019645
>> Pearson          =  534978.001             (1/df) Pearson  =  .8457495
>>
>> Variance function: V(u) = u*(1-u)          [Bernoulli]
>> Link function    : g(u) = invnorm(u)       [Probit]
>>
>>                                            AIC             =   .001996
>> Log likelihood   = -621.3071096            BIC             =  -8448049
>>
>>                               OIM
>>           AB       Coef.  Std. Err.        z     P>z    [95% Conf.   Interval]
>>
>>           m0   -.9250873   .2495697    -3.71   0.000     -1.414235   -.4359398
>>          m15   -2.901722   7.830577    -0.37   0.711     -18.24937    12.44593
>>          m25   -.5044691    .146919    -3.43   0.001      -.792425   -.2165132
>>          m40   -.2180508   .1199777    -1.82   0.069     -.4532027    .0171012
>>          m65    .1468668    .133816     1.10   0.272     -.1154077    .4091414
>>           f0   -.9122453   .2501379    -3.65   0.000     -1.402507   -.4219841
>>          f15   -2.901722   8.073848    -0.36   0.719     -18.72617    12.92273
>>          f25   -.6696904   .1741904    -3.84   0.000     -1.011097   -.3282836
>>          f40   -.3572294   .1297122    -2.75   0.006     -.6114606   -.1029981
>>          f65   (omitted)
>>        _cons   -3.288246   .1063749   -30.91   0.000     -3.496737   -3.079755
>>
>> . probit AB m0- f65 [fw= ABfreq], iterate(10)
>>
>> note: m15 != 0 predicts failure perfectly
>>       m15 dropped and 190 obs not used
>>
>> note: f15 != 0 predicts failure perfectly
>>       f15 dropped and 190 obs not used
>>
>> note: f65 omitted because of collinearity
>> Iteration 0: log likelihood = -660.01746
>> Iteration 1: log likelihood = -629.8237
>> Iteration 2: log likelihood = -621.8218
>> Iteration 3: log likelihood = -621.31032
>> Iteration 4: log likelihood = -621.30614
>> Iteration 5: log likelihood = -621.30613
>>
>> Probit regression                          Number of obs   =    534978
>>                                            LR chi2(7)      =     77.42
>>                                            Prob > chi2     =    0.0000
>> Log likelihood = -621.30613                Pseudo R2       =    0.0587
>>
>>           AB       Coef.  Std. Err.        z     P>z    [95% Conf.   Interval]
>>
>>           m0   -.9250871   .2495696    -3.71   0.000     -1.414235   -.4359397
>>          m15   (omitted)
>>          m25   -.5044691    .146919    -3.43   0.001      -.792425   -.2165132
>>          m40   -.2180508   .1199777    -1.82   0.069     -.4532027    .0171012
>>          m65    .1468668    .133816     1.10   0.272     -.1154077    .4091414
>>           f0   -.9122452   .2501378    -3.65   0.000     -1.402506   -.4219841
>>          f15   (omitted)
>>          f25   -.6696904   .1741904    -3.84   0.000     -1.011097   -.3282836
>>          f40   -.3572294   .1297122    -2.75   0.006     -.6114606   -.1029981
>>          f65   (omitted)
>>        _cons   -3.288246   .1063749   -30.91   0.000     -3.496737   -3.079755
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/
>
> -------------------------------------------
> Richard Williams, Notre Dame Dept of Sociology
> OFFICE: (574)631-6668, (574)631-6463
> HOME: (574)289-5227
> EMAIL: Richard.A.Williams.5@ND.Edu
> WWW: http://www.nd.edu/~rwilliam
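
Following Richard's diagnosis, one way to make the two commands use the same regressors and the same sample is to do by hand what -probit- does automatically: leave m15 and f15 out of the model and exclude the observations they flag. The lines below are a minimal sketch and have not been run against the original data. They assume the variable names shown in the output above (AB, ABfreq), that m0 through f65 are 0/1 dummies, and that f65 serves as the base category because it is omitted for collinearity in both runs.

```stata
* Sketch only: variable names are taken from the output quoted above.
* Fit the probit without m15 and f15, excluding the observations in those
* two groups (this mirrors what -probit- did automatically).
probit AB m0 m25 m40 m65 f0 f25 f40 if m15 == 0 & f15 == 0 [fw = ABfreq]

* Refit with -glm- on exactly the estimation sample used by -probit-,
* with the same regressors, binomial family, and probit link.
glm AB m0 m25 m40 m65 f0 f25 f40 if e(sample) [fw = ABfreq], family(binomial) link(probit)
```

With the same regressors and the same sample, the two commands should report matching numbers of observations, and the coefficients would be expected to agree up to the convergence tolerance, as the shared coefficients in the output above already do.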