
# Re: st: conditional SE of y|X in glm

From: Marco Ventura
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: conditional SE of y|X in glm
Date: Tue, 24 Apr 2012 14:33:03 +0200

Thank you Nick,

could you please tell me exactly what Stata does to compute e(dispers) when the family is not normal?

Regards, Marco

On 24/04/2012 13:29, Nick Cox wrote:
```
With this model, as with every other, you have to decide what you mean by "prediction", i.e. on what scale you are predicting.

Also, I did write

"I like to have such measures accessible for comparing -glm- results  with those of other models in which rmse appears naturally."

and I think logit models are stretching the point.

In essence, what -glmcorr- does in your example is either wrong or irrelevant, depending on your point of view. -glmcorr- can be reconciled with those results by doing instead

. gen fraction = r/n

. glm fraction ldose , link(logit)

Iteration 0:   log likelihood =   3.345982
Iteration 1:   log likelihood =  3.7166249
Iteration 2:   log likelihood =  3.7245648
Iteration 3:   log likelihood =   3.724566
Iteration 4:   log likelihood =   3.724566

Generalized linear models                          No. of obs      =        24
Optimization     : ML                              Residual df     =        22
                                                   Scale parameter =  .0468293
Deviance         =  1.030244611                    (1/df) Deviance =  .0468293
Pearson          =  1.030244611                    (1/df) Pearson  =  .0468293

Variance function: V(u) = 1                        [Gaussian]
Link function    : g(u) = ln(u/(1-u))              [Logit]

                                                   AIC             = -.1437138
Log likelihood   =  3.724566043                    BIC             = -68.88694

------------------------------------------------------------------------------
             |                 OIM
    fraction |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       ldose |   22.43087   5.627079     3.99   0.000       11.402    33.45974
       _cons |  -40.34087   10.10823    -3.99   0.000    -60.15264   -20.52909
------------------------------------------------------------------------------

. glmcorr

fraction and predicted

Correlation          0.800
R-squared            0.640
Root MSE             0.216

. di sqrt(e(dispers))
.21640079

However, that would lose some of the information in the data.

Otherwise, -glmcorr- uses what -predict- produces by default; if that's wrong for your problem, so will the results be.

Nick
n.j.cox@durham.ac.uk

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Marco Ventura
Sent: 24 April 2012 10:29
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: conditional SE of y|X in glm

Dear Nick,
thank you very much for your quick replies.
Unfortunately there is something I still do not understand. If I run

use http://www.stata-press.com/data/r10/beetle
glm r ldose, fam(bin n) link(logit)
di sqrt(e(dispers))
glmcorr

I get two very different values, 4.065 versus 13.179. Which of the two
is correct?

Thank you again.
Marco

On 24/04/2012 10:57, Nick Cox wrote:
```
```
See -glmcorr- (SSC) for one approach here. That calculates an rmse
which appears similar, if not identical, to what you want. I like to
have such measures accessible for comparing -glm- results  with those
of other models in which rmse appears naturally. Perhaps it is a
comfort blanket, but there you go.

Note that putting a constant into a variable is usually overkill as

di sqrt(e(dispers))

does the calculation. Use a scalar or local macro if you want to store
the value.

On Tue, Apr 24, 2012 at 9:31 AM, Marco Ventura <mventura@istat.it> wrote:

```
```
From a GLM estimate I want to retrieve the conditional standard error of y
given the covariates. If I do

gen sigma=sqrt(e(dispers))

do I always get the right thing, regardless of family and link?
Should I correct it by sqrt(e(dispers)*(_N-1)/_N)?
And do you think I should instead use the Pearson residuals, as in

gen sigma=sqrt(e(dispers_p))

```
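The two candidates in this question can be stored and cross-checked without creating a variable. What follows is a minimal sketch, assuming the beetle data and the binomial model used elsewhere in this thread. The scalar names sigma_dev and sigma_pear are illustrative and not from the original posts. Reading e(dispers) as deviance/df and e(dispers_p) as Pearson/df is an assumption based on the "(1/df) Deviance" and "(1/df) Pearson" lines of the -glm- header, so the sketch prints both versions side by side rather than taking that for granted.

```stata
* Assumption: same data and model as quoted in this thread.
use http://www.stata-press.com/data/r10/beetle, clear
glm r ldose, family(binomial n) link(logit)

* Deviance-based candidate, held in a scalar rather than a variable.
scalar sigma_dev = sqrt(e(dispers))
display "sqrt(e(dispers))          = " sigma_dev
display "sqrt(e(deviance)/e(df))   = " sqrt(e(deviance)/e(df))

* Pearson-based candidate, for comparison.
scalar sigma_pear = sqrt(e(dispers_p))
display "sqrt(e(dispers_p))        = " sigma_pear
display "sqrt(e(deviance_p)/e(df)) = " sqrt(e(deviance_p)/e(df))
```

If each pair of displayed values agrees, the assumed definitions hold for this fit. With the Gaussian family the deviance and Pearson statistics coincide, as in the output quoted in this thread, so the two scalars would be equal there. For other families they can differ, and neither need match an RMSE computed from what -predict- returns on the response scale.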
```*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```