



Re: st: conditional SE of y|X in glm


From   Marco Ventura <mventura@istat.it>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: conditional SE of y|X in glm
Date   Tue, 24 Apr 2012 14:33:03 +0200

Thank you Nick,
could you please tell me exactly what Stata does to compute e(dispers) when the family is not normal?

Regards, Marco


On 24/04/2012 13:29, Nick Cox wrote:
With this model, as with every other, you have to decide what you mean by "prediction", i.e. on what scale you are predicting.

Also, I did write

"I like to have such measures accessible for comparing -glm- results  with those of other models in which rmse appears naturally."

and I think logit models are stretching the point.

In essence, what -glmcorr- does in your example is either wrong or irrelevant, depending on your point of view. -glmcorr- can be reconciled with those results by doing instead

. gen fraction = r/n

. glm fraction ldose , link(logit)

Iteration 0:   log likelihood =   3.345982
Iteration 1:   log likelihood =  3.7166249
Iteration 2:   log likelihood =  3.7245648
Iteration 3:   log likelihood =   3.724566
Iteration 4:   log likelihood =   3.724566

Generalized linear models                          No. of obs      =        24
Optimization     : ML                              Residual df     =        22
                                                   Scale parameter =  .0468293
Deviance         =  1.030244611                    (1/df) Deviance =  .0468293
Pearson          =  1.030244611                    (1/df) Pearson  =  .0468293

Variance function: V(u) = 1                        [Gaussian]
Link function    : g(u) = ln(u/(1-u))              [Logit]

                                                   AIC             = -.1437138
Log likelihood   =  3.724566043                    BIC             = -68.88694

------------------------------------------------------------------------------
             |                 OIM
    fraction |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       ldose |   22.43087   5.627079     3.99   0.000       11.402    33.45974
       _cons |  -40.34087   10.10823    -3.99   0.000    -60.15264   -20.52909
------------------------------------------------------------------------------

. glmcorr

     fraction and predicted

     Correlation          0.800
     R-squared            0.640
     Root MSE             0.216

. di sqrt(e(dispers))
.21640079

However, that would lose some of the information in the data.

Otherwise, -glmcorr- uses what -predict- produces by default; if that's wrong for your problem, so will the results be.
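
For anyone wanting to check these figures by hand, here is a minimal sketch. It assumes the Gaussian-family fit of fraction on ldose shown above is the most recent estimation, and that -glm- leaves the deviance and the residual degrees of freedom in e(deviance) and e(df). The variable names mu and sqres are made up for illustration, and this is not necessarily how -glmcorr- is coded internally.

```stata
* run directly after: glm fraction ldose, link(logit)

* default -predict- after -glm- gives the fitted mean
predict double mu

* correlation between observed and predicted
correlate fraction mu

* root mean square error from the raw residuals, with residual df as divisor
gen double sqres = (fraction - mu)^2
quietly summarize sqres
di sqrt(r(sum)/e(df))

* the same quantity from the stored deviance (Gaussian family only)
di sqrt(e(deviance)/e(df))
```

With the output above, 1.030244611/22 = .0468293, and its square root is the .2164 reported both by -glmcorr- and by sqrt(e(dispers)).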

Nick
n.j.cox@durham.ac.uk


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Marco Ventura
Sent: 24 April 2012 10:29
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: conditional SE of y|X in glm

Dear Nick,
thank you very much for your quick replies.
Unfortunately there is something I still do not understand. If I run
use http://www.stata-press.com/data/r10/beetle
glm r ldose, fam(bin n) link(logit)
di sqrt(e(dispers))
glmcorr
I get two very different values, 4.065 and 13.179. Which of the two
is correct?

Thank you again.
Marco

On 24/04/2012 10:57, Nick Cox wrote:
See -glmcorr- (SSC) for one approach here. That calculates an rmse
which appears similar, if not identical, to what you want. I like to
have such measures accessible for comparing -glm- results  with those
of other models in which rmse appears naturally. Perhaps it is a
comfort blanket, but there you go.

Note that putting a constant into a variable is usually overkill as

di sqrt(e(dispers))

does the calculation. Use a scalar or local macro if you want to store
the value.
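
A minimal sketch of the two storage options just mentioned, assuming a -glm- fit is the most recent estimation; the name sigma is only illustrative.

```stata
* store the value in a scalar ...
scalar sigma = sqrt(e(dispers))
di sigma

* ... or in a local macro
local sigma = sqrt(e(dispers))
di `sigma'
```

Either holds a single number, whereas -gen- would copy the same constant into every observation.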

On Tue, Apr 24, 2012 at 9:31 AM, Marco Ventura <mventura@istat.it> wrote:

From a GLM fit I want to retrieve the conditional standard error of y
given the covariates. If I do

gen sigma=sqrt(e(dispers))

do I always get the right thing independently of any family and link?
Should I correct it by sqrt(e(dispers)* (_N-1)/_N)?
And do you think I should instead use the Pearson residuals such as

gen sigma=sqrt(e(dispers_p))
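
One way to see how the two candidates differ is to rebuild them from the other stored results. This is a sketch only: it assumes e(dispers) is the deviance divided by the residual degrees of freedom and e(dispers_p) is the Pearson statistic divided by the same degrees of freedom, as the "(1/df) Deviance" and "(1/df) Pearson" lines of the -glm- header suggest. The beetle dataset is the one used elsewhere in this thread.

```stata
use http://www.stata-press.com/data/r10/beetle, clear
glm r ldose, family(binomial n) link(logit)

* deviance-based dispersion, by hand and as stored
di e(deviance)/e(df)
di e(dispers)

* Pearson-based dispersion, by hand and as stored
di e(deviance_p)/e(df)
di e(dispers_p)
```

For a Gaussian family the deviance and Pearson statistics coincide, so the two dispersions agree; for other families they generally differ.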

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
*

