



RE: st: conditional SE of y|X in glm


From   Nick Cox <n.j.cox@durham.ac.uk>
To   "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject   RE: st: conditional SE of y|X in glm
Date   Tue, 24 Apr 2012 12:29:20 +0100

With this model, as with every other, you have to decide what you mean by "prediction", i.e. on what scale you are predicting. 

Also, I did write 

"I like to have such measures accessible for comparing -glm- results  with those of other models in which rmse appears naturally."

and I think logit models are stretching the point. 

In essence, what -glmcorr- does in your example is either wrong or irrelevant, depending on your point of view. -glmcorr- can be reconciled with those results by doing instead 

. gen fraction = r/n

. glm fraction ldose , link(logit)

Iteration 0:   log likelihood =   3.345982  
Iteration 1:   log likelihood =  3.7166249  
Iteration 2:   log likelihood =  3.7245648  
Iteration 3:   log likelihood =   3.724566  
Iteration 4:   log likelihood =   3.724566  

Generalized linear models                          No. of obs      =        24
Optimization     : ML                              Residual df     =        22
                                                   Scale parameter =  .0468293
Deviance         =  1.030244611                    (1/df) Deviance =  .0468293
Pearson          =  1.030244611                    (1/df) Pearson  =  .0468293

Variance function: V(u) = 1                        [Gaussian]
Link function    : g(u) = ln(u/(1-u))              [Logit]

                                                   AIC             = -.1437138
Log likelihood   =  3.724566043                    BIC             = -68.88694

------------------------------------------------------------------------------
             |                 OIM
    fraction |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       ldose |   22.43087   5.627079     3.99   0.000       11.402    33.45974
       _cons |  -40.34087   10.10823    -3.99   0.000    -60.15264   -20.52909
------------------------------------------------------------------------------

. glmcorr

    fraction and predicted

    Correlation          0.800
    R-squared            0.640
    Root MSE             0.216

. di sqrt(e(dispers))
.21640079

However, that would lose some of the information in the data, since the fraction r/n discards the group sizes n. 

Otherwise, -glmcorr- uses what -predict- produces by default; if that's wrong for your problem, so will the results be. 
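
For the Gaussian fit of fraction above, the agreement between the Root MSE from -glmcorr- and sqrt(e(dispers)) can be checked by hand from the reported output. With the Gaussian family the deviance is the residual sum of squares on the response scale, so the scale parameter is the deviance divided by the residual degrees of freedom. The symbols D (deviance), N (number of observations), p (number of estimated coefficients) and \widehat{\phi} (scale parameter) are notation introduced here for illustration only; the numbers are those printed in the log above.

```latex
\[
\widehat{\phi} \;=\; \frac{D}{N - p} \;=\; \frac{1.030244611}{24 - 2} \;\approx\; 0.0468293,
\qquad
\text{Root MSE} \;=\; \sqrt{\widehat{\phi}} \;\approx\; 0.2164 .
\]
```

This is only a check for the Gaussian case; with family(binomial n) the deviance is not a residual sum of squares on the scale of the response, so the two quantities need not agree.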

Nick 
n.j.cox@durham.ac.uk 


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Marco Ventura
Sent: 24 April 2012 10:29
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: conditional SE of y|X in glm

Dear Nick,
thank you very much for your quick replies.
Unfortunately, there is something I still do not understand. If I run
use http://www.stata-press.com/data/r10/beetle
glm r ldose, fam(bin n) link (logit)
di sqrt(e(dispers))
glmcorr
I get two very different values: 4.065 versus 13.179. Which of the two 
is correct?

Thank you again.
Marco

On 24/04/2012 10:57, Nick Cox wrote:
> See -glmcorr- (SSC) for one approach here. That calculates an rmse
> which appears similar, if not identical, to what you want. I like to
> have such measures accessible for comparing -glm- results  with those
> of other models in which rmse appears naturally. Perhaps it is a
> comfort blanket, but there you go.
>
> Note that putting a constant into a variable is usually overkill as
>
> di sqrt(e(dispers))
>
> does the calculation. Use a scalar or local macro if you want to store
> the value.
>
> On Tue, Apr 24, 2012 at 9:31 AM, Marco Ventura <mventura@istat.it> wrote:
>
>> from a GLM estimate I want to retrieve the conditional standard error of y
>> given the covariates. If I do
>>
>> gen sigma=sqrt(e(dispers))
>>
>> do I always get the right thing independently of any family and link?
>> Should I correct it by sqrt(e(dispers)* (_N-1)/_N)?
>> And do you think I should instead use the Pearson residuals such as
>>
>> gen sigma=sqrt(e(dispers_p))
>>
