
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Omit Constant from Count Models


From   Maarten Buis <[email protected]>
To   [email protected]
Subject   Re: st: Omit Constant from Count Models
Date   Fri, 14 Sep 2012 17:27:50 +0200

I think I gave you wrong advice. The predicted values are not supposed
to have the same distribution as the dependent variable because, by
definition, they exclude the random error. So you would expect the
distribution of the predictions to be tighter than that of the actual
variable, and it is hard to say from these distributions which model
fits better. Instead, look at graphs of the predictions against the
explanatory variables, and you'll probably see that you do not want to
exclude the constant.
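Why predictions are tighter than the outcome can be checked with a small simulation. The sketch below is in Python rather than Stata and assumes a hypothetical data-generating process: one covariate x drawn uniformly on [0, 3], a mean of exp(0.8 + 0.3x), and NB2 dispersion alpha = 0.66 (roughly the alpha in the output quoted further down). For brevity it uses the true conditional mean as the "prediction" instead of a fitted one, and it ignores the zero truncation in Matt's model.

```python
import math
import random
import statistics


def draw_poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def draw_negbin(mu, alpha, rng):
    """NB2 draw as a gamma-Poisson mixture: mean mu, variance mu + alpha*mu**2."""
    lam = rng.gammavariate(1.0 / alpha, alpha * mu)  # shape, scale
    return draw_poisson(lam, rng)


rng = random.Random(2012)
alpha = 0.66  # hypothetical, close to the reported alpha of .658
xs = [rng.uniform(0.0, 3.0) for _ in range(5000)]
mus = [math.exp(0.8 + 0.3 * x) for x in xs]       # "predictions": E(y | x)
ys = [draw_negbin(mu, alpha, rng) for mu in mus]  # outcomes include random error

print(f"variance of predictions: {statistics.pvariance(mus):.2f}")
print(f"variance of outcomes   : {statistics.pvariance(ys):.2f}")
print(f"range of predictions   : {min(mus):.2f} to {max(mus):.2f}")
print(f"range of outcomes      : {min(ys)} to {max(ys)}")
```

Under this mixture, Var(y) = Var(mu) + E(mu) + alpha*E(mu^2), so the outcomes are always more spread out than the means, even when the model is exactly right.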

-- Maarten

On Fri, Sep 14, 2012 at 5:06 PM, Habiger, Matt <[email protected]> wrote:
> Thanks for the response, Maarten. In this data set all rhs variables are nonnegative (and all estimated coefficients are positive), so my statement holds here, though it is certainly not true in general. Hence, when running the predict command, no predicted values fall below the exponentiated constant. That said, for this data set dropping the constant does lead to better predictions (see the distributions below). Thanks for the article reference and the comment on a priori evidence! I'll have to consider whether there is any reasonable a priori argument for dropping the constant.
>
> Days    Actual    No constant    Constant
>    1        8%             2%          0%
>    2       13%            19%          0%
>    3       15%            21%          0%
>    4       16%            18%          2%
>    5       11%            13%         66%
>    6        8%             9%         25%
>    7        5%             6%          4%
>    8        3%             3%          2%
>    9        3%             2%          0%
>   10        3%             1%          0%
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Maarten Buis
> Sent: Friday, September 14, 2012 8:32 AM
> To: [email protected]
> Subject: Re: st: Omit Constant from Count Models
>
> On Fri, Sep 14, 2012 at 2:58 PM, Habiger, Matt  wrote:
>> I'm hoping somebody can tell me what impact omitting a constant term from count models, such as Poisson or negative binomial, has. Does it affect the t-statistics or the validity of the coefficient estimates?
>>
>> I'm modeling the number of days a patient spends in a hospital in a given year, and the constant is causing the predicted visits distribution to start at ~4 days (exp(1.46)). In the actual data, roughly 25% of days are below 4 (only those with visits are being modeled). When I drop the constant, my predictions much more closely resemble the actual distribution. Below are the outputs from two models for reference.
>>
>>
>> Truncated negative binomial regression            Number of obs   =       1334
>> Truncation point: 0                               LR chi2(5)      =      99.50
>> Dispersion     = mean                             Prob > chi2     =     0.0000
>> Log likelihood = -3639.7431                       Pseudo R2       =     0.0135
>>
>> ------------------------------------------------------------------------------
>>     inpunits |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
>> -------------+----------------------------------------------------------------
>>   claims2009 |   .0047266   .0020829     2.27   0.023     .0006442    .0088091
>> previnpunits |   .0290032    .007004     4.14   0.000     .0152755    .0427308
>>        age09 |   .0041179   .0016468     2.50   0.012     .0008903    .0073455
>>   inplow_ind |   .1627803    .080974     2.01   0.044     .0040741    .3214865
>> inphigh_ind |   .2904038    .078893     3.68   0.000     .1357764    .4450312
>>        _cons |   1.469507   .0654664    22.45   0.000     1.341195    1.597818
>> -------------+----------------------------------------------------------------
>>     /lnalpha |   -.418825   .0693098                     -.5546696   -.2829804
>> -------------+----------------------------------------------------------------
>>        alpha |   .6578193   .0455933                       .574262    .7535346
>> ------------------------------------------------------------------------------
>> Likelihood-ratio test of alpha=0:  chibar2(01) = 2745.11 Prob>=chibar2 = 0.000
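The columns of this output are tied together by simple arithmetic, which bears on the question about test statistics: each z is the coefficient divided by its standard error, so dropping the constant, which re-estimates every coefficient and standard error, can change every z in the table. The Python sketch below recomputes z, the two-sided p-value, and the 95% normal interval from the displayed (rounded) numbers, so agreement is only to the displayed precision. It also shows that alpha and its interval are the exponentiated /lnalpha row, and that the chibar2(01) p-value is half the chi2(1) tail.

```python
import math

# (coefficient, standard error) copied from the -tnbreg- output above
ROWS = {
    "claims2009": (0.0047266, 0.0020829),
    "previnpunits": (0.0290032, 0.007004),
    "age09": (0.0041179, 0.0016468),
    "inplow_ind": (0.1627803, 0.080974),
    "inphigh_ind": (0.2904038, 0.078893),
    "_cons": (1.469507, 0.0654664),
}
Z975 = 1.959963984540054  # 97.5th percentile of the standard normal


def wald_row(coef, se):
    """z statistic, two-sided normal p-value, and 95% confidence interval."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p, coef - Z975 * se, coef + Z975 * se


for name, (coef, se) in ROWS.items():
    z, p, lo, hi = wald_row(coef, se)
    print(f"{name:>12}  z = {z:6.2f}  p = {p:.3f}  [{lo:.7f}, {hi:.7f}]")

# alpha is estimated on the log scale; its CI is the exponentiated lnalpha CI.
lnalpha, ln_lo, ln_hi = -0.418825, -0.5546696, -0.2829804
print(f"alpha = {math.exp(lnalpha):.7f}  "
      f"[{math.exp(ln_lo):.7f}, {math.exp(ln_hi):.7f}]")

# chibar2(01): alpha = 0 lies on the boundary, so the p-value is half the
# chi2(1) tail, and P(chi2(1) > x) = erfc(sqrt(x / 2)).
chibar2 = 2745.11
p_lr = 0.5 * math.erfc(math.sqrt(chibar2 / 2.0))
print(f"LR test p-value = {p_lr:.3g}")
```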
>
> The statement that "the constant is causing the predicted visits distribution to start at ~4 days (exp(1.46))" is not quite true. Your results say that for a (hypothetical) observation with the value 0 on the variables claims2009, previnpunits, age09, inplow_ind, and inphigh_ind, you would predict that such a person would stay about 4 days in hospital. Depending on these variables, this can be a gross extrapolation.
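What the constant means can be made concrete by evaluating exp(xb) with the coefficients reported above. In the Python sketch below, the estimates are copied from the output, while the example covariate profile and the ranges in the random check are hypothetical, not taken from Matt's data. Note that exp(xb) is the mean of the untruncated negative binomial; in a zero-truncated model, the mean conditional on at least one day is somewhat larger.

```python
import math
import random

# Estimates copied from the -tnbreg- output above.
COEF = {
    "claims2009": 0.0047266,
    "previnpunits": 0.0290032,
    "age09": 0.0041179,
    "inplow_ind": 0.1627803,
    "inphigh_ind": 0.2904038,
}
CONS = 1.469507


def predicted_mean(profile):
    """exp(xb) for one covariate profile (a dict keyed like COEF)."""
    xb = CONS + sum(COEF[name] * profile[name] for name in COEF)
    return math.exp(xb)


def random_profile(rng):
    """Arbitrary nonnegative profile; the ranges are made up for the check."""
    return {"claims2009": rng.uniform(0, 50), "previnpunits": rng.uniform(0, 30),
            "age09": rng.uniform(0, 90), "inplow_ind": rng.randint(0, 1),
            "inphigh_ind": rng.randint(0, 1)}


zero_profile = {name: 0.0 for name in COEF}  # age 0, no claims: an extrapolation
example = {"claims2009": 10, "previnpunits": 5, "age09": 50,  # hypothetical patient
           "inplow_ind": 0, "inphigh_ind": 1}

print(f"all covariates zero: {predicted_mean(zero_profile):.2f} days")  # exp(_cons)
print(f"example profile    : {predicted_mean(example):.2f} days")

# Every coefficient is positive, so no nonnegative profile is predicted
# below exp(_cons); this is the situation Matt describes.
rng = random.Random(1)
floor_holds = all(predicted_mean(random_profile(rng)) >= predicted_mean(zero_profile)
                  for _ in range(1000))
print(f"exp(_cons) is a lower bound here: {floor_holds}")
```

The zero profile reproduces exp(_cons), which is why the lower bound in the distribution of predictions reflects where x = 0 lies relative to the data rather than anything the constant does by itself.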
>
> In general you do not want to leave the constant out. The idea that leaving the constant out will lead to better predictions is certainly wrong. But don't take my word for it, try it out: estimate the model with and without the constant, use -predict- to predict the expected days in hospital for each of these models and plot both against the observed days in hospital.
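The suggested check can also be mimicked outside Stata. The Python sketch below assumes simulated data from a hypothetical Poisson model with mean exp(1.0 + 0.3x) (not Matt's truncated negative binomial), fits it by Newton-Raphson with and without a constant, and compares mean observed and mean predicted counts within bands of x as a text stand-in for the suggested plot. In Stata the fits would be -poisson- (or -tnbreg-) with and without the -noconstant- option, each followed by -predict-.

```python
import math
import random


def draw_poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def fit_with_constant(xs, ys, iters=50):
    """Newton-Raphson for log E(y) = a + b*x."""
    a, b = math.log(sum(ys) / len(ys)), 0.0  # start at the intercept-only fit
    for _ in range(iters):
        mus = [math.exp(a + b * x) for x in xs]
        g0 = sum(y - m for y, m in zip(ys, mus))
        g1 = sum(x * (y - m) for x, y, m in zip(xs, ys, mus))
        h00 = sum(mus)
        h01 = sum(x * m for x, m in zip(xs, mus))
        h11 = sum(x * x * m for x, m in zip(xs, mus))
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det  # solves (X'WX) d = X'(y - mu)
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-10:
            break
    return a, b


def fit_without_constant(xs, ys, iters=100):
    """Newton-Raphson for log E(y) = c*x, which forces a prediction of 1 at x = 0."""
    c = 0.0
    for _ in range(iters):
        mus = [math.exp(c * x) for x in xs]
        step = (sum(x * (y - m) for x, y, m in zip(xs, ys, mus))
                / sum(x * x * m for x, m in zip(xs, mus)))
        c += step
        if abs(step) < 1e-10:
            break
    return c


def poisson_loglik(ys, mus):
    return sum(y * math.log(m) - m - math.lgamma(y + 1) for y, m in zip(ys, mus))


def band_mean(values, idx):
    return sum(values[i] for i in idx) / len(idx)


rng = random.Random(14)
xs = [rng.uniform(0.0, 3.0) for _ in range(2000)]
ys = [draw_poisson(math.exp(1.0 + 0.3 * x), rng) for x in xs]

a, b = fit_with_constant(xs, ys)
c = fit_without_constant(xs, ys)
pred_with = [math.exp(a + b * x) for x in xs]
pred_without = [math.exp(c * x) for x in xs]

print(f"log L, constant   : {poisson_loglik(ys, pred_with):.1f}")
print(f"log L, no constant: {poisson_loglik(ys, pred_without):.1f}")
for lo in (0.0, 1.0, 2.0):  # observed vs predicted means within bands of x
    idx = [i for i, x in enumerate(xs) if lo <= x < lo + 1.0]
    print(f"x in [{lo:.0f}, {lo + 1:.0f}): observed {band_mean(ys, idx):.2f}, "
          f"constant {band_mean(pred_with, idx):.2f}, "
          f"no constant {band_mean(pred_without, idx):.2f}")
```

With a constant, the score equation for the intercept makes the mean prediction equal the mean outcome; without it, the fit is pinned to 1 at x = 0 and misses the low end of x.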
>
> As always, there are exceptions. I think the most common valid reason for leaving out the constant is when you put it back in through the back door by the way you enter categorical variables; see, e.g., M.L. Buis (2012), "Stata tip 106: With or without reference", The Stata Journal, 12(1), pp. 162-164. You may also be modeling a physical process where there is very strong a priori evidence that there can be no constant. But any process involving humans is just too random to fall into that class of models.
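The back door works because a full set of indicators for a categorical variable sums to one in every row, so the indicators jointly span the constant. The Python sketch below uses a hypothetical three-level variable with made-up counts. With all three indicators and no constant, the Poisson MLE has a closed form, exp(b_k) equal to the mean count in level k, and the same fit can be rewritten as a constant plus contrasts against a reference level.

```python
import math
from collections import defaultdict

# Hypothetical counts for a three-level categorical variable.
data = [("low", 2), ("low", 3), ("low", 1), ("mid", 4), ("mid", 6),
        ("high", 9), ("high", 7), ("high", 8), ("high", 6)]
levels = ["low", "mid", "high"]

# Full set of indicators (no omitted reference level): each row sums to 1,
# so the indicator columns add up to the constant column.
design = [[1 if cat == level else 0 for level in levels] for cat, _ in data]
assert all(sum(row) == 1 for row in design)

# Poisson MLE with all indicators and no constant: exp(b_k) is the level-k mean.
groups = defaultdict(list)
for cat, y in data:
    groups[cat].append(y)
b = {lev: math.log(sum(groups[lev]) / len(groups[lev])) for lev in levels}

# The same fit in reference-category form: constant = b_low, contrasts b_k - b_low.
cons = b["low"]
for lev in levels:
    print(f"{lev:>4}: no-constant b = {b[lev]:.3f}, "
          f"constant + contrast = {cons:.3f} + {b[lev] - cons:.3f}, "
          f"predicted count = {math.exp(b[lev]):.2f}")

# As in a model with an explicit constant, the residuals sum to zero.
residual_sum = sum(y - math.exp(b[cat]) for cat, y in data)
print(f"sum of residuals: {residual_sum:.2e}")
```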
>
> Hope this helps,
> Maarten
>
> ---------------------------------
> Maarten L. Buis
> WZB
> Reichpietschufer 50
> 10785 Berlin
> Germany
>
> http://www.maartenbuis.nl
> ---------------------------------
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2018 StataCorp LLC