Statalist



RE: st: Model selection using AIC/BIC and other information criteria


From   kokootchke <kokootchke@hotmail.com>
To   statalist <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Model selection using AIC/BIC and other information criteria
Date   Tue, 23 Jun 2009 21:39:59 -0400

Thank you, Richard. This was exactly what I thought. I remember from my metrics classes a long time ago that both AIC and BIC depend on N (the sample size), and I confirmed this by looking at the Wikipedia entries. But, just like you, I feared that, even though both criteria account for the sample size, maybe you can't compare AIC or BIC values across models that are estimated on different numbers of observations.
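
For reference, a sketch of why the raw values scale with N, using the standard definitions of the two criteria. The symbols are notation introduced here, not used elsewhere in this thread: L is the maximized likelihood, k the number of estimated parameters, and RSS the residual sum of squares. The second line assumes a linear regression with normal errors.

```latex
\begin{align*}
\mathrm{AIC} &= -2\ln L + 2k, \qquad \mathrm{BIC} = -2\ln L + k\ln N \\
-2\ln L &= N\left[\ln(2\pi) + \ln\!\left(\frac{\mathrm{RSS}}{N}\right) + 1\right]
\end{align*}
```

The likelihood term grows roughly in proportion to N for a given residual variance, so a model fitted to fewer observations tends to show a smaller AIC and BIC regardless of fit. With the numbers from the original post, 915/1200 is about 0.76 per observation for model 1 and 1500/2800 is about 0.54 for model 2. This only illustrates the scaling; it is not a valid way to compare the two models.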

Anyway, I just wanted to make sure I wasn't missing something else... 
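
For the archive, a minimal sketch of one way to put both models on the same sample by hand, as an alternative to the nestreg suggestion quoted below. The variable names come from the original post; insample, with_X and without_X are hypothetical names chosen for illustration.

```stata
* Sketch only: insample, with_X and without_X are hypothetical names.
* Fit the larger model first so that e(sample) marks the rows where X is observed.
regress log_spread a b c X
estimates store with_X
generate byte insample = e(sample)

* Refit the smaller model on exactly the same rows.
regress log_spread a b c if insample
estimates store without_X

* Report AIC and BIC for both models, now based on the same N.
estimates stats with_X without_X
```

This only makes the two criteria comparable; it does not address why the 1600 observations are missing in the first place.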

Thanks a lot!!
Adrian


----------------------------------------
> Date: Tue, 23 Jun 2009 21:20:28 -0500
> To: statalist@hsphsun2.harvard.edu; statalist@hsphsun2.harvard.edu
> From: Richard.A.Williams.5@ND.edu
> Subject: Re: st: Model selection using AIC/BIC and other information criteria
>
> At 06:07 PM 6/23/2009, kokootchke wrote:
>>Dear all,
>>
>>I have a model that says that the return or yield spread of a bond
>>issued by a country depends non-linearly on the country's
>>probability of default. If I assume that this probability of default
>>follows a logistic form, I get that the log spread depends linearly
>>on "stuff" which I take to be macroeconomic variables. To choose the
>>best model, I use AIC/BIC.
>>
>>One interesting fact I observe is that in some cases, I see that
>>both AIC and BIC select a model that contains some variable X even
>>when a lot of data points are missing for that particular variable,
>>which means I actually lose a lot of observations when I include
>>that variable.
>>
>>More specifically, I have:
>>
>>MODEL 1
>>
>>regress log_spread a b c X
>>estat ic
>>
>>which gives AIC = 915
>>
>>then,
>>
>>MODEL 2
>>
>>regress log_spread a b c
>>estat ic
>>
>>which gives AIC = 1500
>>
>>but the OLS in model 1 uses 1200 observations while the OLS in model
>>2 uses 2800 observations (because 1600 observations are missing in
>>variable X)!!
>>
>>You would think that this would be because X is very relevant to
>>explain the spread, but in fact I see some cases when this variable
>>is statistically insignificant!!
>
> Somebody can correct me if I am wrong, but I don't think it is legit
> to compare BIC and AIC statistics that have been estimated on
> different samples. I don't think these stats are totally immune to
> differences in sample size -- and even if they were, the two samples
> used might be very different, e.g. maybe those 1600 missing cases are
> all bonds from the US.
>
> I'm guessing a fairer comparison would be
>
> nestreg, lr: reg log_spread (a b c) X
>
> The same sample will be used for both regressions and you will get
> BIC and AIC stats at the end.
>
> I think your bigger concern, though, is losing more than half your
> cases when you include X. You need to find out why those data are
> missing and then decide what to do about it.
>
>
> -------------------------------------------
> Richard Williams, Notre Dame Dept of Sociology
> OFFICE: (574)631-6668, (574)631-6463
> HOME: (574)289-5227
> EMAIL: Richard.A.Williams.5@ND.Edu
> WWW: http://www.nd.edu/~rwilliam
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/



