Re: st: Model selection using AIC/BIC and other information criteria

From	Richard Williams <[email protected]>
To	"[email protected]" <[email protected]>, statalist <[email protected]>
Subject	Re: st: Model selection using AIC/BIC and other information criteria
Date	Tue, 23 Jun 2009 21:20:28 -0500

At 06:07 PM 6/23/2009, kokootchke wrote:

Dear all,
I have a model that says that the return or yield spread of a bond issued by a country depends non-linearly on the country's probability of default. If I assume that this probability of default follows a logistic form, I get that the log spread depends linearly on "stuff" which I take to be macroeconomic variables. To choose the best model, I use AIC/BIC.
One interesting fact I observe is that in some cases, I see that both AIC and BIC select a model that contains some variable X even when a lot of data points are missing for that particular variable, which means I actually lose a lot of observations when I include such variable X.
More specifically, I have:

MODEL 1

regress log_spread a b c X
estat ic

which gives AIC = 915

then,

MODEL 2

regress log_spread a b c
estat ic

which gives AIC = 1500
but the OLS in model 1 uses 1200 observations while the OLS in model 2 uses 2800 observations (because 1600 observations are missing in variable X)!!
You would think that this would be because X is very relevant to explain the spread, but in fact I see some cases when this variable is statistically insignificant!!

Somebody can correct me if I am wrong, but I don't think it is legit to compare BIC and AIC statistics that have been estimated on different samples. I don't think these stats are totally immune to differences in sample size -- and even if they were, the two samples used might be very different, e.g. maybe those 1600 missing cases are all bonds from the US.
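
A quick way to see the sample-size dependence is to fit the same model on a full sample and on half of it. This is only an illustrative sketch using the auto dataset that ships with Stata, not the bond data from the post. Both criteria are built from -2*ln(L), and the log likelihood is a sum over observations, so their magnitude moves with N even when the model is unchanged.

* Same model, full sample (74 observations)
sysuse auto, clear
regress price mpg weight
estat ic

* Same model, first half of the sample only
regress price mpg weight in 1/37
estat ic

The AIC and BIC from the second fit should come out far smaller, simply because fewer observations contribute to the log likelihood.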


I'm guessing a fairer comparison would be

nestreg, lr: reg log_spread (a b c) X

The same sample will be used for both regressions and you will get BIC and AIC stats at the end.
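
As an alternative to nestreg, the same-sample comparison can be sketched by hand. The names m1, m2, and common are placeholders of my own; the variables are those from the original post.

* Model 1 defines the common sample (listwise deletion on all variables)
regress log_spread a b c X
generate byte common = e(sample)
estimates store m1

* Model 2, restricted to the observations used in Model 1
regress log_spread a b c if common
estimates store m2

* AIC and BIC side by side; the number of observations should now match
estimates stats m1 m2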

I think your bigger concern, though, is losing more than half your cases when you include X. You need to find out why those data are missing and then decide what to do about it.
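
A rough first pass at investigating the missing data might look like the following. This is a sketch only: X_miss is a placeholder name, and country is a hypothetical identifier variable that I am assuming exists in the dataset.

* Flag observations where X is missing
generate byte X_miss = missing(X)
tabulate X_miss

* Compare the other variables across the two groups
bysort X_miss: summarize log_spread a b c

* Does missingness cluster in particular countries? (country is assumed)
tabulate country X_miss

* Do the observed covariates predict whether X is missing?
logit X_miss a b c

If the missing cases differ systematically from the complete ones, dropping them changes what population the estimates describe.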



-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME:   (574)289-5227
EMAIL:  [email protected]
WWW:    http://www.nd.edu/~rwilliam
