Dear all,
I have a model in which the return or yield spread of a bond issued by a country depends non-linearly on the country's probability of default. If I assume that this probability of default takes a logistic form, the log spread becomes linear in a set of regressors, which I take to be macroeconomic variables. To choose among specifications, I use AIC/BIC.
One thing I have noticed is that in some cases both AIC and BIC select a model that contains a variable X even though X is missing for many data points, so including X costs me a lot of observations.
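For reference, the criteria I have in mind are the standard definitions, which as far as I know are the ones that estat ic reports. Here \ln L is the maximized log likelihood, k is the number of estimated parameters, and N is the number of observations used in the regression:

```latex
\mathrm{AIC} = -2 \ln L + 2k, \qquad \mathrm{BIC} = -2 \ln L + k \ln N
```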
More specifically, I have:
MODEL 1
regress log_spread a b c X
estat ic
which gives AIC = 915
then,
MODEL 2
regress log_spread a b c
estat ic
which gives AIC = 1500
However, the regression in Model 1 uses 1200 observations, while the regression in Model 2 uses 2800, because X is missing for 1600 observations.
You would expect X to be selected because it is very relevant for explaining the spread, but in some cases X is statistically insignificant.
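One check I thought of, in case it helps frame the question: refit Model 2 only on the observations that Model 1 uses, so that both criteria are computed from the same 1200 observations. This is only a sketch. It assumes the data are already in memory, and m1, m2 and insample are names I made up for illustration.

```stata
* Sketch: compare both models on the same estimation sample.
* m1, m2 and insample are illustrative names, not part of my actual do-file.

* Model 1: fit and remember which observations were used
regress log_spread a b c X
estimates store m1
generate byte insample = e(sample)

* Model 2: refit on the Model 1 sample only
regress log_spread a b c if insample
estimates store m2

* AIC and BIC side by side, with the number of observations for each model
estimates stats m1 m2
```

If the Obs column of the last table shows the same number for both models, the two AIC values at least refer to the same data.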
Can any of you explain this?
Alternatively, could you tell me whether there are any other useful stats I could look at?
Thank you very much!
Best,
Adrian