
st: RE: Model selection using AIC/BIC and other information criteria

From	[email protected]
To	[email protected]
Subject	st: RE: Model selection using AIC/BIC and other information criteria
Date	Wed, 24 Jun 2009 11:47:12 -0400

Stata has two versions of the AIC statistic: one reported by -glm- and another by -estat ic-. The -estat ic- version does not adjust the log-likelihood and penalty term by the number of observations in the model, whereas the version used in -glm- does.


ESTAT-IC

AIC = -2*LL + 2*k      = -2(LL-k)

GLM

AIC = (-2*LL + 2*k)/n = -2(LL - k)/n

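where LL is the model log-likelihood, n is the number of observations, and k is the number of estimated parameters (the predictors plus the constant; k = 3 in the example below, matching df in the -estat ic- output). 2k is a penalty term, adjusting for the number of parameters in the model. -2LL grows with n, because every observation contributes a term to the log-likelihood. Dividing by n adjusts the statistic to yield a per-observation contribution to the adjusted -2*LL. That is, the version used in -glm- adjusts for sample size.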
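As a quick check of the two AIC formulas, here is a minimal Python sketch (not the -aic- program mentioned later) that plugs in the log-likelihood, parameter count and sample size from the logistic regression later in this post (LL = -30.17249165, k = 3 for two predictors plus the constant, n = 74). The function names are hypothetical.

```python
def aic_estat_ic(ll: float, k: int) -> float:
    """AIC as displayed by -estat ic-: -2*LL + 2*k, with no sample-size scaling."""
    return -2.0 * ll + 2.0 * k


def aic_glm(ll: float, k: int, n: int) -> float:
    """AIC as displayed by -glm-: the -estat ic- value divided by n."""
    return aic_estat_ic(ll, k) / n


# Values taken from the -glm- output of the auto-data logistic regression.
ll, k, n = -30.17249165, 3, 74
print(f"estat ic AIC = {aic_estat_ic(ll, k):.6f}")
print(f"glm AIC      = {aic_glm(ll, k, n):.7f}")
```

The two numbers differ only by the factor n, which is why multiplying the -glm- AIC by n recovers the -estat ic- value.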

Note that -estat ic- uses a particular version of the BIC statistic that is based on the LL. The original version proposed by Raftery in 1986 is based on the deviance. -glm- uses the original version, hence the discrepancy in displayed values.
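The two BIC definitions can be written out the same way. In this sketch the LL-based form is -2*LL + k*ln(n) and the deviance-based form is D - df*ln(n), where D is the model deviance and df the residual degrees of freedom. These formulas are my reading of the two definitions; they reproduce the -estat ic- and -glm- BIC values in the logistic example (D = 60.3449833, df = 71, n = 74) and in its expanded version (D = 120.6899666, df = 145, n = 148). Function names are hypothetical.

```python
import math


def bic_ll(ll: float, k: int, n: int) -> float:
    """LL-based BIC, as displayed by -estat ic-: -2*LL + k*ln(n)."""
    return -2.0 * ll + k * math.log(n)


def bic_deviance(deviance: float, df_resid: int, n: int) -> float:
    """Deviance-based BIC (Raftery), as displayed by -glm-: D - df*ln(n)."""
    return deviance - df_resid * math.log(n)


# Original auto data (n = 74) and the same data after -expand 2- (n = 148).
print(f"{bic_ll(-30.17249165, 3, 74):.5f}  {bic_deviance(60.3449833, 71, 74):.4f}")
print(f"{bic_ll(-60.3449833, 3, 148):.4f}  {bic_deviance(120.6899666, 145, 148):.4f}")
```

For a binary (Bernoulli) model the deviance equals -2*LL, so the two versions differ only in the sign and size of the ln(n) term: one adds k*ln(n), the other subtracts df*ln(n).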

Regardless, for several of my publications I developed two programs that calculate the AIC and BIC statistics following a Stata maximum likelihood or GLM command. Look at the difference between the two versions of AIC when applied to a simple logistic regression:


. use auto,clear
(1978 Automobile Data)

. glm foreign mpg length, nolog fam(bin)

Generalized linear models                          No. of obs      =        74
Optimization     : ML                              Residual df     =        71
                                                   Scale parameter =         1
Deviance         =   60.3449833                    (1/df) Deviance =  .8499293
Pearson          =  54.91238538                    (1/df) Pearson  =  .7734139

Variance function: V(u) = u*(1-u)                  [Bernoulli]
Link function    : g(u) = ln(u/(1-u))              [Logit]

                                                   AIC             =  .8965538
Log likelihood   = -30.17249165                    BIC             = -245.2436

------------------------------------------------------------------------------
             |                 OIM
     foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -.0988457   .0784404    -1.26   0.208    -.2525861    .0548946
      length |  -.1051447   .0295657    -3.56   0.000    -.1630923    -.047197
       _cons |   20.43339   6.700286     3.05   0.002     7.301072    33.56571
------------------------------------------------------------------------------

. estat ic

-----------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df          AIC         BIC
-------------+---------------------------------------------------------------
           . |     74           .   -30.17249      3     66.34498    73.25718
-----------------------------------------------------------------------------
               Note:  N=Obs used in calculating BIC; see [R] BIC note

. aic
AIC Statistic =   .8965538             AIC*n =  66.344983
BIC Statistic =  -245.2436

. abic
AIC Statistic   =   .8965538           AIC*n      = 66.344986
BIC Statistic   =   .9045494           BIC(Stata) = 73.257179

** -aic- calculates both versions of AIC, and the deviance-based BIC. Note that it is consistent with the displayed -glm- values.

** -abic- gives the same two versions of AIC, and the same BIC used by -estat ic-. The BIC on the left side is the one used in the LIMDEP econometric software; it also adjusts for sample size.


. expand 2
(74 observations created)

. glm foreign mpg length, nolog fam(bin)

Generalized linear models                          No. of obs      =       148
Optimization     : ML                              Residual df     =       145
                                                   Scale parameter =         1
Deviance         =  120.6899666                    (1/df) Deviance =  .8323446
Pearson          =  109.8247708                    (1/df) Pearson  =  .7574122

Variance function: V(u) = u*(1-u)                  [Bernoulli]
Link function    : g(u) = ln(u/(1-u))              [Logit]

                                                   AIC             =  .8560133
Log likelihood   =  -60.3449833                    BIC             = -603.9058

------------------------------------------------------------------------------
             |                 OIM
     foreign |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -.0988457   .0554657    -1.78   0.075    -.2075566    .0098651
      length |  -.1051447   .0209061    -5.03   0.000    -.1461198   -.0641695
       _cons |   20.43339   4.737818     4.31   0.000     11.14744    29.71934
------------------------------------------------------------------------------

. estat ic

-----------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df          AIC         BIC
-------------+---------------------------------------------------------------
           . |    148           .   -60.34498      3       126.69    135.6816
-----------------------------------------------------------------------------
               Note:  N=Obs used in calculating BIC; see [R] BIC note

. aic
AIC Statistic =   .8560133             AIC*n =  126.68997
BIC Statistic =  -603.9058

. abic
AIC Statistic   =   .8560133           AIC*n      = 126.68996
BIC Statistic   =   .8600111           BIC(Stata) = 135.68161

***

Note the enlarged AIC statistic when using -estat ic-, but not when using the AIC reported by -glm-. Also note that the LIMDEP BIC statistic stays roughly constant (.9045 vs. .8600) when the data were expanded.
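The effect of -expand 2- can be reproduced from the two log-likelihoods alone. A minimal sketch, assuming k = 3 and the log-likelihoods reported in the two -glm- runs above: doubling the data doubles -2*LL, so the -estat ic- AIC nearly doubles, while the per-observation -glm- AIC moves only slightly, because the fixed penalty 2k is spread over twice as many observations. The helper name is hypothetical.

```python
def aic_pair(ll: float, k: int, n: int) -> tuple[float, float]:
    """Return (-estat ic- AIC, -glm- AIC) for a given log-likelihood."""
    raw = -2.0 * ll + 2.0 * k  # -estat ic- version, not scaled by n
    return raw, raw / n        # -glm- version divides by n


runs = [("original", -30.17249165, 74), ("expand 2", -60.3449833, 148)]
for label, ll, n in runs:
    raw, per_obs = aic_pair(ll, 3, n)
    print(f"{label:9s} n={n:3d}  estat ic AIC={raw:10.4f}  glm AIC={per_obs:.7f}")
```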

By adjusting for the number of observations in the model, the AIC can better be used as a comparative fit statistic, regardless of whether the models differ in sample size. This was the intent of the statistic in the first place.

Also be aware that there have been other versions of the AIC. Some are the finite-sample AIC, the Schwarz AIC, and the LIMDEP AIC. Each of these has an explicit adjustment for sample size, unlike the version used in -estat ic-.
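For comparison, the following sketch computes the widely used small-sample corrected AIC, AICc = -2*LL + 2*k + 2k(k+1)/(n - k - 1). I am assuming this is what "finite-sample AIC" refers to here; the Schwarz and LIMDEP variants are not shown. The extra term shrinks toward zero as n grows, which is one form of explicit sample-size adjustment. The function name is hypothetical.

```python
def aicc(ll: float, k: int, n: int) -> float:
    """Corrected AIC: the -estat ic- AIC plus the finite-sample term 2k(k+1)/(n-k-1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc requires n > k + 1")
    aic = -2.0 * ll + 2.0 * k
    return aic + 2.0 * k * (k + 1) / (n - k - 1)


# Logistic example from this post: the correction is larger for n = 74 than n = 148.
print(f"{aicc(-30.17249165, 3, 74):.4f}")
print(f"{aicc(-60.3449833, 3, 148):.4f}")
```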

I discuss this topic in some detail in my new book, "Logistic Regression Models", and provide a table of Degrees of Model Preference based on the difference in AIC values between two models. The criteria for strength of preference are based on simulation studies. The table is similar to the one developed by Raftery for his original version of BIC.

It must be understood that the penalty and observation corrections are not completely successful in eliminating bias resulting from additional predictors and differences in the number of observations. But having an adjustment for sample size appears to me to be preferable to not having one. Others developing alternatives to the traditional AIC statistics (-estat ic- and -glm-) seem to agree. The primary caveat to be aware of when using AIC (glm) relates to its use with correlated data. But that's another discussion.

Joseph Hilbe



=========================================

Date: Tue, 23 Jun 2009 22:20:36 -0500
From: Richard Williams <[email protected]>

Subject: RE: st: Model selection using AIC/BIC and other information criteria


At 08:39 PM 6/23/2009, kokootchke wrote:

> Thank you, Richard. This was exactly what I thought... but I
> remember from my metrics classes a long time ago that both AIC and BIC
> depend on N (sample size)... and I confirmed this by simply looking
> at these Wikipedia entries... but, just like you, I also feared
> that, even though both criteria adjust for the sample size, maybe
> you can't compare AICs and BICs when the models use
> different numbers of observations...

Here is a simple example that shows the sensitivity of BIC and AIC to sample size:


. sysuse auto, clear
(1978 Automobile Data)

. quietly reg  price mpg trunk weight

. estat ic

-----------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df          AIC         BIC
-------------+---------------------------------------------------------------
           . |     74   -695.7129   -682.6073      4     1373.215    1382.431
-----------------------------------------------------------------------------
               Note:  N=Obs used in calculating BIC; see [R] BIC note

. expand 2
(74 observations created)

. quietly reg  price mpg trunk weight

. estat ic

-----------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df          AIC         BIC
-------------+---------------------------------------------------------------
           . |    148   -1391.426   -1365.215      4     2738.429    2750.418
-----------------------------------------------------------------------------
               Note:  N=Obs used in calculating BIC; see [R] BIC note

So, even if data are missing at random with your X variable, the
smaller sample sizes that result from its inclusion will drive down
the BIC and AIC stats quite a bit.
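The regression numbers follow directly from the -estat ic- formulas AIC = -2*LL + 2*k and BIC = -2*LL + k*ln(n), with k = 4 (three predictors plus the constant, the df column above). A minimal sketch using the rounded log-likelihoods printed above, so the last displayed digit may differ from Stata's; the AIC/n column simply divides by n, as the -glm- AIC does. The helper name is hypothetical.

```python
import math


def estat_ic(ll: float, k: int, n: int) -> tuple[float, float]:
    """Return (AIC, BIC) as defined by -estat ic-."""
    aic = -2.0 * ll + 2.0 * k
    bic = -2.0 * ll + k * math.log(n)
    return aic, bic


# -regress price mpg trunk weight- before and after -expand 2-.
for ll, n in [(-682.6073, 74), (-1365.215, 148)]:
    aic, bic = estat_ic(ll, 4, n)
    print(f"n={n:3d}  AIC={aic:9.3f}  BIC={bic:9.3f}  AIC/n={aic / n:.4f}")
```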


- -------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME:   (574)289-5227
EMAIL:  [email protected]
WWW:    http://www.nd.edu/~rwilliam

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
