Re: st: Changing base level doesnt affect co-efficient values in other parts of logistic regression


From   Maarten Buis <maartenlbuis@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Changing base level doesnt affect co-efficient values in other parts of logistic regression
Date   Mon, 12 Nov 2012 10:54:25 +0100

On Mon, Nov 12, 2012 at 9:54 AM, Tim Evans wrote:
> I'm running a logistical regression model in Stata 11.2 to assess the likelihood of presenting early with a disease and have a number of independent categorical variables such as age, deprivation, region of residence and co-morbidity. Currently my default base level for the age group is the age group which has the largest number of observation (for instance, 60-69), which I understand to be a norm for Stata (and perhaps other programs). On reflection I thought that my model would make more sense if I set the base level to the specific age group of interest (for instance 50-59 using the char omit code). I was surprised that when I ran the two models, the only coefficients to change in the two models were the age group ones. I at least thought that there might be some movement in the coefficients of the other variables between the two models - why is this not the case?

Say we have age in three categories: 40-49, 50-59, and 60-69. If we
choose 60-69 as the reference category, then the exponentiated constant
is the baseline odds of getting the disease early, that is, the odds
for people aged 60-69, and the exponentiated coefficient for the
indicator variable for 40-49 is the ratio by which the odds for people
aged 40-49 differ from the odds for people aged 60-69. Similarly, the
exponentiated coefficient for the indicator variable for 50-59
compares the odds for age category 50-59 with the odds for age
category 60-69. With the baseline odds, the odds ratio for 40-49, and
the odds ratio for 50-59 you can compute the odds of getting the
disease early for all three age categories.

If we chose 50-59 as our reference category, we would get a baseline
odds of getting the disease early for people aged 50-59 and the odds
ratios now compare 40-49 with 50-59 and 60-69 with 50-59. So those
coefficients should change, but substantively the model remains
exactly the same: with the baseline odds and the two odds ratios we
can compute the odds of getting the disease early for all three age
categories, and these odds will be exactly the same as the odds
computed using a model with 60-69 as the reference category.
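
In symbols, the argument can be sketched as follows. The names b_0,
b_40, b_50 (reference 60-69) and their primed counterparts (reference
50-59) are labels assumed here for illustration, not names Stata
reports, and the other covariates are left out because they enter both
models in the same way.

```latex
% Reference = 60-69: constant b_0, coefficients b_{40} and b_{50}
\begin{align*}
\mathrm{odds}_{60\text{-}69} &= \exp(b_0) \\
\mathrm{odds}_{40\text{-}49} &= \exp(b_0 + b_{40}) \\
\mathrm{odds}_{50\text{-}59} &= \exp(b_0 + b_{50})
\end{align*}
% Reference = 50-59: the new parameters are functions of the old ones
\begin{align*}
b_0'    &= b_0 + b_{50} \\
b_{40}' &= b_{40} - b_{50} \\
b_{60}' &= -b_{50}
\end{align*}
% The implied odds are unchanged
\begin{align*}
\exp(b_0')           &= \exp(b_0 + b_{50}) = \mathrm{odds}_{50\text{-}59} \\
\exp(b_0' + b_{40}') &= \exp(b_0 + b_{40}) = \mathrm{odds}_{40\text{-}49} \\
\exp(b_0' + b_{60}') &= \exp(b_0)          = \mathrm{odds}_{60\text{-}69}
\end{align*}
```

So the two sets of parameters are just two ways of writing down the
same three odds.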

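This is why none of your other coefficients changed. As far as they
are concerned, you adjusted for age measured in three categories in
both models, so nothing changed when you changed the reference
category. This will change when you also include interactions with
age: the main effect of a variable that is interacted with age is its
effect in the reference age category.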

Below is an example that illustrates this, using race in place of age.
Notice that the predicted odds in all three models are exactly the
same, as are the odds ratios for goodjob and c_grade. Only the
baseline odds and the odds ratios for race change when the reference
level is changed.

For more on the baseline odds see:
R. Newson (2003) "Stata tip 1: The eform() option of regress", The
Stata Journal, 3(4), p. 445.
M.L. Buis (2012) "Stata tip 107: The baseline is now reported", The
Stata Journal, 12(1), pp. 165-166.

For more on the trick of leaving out the reference category see:
M.L. Buis (2012) "Stata tip 106: With or without reference", The Stata
Journal, 12(1), pp. 162-164.

*---------------------- begin example ----------------------
// data preparation
sysuse nlsw88, clear
gen byte goodjob = occupation < 3 if occupation < .
gen int  c_grade = grade - 12

// trick to report the baseline odds in Stata < 12
gen byte baseline = 1

// reference = white
logit union i.race goodjob c_grade baseline, or nocons
// predict odds for someone with a bad job and high school
margins i.race,                        ///
    at(goodjob=0 c_grade=0 baseline=1) ///
    expression(exp(xb()))

// reference = black
logit union ib2.race goodjob c_grade baseline, or nocons
// predict odds for someone with a bad job and high school
margins i.race,                        ///
    at(goodjob=0 c_grade=0 baseline=1) ///
    expression(exp(xb()))

// no reference, so need to leave out the constant/baseline
// The "odds ratios" for race are now actually odds
logit union ibn.race goodjob c_grade, or nocons
*----------------------- end example -----------------------
(For more on examples I sent to the Statalist see:
 http://www.maartenbuis.nl/example_faq )
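
The remark that things change once interactions with age are included
can be sketched with the same data, again using race in place of age.
This is only a minimal sketch: interacting race with goodjob is an
assumption made for illustration, not something taken from the
original question.

```stata
// data preparation, as in the example above
sysuse nlsw88, clear
gen byte goodjob = occupation < 3 if occupation < .
gen int  c_grade = grade - 12

// reference = white: the odds ratio for 1.goodjob is the effect of
// goodjob among white respondents
logit union i.race##i.goodjob c_grade, or

// reference = black: the odds ratio for 1.goodjob is now the effect of
// goodjob among black respondents, so it will generally differ
logit union ib2.race##i.goodjob c_grade, or
```

The odds ratio for c_grade should still be the same in both models,
because it is not part of the interaction and the two models are still
the same model written in two different ways.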

Hope this helps,
Maarten

---------------------------------
Maarten L. Buis
WZB
Reichpietschufer 50
10785 Berlin
Germany

http://www.maartenbuis.nl
---------------------------------
