

From: Maarten Buis <maartenlbuis@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Changing base level doesnt affect co-efficient values in other parts of logistic regression
Date: Mon, 12 Nov 2012 10:54:25 +0100

On Mon, Nov 12, 2012 at 9:54 AM, Tim Evans wrote:
> I'm running a logistical regression model in Stata 11.2 to assess the likelihood of presenting early with a disease and have a number of independent categorical variables such as age, deprivation, region of residence and co-morbidity. Currently my default base level for the age group is the age group which has the largest number of observation (for instance, 60-69), which I understand to be a norm for Stata (and perhaps other programs). On reflection I thought that my model would make more sense if I set the base level to the specific age group of interest (for instance 50-59 using the char omit code). I was surprised that when I ran the two models, the only coefficients to change in the two models were the age group ones. I at least thought that there might be some movement in the coefficients of the other variables between the two models - why is this not the case?

Say we have age in three categories: 40-49, 50-59 and 60-69. If we choose 60-69 as the reference category, then the exponentiated constant is the baseline odds of getting the disease early (the odds for people aged 60-69 who are also in the reference categories of all other variables), and the exponentiated coefficient for the indicator variable for 40-49 is the ratio by which the odds for people aged 40-49 differ from the odds for people aged 60-69. Similarly, the exponentiated coefficient for the indicator variable for 50-59 compares the odds for age category 50-59 with the odds for age category 60-69. With the baseline odds, the odds ratio for 40-49 and the odds ratio for 50-59 you can compute the odds of getting the disease early for all three age categories. If we chose 50-59 as our reference category, we would get the baseline odds of getting the disease early for people aged 50-59, and the odds ratios would then compare 40-49 with 50-59 and 60-69 with 50-59.
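The reparameterization can be written out explicitly for the three age categories. This is a sketch in notation introduced here for illustration, not Stata output: $x$ stands for the other covariates (deprivation, region, co-morbidity), $\gamma$ and $\delta$ for their coefficients, $[\cdot]$ for an indicator variable, and $\beta_j$, $\alpha_j$ for the age coefficients under each choice of reference category.

$$
\begin{aligned}
\text{reference } 60\text{-}69:\quad \ln(\text{odds}) &= \beta_0 + \beta_1\,[40\text{-}49] + \beta_2\,[50\text{-}59] + \gamma' x \\
\text{reference } 50\text{-}59:\quad \ln(\text{odds}) &= \alpha_0 + \alpha_1\,[40\text{-}49] + \alpha_3\,[60\text{-}69] + \delta' x
\end{aligned}
$$

Setting $x = 0$, the odds for each age category under the two parameterizations are

$$
\begin{aligned}
\text{odds}_{60\text{-}69} &= e^{\beta_0} = e^{\alpha_0 + \alpha_3} \\
\text{odds}_{50\text{-}59} &= e^{\beta_0 + \beta_2} = e^{\alpha_0} \\
\text{odds}_{40\text{-}49} &= e^{\beta_0 + \beta_1} = e^{\alpha_0 + \alpha_1}
\end{aligned}
\qquad\Longrightarrow\qquad
\alpha_0 = \beta_0 + \beta_2,\quad \alpha_1 = \beta_1 - \beta_2,\quad \alpha_3 = -\beta_2,\quad \delta = \gamma .
$$

Because both parameterizations give every observation the same linear predictor, they have the same likelihood, so the maximum likelihood estimates satisfy these identities exactly: only the constant and the age coefficients are relabelled, and the coefficients of the other covariates stay the same.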
So those coefficients should change, but substantively the model remains exactly the same: with the baseline odds and the two odds ratios we can compute the odds of getting the disease early for all three age categories, and these odds will be exactly the same as the odds computed using a model with 60-69 as the reference category. This is why none of your other coefficients changed. As far as they are concerned, you adjusted for age measured in three categories either way, so nothing changed when you changed the reference category. This will change when you also include interactions with age: the main effect of a variable that interacts with age then refers to its effect within the reference age category, so it changes when the reference category changes.

Below is an example that illustrates this. Notice that the predicted odds in all three models are exactly the same, as are the odds ratios for goodjob and c_grade. Only the baseline odds and the odds ratios for race change when the reference level is changed.

For more on the baseline odds see:

R. Newson (2003) "Stata tip 1: The eform() option of regress", The Stata Journal, 3(4), p. 445.

M.L. Buis (2012) "Stata tip 107: The baseline is now reported", The Stata Journal, 12(1), pp. 165-166.

For more on the trick of leaving out the reference category see:

M.L. Buis (2012) "Stata tip 106: With or without reference", The Stata Journal, 12(1), pp. 162-164.

*---------------------- begin example ----------------------
// data preparation
sysuse nlsw88, clear
// goodjob = 1 for the first two occupation categories
gen byte goodjob = occupation < 3 if occupation < .
// years of schooling centred at 12, so c_grade = 0 means high school
gen int c_grade = grade - 12

// trick to report the baseline odds in Stata < 12
gen byte baseline = 1

// reference = white
logit union i.race goodjob c_grade baseline, or nocons
// predict odds for someone with a bad job and high school
margins i.race, ///
    at(goodjob=0 c_grade=0 baseline=1) ///
    expression(exp(xb()))

// reference = black
logit union ib2.race goodjob c_grade baseline, or nocons
// predict odds for someone with a bad job and high school
margins i.race, ///
    at(goodjob=0 c_grade=0 baseline=1) ///
    expression(exp(xb()))

// no reference, so need to leave out the constant/baseline
// The "odds ratios" for race are now actually odds
logit union ibn.race goodjob c_grade, or nocons
*---------------------- end example ------------------------

(For more on examples I sent to the Statalist see:
http://www.maartenbuis.nl/example_faq )

Hope this helps,
Maarten

---------------------------------
Maarten L. Buis
WZB
Reichpietschufer 50
10785 Berlin
Germany

http://www.maartenbuis.nl
---------------------------------

**References**: st: Changing base level doesnt affect co-efficient values in other parts of logistic regression, from Tim Evans <Tim.Evans@wmciu.nhs.uk>
