
# Re: st: Interpretation of categorical independent variable

From: Maarten buis
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Interpretation of categorical independent variable
Date: Fri, 10 Sep 2010 08:23:52 +0000 (GMT)

--- On Fri, 10/9/10, Meng Zhao wrote:
> I use a three-category ID to predict a binomial dependent
> variable. I included dummy variables for the first two
> categories in the model. The result is:
>
>              Odds Ratio    p
> Category 1:  116.45        0.000
> Category 2:  17.76         0.000
>
> Is the following interpretation correct?
>
> 1.compared to category 3, being category 1 increases the
> Odds Ratio by 116.45 for DV to happen (whatever it
> represents).
>
> 2.compared to category 3, being category 2 increases the
> Odds Ratio by 17.76.
>
> 3.So category 1 has a stronger effect on DV than category
> 2, and category 2 is stronger than category 3

Not quite: an odds ratio is a ratio of odds, while the way you
formulated the results suggests that it is a difference. The
odds are the expected number of successes for every failure,
and the odds ratio is the factor by which these odds change.
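
As a minimal sketch of that arithmetic: the output in the question
does not report the baseline odds for category 3, so the value 0.01
below is made up purely for illustration. The odds ratios act as
multipliers on the baseline odds, not as amounts added to them.

*--------------- begin example ------------------
* baseline odds for category 3: an assumed value, not from the question
local base_odds = 0.01

* odds in categories 1 and 2 = baseline odds times the odds ratio
di "odds, category 1: " `base_odds' * 116.45
di "odds, category 2: " `base_odds' * 17.76

* percentage change in the odds relative to category 3
di "pct change, category 1: " (116.45 - 1) * 100
di "pct change, category 2: " (17.76 - 1) * 100

* odds ratio of category 1 relative to category 2
di "category 1 vs 2: " 116.45 / 17.76
*--------------- end example ----------------------

The last line shows that comparing two non-baseline categories
amounts to dividing their odds ratios.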

When interpreting the odds ratios, I find it helpful to have
the baseline odds. Unfortunately, Stata suppresses this by
default, but there is a trick you can use to get it displayed,
which I learned from (Newson 2003).

Consider the example below:

*--------------- begin example ------------------
sysuse auto, clear
recode rep78 1/2=3

gen byte baseline = 1

sum price if !missing(foreign, rep78), meanonly
gen c_price = price - r(mean)

logit foreign i.rep78 c_price baseline, noconst or
*--------------- end example ----------------------
(For more on examples I sent to the Statalist see:
http://www.maartenbuis.nl/example_faq )

The exponentiated coefficient reported for baseline is the
odds of being foreign when a car belongs to category 3 of
rep78 and has an average price (I created c_price to be
0 when the price is average). So for this type of car
we expect to find 0.08 foreign cars for every domestic
car. The odds of being a foreign car change by a
factor of 12 (i.e. (12 - 1)*100% = 1100%) when the car
belongs to category 4, and by a factor of 56 (i.e. 5500%)
when the car belongs to category 5.
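
One possible way to check these factors, sketched here under the
assumption that the model from the example above is still in
memory: lincom with the or option reports the exponentiated sum of
the coefficients, which here is the baseline odds multiplied by an
odds ratio. The output column is labelled "Odds Ratio", but for
these sums the number is the odds of being foreign for an average
priced car in that category.

*--------------- begin example ------------------
* odds for category 4 = baseline odds * odds ratio of category 4
lincom baseline + 4.rep78, or

* odds for category 5 = baseline odds * odds ratio of category 5
lincom baseline + 5.rep78, or

* percentage change implied by a factor, here the rounded factor 12
di (12 - 1)*100
*--------------- end example ----------------------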

To get a better feeling for what that means I often find it
useful to look at the odds directly. To get these
you can leave the baseline category in your model and
leave the constant (in our case the "variable"
baseline) out. The coefficients of the categories are
then the odds of being a foreign car within each
category for an average priced car.

So for category 3 we already knew that that was
0.08 foreign cars for every domestic car.

For category 4 cars the odds are 1 foreign car for every
domestic car (which is fortunately 12 times larger than
the odds for category 3 cars, so we are getting exactly
the same results as in our previous model).

For category 5 cars we expect to find 4.5 foreign cars
for every domestic car (which is 56 times larger than
the odds for category 3 cars).

*-------------- begin example -----------------
logit foreign ibn.rep78 c_price, noconst or
di exp(_b[4.rep78])/exp(_b[3bn.rep78])
di exp(_b[5.rep78])/exp(_b[3bn.rep78])
*---------------- end example ------------------
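
If probabilities are easier to think about than odds, the odds
from this model can be converted with odds/(1 + odds). This is
only a minimal sketch, assuming the second model above is still in
memory; the local macro name odds3 is just an illustrative choice.
Applying invlogit() to the log odds gives the same number.

*--------------- begin example ------------------
* odds of being foreign for an average priced category 3 car
local odds3 = exp(_b[3bn.rep78])

* probability = odds/(1 + odds)
di `odds3'/(1 + `odds3')

* the same probability directly from the log odds
di invlogit(_b[3bn.rep78])
*--------------- end example ----------------------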

Hope this helps,
Maarten

Roger Newson (2003) "Stata tip 1: The eform() option of
regress". The Stata Journal, 3(4): 445.
<http://www.stata-journal.com/article.html?article=st0054>

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------
