The Stata listserver

Re: st: handling of discrete variables

From   Richard Williams <[email protected]>
To   [email protected]
Subject   Re: st: handling of discrete variables
Date   Mon, 20 Oct 2003 09:12:52 -0500

At 03:41 PM 10/20/2003 +0200, Thomas M�hlmann wrote:
Suppose we have two discrete variables (y and x), each with three
categories (coded 0, 1 and 2). This results in a 3x3 contingency table
with m=9 different cells. Now, it seems to me that I have two
possibilities for incorporating x and y in a regression-like analysis:

1) Use four dummy variables, two for x and two for y (additionally, we can
use interaction terms)
2) Use one dummy for each of the m-1=8 cells of the contingency table.

My inclination would be to go with option 1. The problem with option 2 is that it potentially confounds the effects of the variables.

Suppose, for example, that the variables are race and religion, and that religion has significant effects but race does not. Approach 1 can pick that up, but in approach 2 the effects of race and religion get muddled together.

Or suppose that the main effects of race and religion are significant but the interaction effects are not. With approach 1, you can run tests that will show you the interaction effects should not be in the model, but with approach 2, interaction and main effects again get muddled together.

Even if all effects are significant, approach 1 is more informative in that it separates out the main effects and the interaction effects of the variables.

RW
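
To make the "muddled together" point concrete, here is a minimal sketch in plain Python. It is not from the original post. It assumes a hypothetical outcome whose cell means are known exactly, with made-up numbers in which only y matters, and it uses the fact that in a fully interacted 3x3 model the coefficients are simple differences of cell means relative to the reference cell (0, 0). The names `CELL_MEANS` and `decompose` are illustrative only.

```python
# Hypothetical cell means of some outcome in a 3x3 design
# (rows: x = 0, 1, 2; columns: y = 0, 1, 2).
# Made-up numbers: only y shifts the mean; x has no effect
# and there is no interaction.
CELL_MEANS = [
    [10.0, 12.0, 15.0],
    [10.0, 12.0, 15.0],
    [10.0, 12.0, 15.0],
]


def decompose(mu):
    """Express cell means in both codings, with cell (0, 0) as reference."""
    base = mu[0][0]
    n_x, n_y = len(mu), len(mu[0])
    # Option 1: main-effect dummies for x and y, plus interaction terms.
    x_main = [mu[i][0] - base for i in range(n_x)]
    y_main = [mu[0][j] - base for j in range(n_y)]
    interaction = [
        [mu[i][j] - mu[i][0] - mu[0][j] + base for j in range(n_y)]
        for i in range(n_x)
    ]
    # Option 2: one coefficient per cell of the contingency table.
    cell_effects = [[mu[i][j] - base for j in range(n_y)] for i in range(n_x)]
    return x_main, y_main, interaction, cell_effects


x_main, y_main, interaction, cell_effects = decompose(CELL_MEANS)
print("option 1, x main effects:", x_main)
print("option 1, y main effects:", y_main)
print("option 1, interactions:  ", interaction)
print("option 2, cell effects:  ", cell_effects)
```

With these numbers, option 1 reports zeros for every x term and every interaction term, so it is immediately visible that only y matters. Option 2 reports six nonzero cell coefficients, and the same conclusion has to be worked out by comparing cells. Each cell coefficient equals the sum of the matching x, y and interaction terms, so the two codings describe the same fit. In a real analysis the coefficients would be estimates with standard errors, and the comparison would be made with joint tests rather than by inspection.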

Richard Williams, Associate Professor
OFFICE: (574)631-6668, (574)631-6463
FAX: (574)288-4373
HOME: (574)289-5227
EMAIL: [email protected]
