The Stata listserver

Re: st: Statistical/Stata question


From: Richard Williams <Richard.A.Williams.5@nd.edu>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Statistical/Stata question
Date: Tue, 17 Feb 2004 22:03:59 -0500

It isn't silly at all.

In effect, you are forcing the first two categories (1 and 2) to have exactly the same odds (and therefore the same probability) of the event. In other words, you are constraining the odds ratio for category2 vs. category1 to be exactly equal to 1. As a consequence, the comparison of category3 to category1 has the same odds ratio as the comparison of category3 to category2 (and similarly for category4). For the purposes of the model, there are only 3 categories: a combined 1/2 category, category3 and category4.
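
To make the constraint concrete, here is a minimal Python sketch using made-up counts (the numbers and the names counts, odds and pooled_odds are purely illustrative, not from this thread). It relies on the fact that when an intercept and the category dummies are the only terms in a logit model, the fitted odds in each group equal the observed odds in that group, so the odds ratios can be read straight off the counts.

```python
# Hypothetical (events, non-events) counts per category; not real data.
counts = {
    1: (20, 80),
    2: (30, 70),
    3: (45, 55),
    4: (60, 40),
}

def odds(events, nonevents):
    """Observed odds of the event in one group."""
    return events / nonevents

def pooled_odds(categories):
    """Observed odds after lumping several categories into one group."""
    events = sum(counts[c][0] for c in categories)
    nonevents = sum(counts[c][1] for c in categories)
    return odds(events, nonevents)

# Unconstrained: category3 has a different odds ratio against each of 1 and 2.
or_3_vs_1 = odds(*counts[3]) / odds(*counts[1])
or_3_vs_2 = odds(*counts[3]) / odds(*counts[2])

# Constrained: categories 1 and 2 share a single reference odds, so
# category3 (and category4) have one odds ratio against the combined group.
reference = pooled_odds([1, 2])
or_3_vs_ref = odds(*counts[3]) / reference
or_4_vs_ref = odds(*counts[4]) / reference

print(f"3 vs 1: {or_3_vs_1:.3f}   3 vs 2: {or_3_vs_2:.3f}")
print(f"3 vs combined 1/2: {or_3_vs_ref:.3f}   4 vs combined 1/2: {or_4_vs_ref:.3f}")
```

With these counts the odds ratio against the combined group falls between the two separate odds ratios, which is what you would expect when the reference odds is a blend of the odds in categories 1 and 2.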

Agreed. To offer some substantive examples: When you have a categorical variable, you may not necessarily want to create and use all possible dummies. The Ns for some categories may be very small, or, for the purposes of your analysis, the differences between groups may be small or nonexistent.

Suppose, for example, that Race has been coded white, black and other in your data. You might wonder whether (a) you need separate dummies for blacks and others, because both groups differ from whites and they also differ from each other, or (b) you just need the dummy for white, in which case you'd be saying that the important contrast is white versus nonwhite and the differences among nonwhites are not important.
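
The two codings can be sketched in a few lines of Python. The race values below are invented for illustration only. The point of the sketch is that coding (b) is just coding (a) with the black and other effects constrained to be equal: if both dummies carry the same coefficient, the model depends on race only through black + other, which is 1 - white.

```python
# Made-up race codes for five hypothetical respondents.
race = ["white", "black", "other", "white", "black"]

# Coding (a): separate dummies for black and other, white as the reference.
black = [int(r == "black") for r in race]
other = [int(r == "other") for r in race]

# Coding (b): a single dummy for white, i.e. white versus nonwhite.
white = [int(r == "white") for r in race]

# Every respondent falls in exactly one group, so nonwhite = black + other.
nonwhite = [b + o for b, o in zip(black, other)]

for r, b, o, w in zip(race, black, other, white):
    print(f"{r:6s} black={b} other={o} white={w}")
```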

Combining categories may cost you some information, and it may obscure important differences by in effect assuming that, say, all minorities are the same. But combining categories can also make the analysis much more manageable and make the most critical points clearer. You can, of course, do formal tests to see whether it is legit to combine categories, e.g. if the IVs were black and other and a test reveals that their effects do not significantly differ from each other, you could just use a dummy var for white/nonwhite instead. Or, going back to the original example, if the IVs were category2, category3, and category4, and the coefficient for category2 was not significant, that would suggest that you can simplify things by just pooling category1 and category2 together (especially if it makes logical sense that the groups would differ little if any from each other).
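
One way to run such a formal test, sketched in Python with the same kind of made-up counts, is a likelihood-ratio test that compares the model with four separate groups against the model that pools category1 and category2. Note that this is a swapped-in technique: the test described above is a significance test on the category2 coefficient, and the likelihood-ratio version here is only a closely related alternative. The counts and the function name binom_loglik are hypothetical.

```python
import math

# Hypothetical (events, non-events) counts per category; not real data.
counts = {1: (20, 80), 2: (30, 70), 3: (45, 55), 4: (60, 40)}

def binom_loglik(groups):
    """Maximized log-likelihood when each group gets its own event probability."""
    loglik = 0.0
    for events, nonevents in groups:
        p = events / (events + nonevents)
        if events:
            loglik += events * math.log(p)
        if nonevents:
            loglik += nonevents * math.log(1 - p)
    return loglik

# Unconstrained model: four groups, each with its own probability.
ll_full = binom_loglik(counts.values())

# Constrained model: category1 and category2 pooled into one group.
pooled_12 = (counts[1][0] + counts[2][0], counts[1][1] + counts[2][1])
ll_pooled = binom_loglik([pooled_12, counts[3], counts[4]])

# Likelihood-ratio statistic; pooling imposes one constraint, so 1 df.
lr = max(2 * (ll_full - ll_pooled), 0.0)

# Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(lr / 2))

print(f"LR chi2(1) = {lr:.3f}, p = {p_value:.3f}")
```

A large p-value would be consistent with pooling the two categories; a small one would suggest that they differ and should be kept separate. Either way, the substantive argument for treating the groups as alike still matters.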


-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
FAX: (574)288-4373
HOME: (574)289-5227
EMAIL: Richard.A.Williams.5@ND.Edu
WWW (personal): http://www.nd.edu/~rwilliam
WWW (department): http://www.nd.edu/~soc
