Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Feiveson, Alan H. (JSC-SK311)" <alan.h.feiveson@nasa.gov> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | RE: st: Understanding Factor variables - is order significant ? |
Date | Thu, 27 May 2010 08:07:17 -0500 |
This again raises the issue of why Stata insists that "factor" variables be numeric. With strings permitted as factor variables, Stata could internally assign whatever numbers it wanted to the levels, thus avoiding this confusion. It would also be a lot less bother for the user.

Al Feiveson

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Roberto G. Gutierrez, StataCorp
Sent: Wednesday, May 26, 2010 3:17 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Understanding Factor variables - is order significant ?

Jesper Lindhardsen <JESLIN01@geh.regionh.dk> writes:

> I am having a hard time understanding why 2 regression models that differ
> only by the "order" of the included factor variables yield different
> results??? I can't (or am too slow to) find the answer in the
> documentation, but I think it is related to the parsing of the baselevel
> specifiers (see model 1 legend = _b[0o.ra#0b.dm] ???).

> Here are the 2 commands and resulting output - as you can see I've only
> changed b1.ra#b0.dm to b0.dm#b1.ra. Output has been edited, but only left
> out if identical between models.

Jesper (and those others who have contributed on this thread) have discovered a bug in how factor-variable interactions are being parsed in Stata. The specific conditions that trigger this are as follows:

1. You specify a simple interaction (a single # sign) between two or more factor variables.

2. The first variable in the interaction has the value zero as one of its categories.

3. The first specification in the interaction has a base level that is not the default of zero (the lowest level for the first variable).

4. At least one of the remaining variables in the interaction has a base equal to the lowest-valued category for that variable, whether explicitly specified or taken as the default.

5. Almost all estimation commands are affected by this bug, with -regress- being one notable exception.

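
As a reading aid, the conditions listed above can be written down as a small checklist. The sketch below is in Python rather than Stata and is purely illustrative: the names FactorSpec and may_trigger_bug are hypothetical, and the 0/1 coding assumed for ra and dm is a guess based on the legend quoted in Jesper's message, not something stated in the thread. It encodes only the stated conditions; it does not reproduce Stata's parser.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FactorSpec:
    """One factor-variable term in an interaction (hypothetical helper)."""
    name: str
    levels: Sequence[int]          # observed categories of the variable
    base: Optional[int] = None     # None means the default base (lowest level)

    def effective_base(self) -> int:
        return min(self.levels) if self.base is None else self.base


def may_trigger_bug(specs: Sequence[FactorSpec], command: str = "logit") -> bool:
    """Return True when the conditions described in the post all hold."""
    # Condition 5: -regress- is named as an unaffected command.
    if command == "regress":
        return False
    # Condition 1: an interaction between two or more factor variables.
    if len(specs) < 2:
        return False
    first, rest = specs[0], specs[1:]
    # Condition 2: the first variable has zero as one of its categories.
    has_zero = 0 in first.levels
    # Condition 3: the first specification's base is not the lowest level.
    nondefault_base = first.effective_base() != min(first.levels)
    # Condition 4: some remaining variable uses its lowest level as base.
    later_lowest_base = any(s.effective_base() == min(s.levels) for s in rest)
    return has_zero and nondefault_base and later_lowest_base


# Assumed coding: ra and dm are both 0/1 indicators.
ra = FactorSpec("ra", (0, 1), base=1)   # b1.ra
dm = FactorSpec("dm", (0, 1), base=0)   # b0.dm

print(may_trigger_bug([ra, dm]))   # b1.ra#b0.dm
print(may_trigger_bug([dm, ra]))   # b0.dm#b1.ra
```

Under these assumptions only the first ordering, b1.ra#b0.dm, meets all of the conditions, which is consistent with the two orderings giving different results.
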
When the above conditions occur, Stata attempts to omit an extra cell in the interaction. Sometimes the cell is omitted altogether; other times Stata produces a coefficient for that cell, but with missing standard errors and confidence intervals. Either way, the model fit is thrown off because the cell's coefficient is not properly estimated.

We will fix this in the next executable update, to be made available soon.

--Bobby                 --Jeff
rgutierrez@stata.com    jpitblado@stata.com

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/