Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org**, is already up and running.


- From: "Feiveson, Alan H. (JSC-SK311)" <alan.h.feiveson@nasa.gov>
- To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
- Subject: RE: st: Understanding Factor variables - is order significant ?
- Date: Thu, 27 May 2010 08:07:17 -0500

This again raises the issue of why Stata insists that "factor" variables be numeric. If strings were permitted as factor variables, Stata could internally assign whatever numbers it wanted to the levels, which would avoid this confusion. It would also be a lot less bother for the user.

Al Feiveson

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Roberto G. Gutierrez, StataCorp
Sent: Wednesday, May 26, 2010 3:17 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Understanding Factor variables - is order significant ?

Jesper Lindhardsen <JESLIN01@geh.regionh.dk> writes:

> I am having a hard time understanding why 2 regression models that differ
> only by the "order" of the included factor variables yield different
> results??? I can't (or am too slow to) find the answer in the
> documentation, but I think it is related to the parsing of the baselevel
> specifiers (see model 1 legend = _b[0o.ra#0b.dm] ???).
> Here are the 2 commands and resulting output - as you can see I've only
> changed b1.ra#b0.dm to b0.dm#b1.ra. Output has been edited, but only left
> out if identical between models.

Jesper (and the others who have contributed to this thread) have discovered a bug in how factor-variable interactions are parsed in Stata. The specific conditions that trigger it are as follows:

1. You specify a simple interaction (a single # sign) between two or more factor variables.
2. The first variable in the interaction has the value zero as one of its categories.
3. The first variable in the interaction is given a base level other than the default of zero (the lowest level for the first variable).
4. At least one of the remaining variables in the interaction has a base equal to the lowest-valued category for that variable, whether explicitly specified or taken as the default.
5. Almost all estimation commands are affected by this bug, with -regress- being one notable exception.
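As a reading aid, the conditions above can be restated as a single yes/no check. The Python sketch below is not Stata code and not StataCorp's parser. `FactorTerm` and `triggers_bug` are hypothetical names. The example also assumes that `ra` and `dm` are 0/1 variables, which the thread does not state, and the command name `logit` is only a placeholder for "some estimation command other than -regress-".

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FactorTerm:
    """One factor variable inside a single-# interaction, e.g. b1.ra (hypothetical model)."""
    name: str
    levels: List[int]            # observed categories (Stata factor levels are nonnegative integers)
    base: Optional[int] = None   # explicit base from a "bN." prefix; None means the default

    def effective_base(self) -> int:
        # The default base is the lowest level of the variable.
        return min(self.levels) if self.base is None else self.base


def triggers_bug(terms: List[FactorTerm], command: str) -> bool:
    """Return True if `terms` (one interaction written with a single #) meets
    conditions 1-5 as listed in the StataCorp post. Hypothetical helper."""
    # Condition 1: two or more factor variables joined by a single #.
    if len(terms) < 2:
        return False
    first, rest = terms[0], terms[1:]
    # Condition 2: zero is one of the first variable's categories.
    if 0 not in first.levels:
        return False
    # Condition 3: the first variable's base is not the default
    # (with nonnegative levels and 0 present, the default is 0).
    if first.effective_base() == min(first.levels):
        return False
    # Condition 4: some remaining variable uses its lowest level as base,
    # whether set explicitly or taken as the default.
    if not any(t.effective_base() == min(t.levels) for t in rest):
        return False
    # Condition 5: almost all estimation commands were affected; -regress-
    # was named as an exception. Any other exceptions are not modeled here.
    return command != "regress"


# Jesper's two specifications, assuming ra and dm are 0/1 variables.
ra_b1 = FactorTerm("ra", levels=[0, 1], base=1)   # b1.ra
dm_b0 = FactorTerm("dm", levels=[0, 1], base=0)   # b0.dm
print(triggers_bug([ra_b1, dm_b0], command="logit"))  # b1.ra#b0.dm -> True
print(triggers_bug([dm_b0, ra_b1], command="logit"))  # b0.dm#b1.ra -> False
```

Under these assumptions, swapping the order moves `dm` into the first position. Its base is then the default, so condition 3 no longer holds. This matches Jesper's observation that only one of the two orderings misbehaved.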
When the above conditions occur, Stata attempts to omit an extra cell in the interaction. Sometimes the cell is omitted altogether. Other times Stata produces a coefficient for that cell but with missing standard errors and confidence intervals. Either way, the model fit is thrown off because the cell's coefficient is not properly estimated.

We will fix this in the next executable update, to be made available soon.

--Bobby
--Jeff

rgutierrez@stata.com
jpitblado@stata.com

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**: **RE: st: Understanding Factor variables - is order significant ?** *From:* "Nick Cox" <n.j.cox@durham.ac.uk>

**References**: **Re: st: Understanding Factor variables - is order significant ?** *From:* rgutierrez@stata.com (Roberto G. Gutierrez, StataCorp)
