
RE: st: Understanding Factor variables - is order significant ?

From	"Feiveson, Alan H. (JSC-SK311)" <[email protected]>
To	"[email protected]" <[email protected]>
Subject	RE: st: Understanding Factor variables - is order significant ?
Date	Thu, 27 May 2010 08:07:17 -0500

This again raises the issue of why Stata insists that "factor" variables be numeric. If strings were permitted as factor variables, Stata could internally assign whatever numbers it wanted to the levels, thus avoiding this confusion. It would also be a lot less bother for the user.

Al Feiveson

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Roberto G. Gutierrez, StataCorp
Sent: Wednesday, May 26, 2010 3:17 PM
To: [email protected]
Subject: Re: st: Understanding Factor variables - is order significant ?

Jesper Lindhardsen <[email protected]> writes:

> I am having a hard time understanding why 2 regression models that differ
> only by the "order" of the included factor variables yield different
> results???  I can't (or am too slow to) find the answer in the
> documentation, but I think it is related to the parsing of the baselevel
> specifiers (see model 1 legend = _b[0o.ra#0b.dm] ???).

> Here are the 2 commands and resulting output - as you can see I've only
> changed b1.ra#b0.dm to b0.dm#b1.ra. Output has been edited, but only left
> out if identical between models.

Jesper (and those others who have contributed on this thread) have discovered
a bug in how factor-variable interactions are being parsed in Stata.  The
specific conditions that trigger this are as follows:

   1. You specify a simple interaction (a single # sign) between two or more
      factor variables.

   2. The first variable in the interaction has the value zero as one of
      its categories.

   3. The first specification in the interaction has a base level that is 
      not the default of zero (the lowest level for the first variable).

   4. At least one of the remaining variables in the interaction has a base
      equal to the lowest-valued category for that variable, whether explicitly
      specified or taken as the default.

   5. Almost all estimation commands are affected by this bug, with -regress-
      being one notable exception.

When the above conditions occur, Stata attempts to omit an extra cell in
the interaction.  Sometimes the cell is omitted altogether; other times
Stata produces a coefficient for that cell, but with missing standard errors
and confidence intervals.  Either way, the model fit is thrown off because the
cell's coefficient is not properly estimated.
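
One way to check whether a given model is affected is to fit it with the
interaction written in both orders and compare the results side by side.  The
following is only a sketch under stated assumptions: -ra- and -dm- are the 0/1
variables from Jesper's post, while the binary outcome -y- and the choice of
-logit- are hypothetical stand-ins for whatever model is actually being fit.

```stata
* Assumes a dataset in memory with 0/1 variables ra and dm (from the
* original post) and a hypothetical binary outcome y.

* Order that matches the trigger conditions: the first variable (ra) has
* a category equal to zero and a non-default base level (1), and the
* second variable (dm) uses its lowest category (0) as the base.
logit y ib1.ra#ib0.dm
estimates store ra_first

* Same interaction with the variables swapped.
logit y ib0.dm#ib1.ra
estimates store dm_first

* Compare coefficients, standard errors, and log likelihoods.
estimates table ra_first dm_first, b se stats(ll)
```

If the two columns disagree in their log likelihoods, or one of them shows a
cell that is dropped or has missing standard errors, the model is likely
hitting the conditions listed above.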

We will fix this in the next executable update, to be made available soon.

--Bobby						--Jeff
[email protected]				[email protected]
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

Follow-Ups:
- RE: st: Understanding Factor variables - is order significant ?
  - From: "Nick Cox" <[email protected]>

References:
- Re: st: Understanding Factor variables - is order significant ?
  - From: [email protected] (Roberto G. Gutierrez, StataCorp)
