



RE: st: Understanding Factor variables - is order significant ?


From   "Feiveson, Alan H. (JSC-SK311)" <alan.h.feiveson@nasa.gov>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Understanding Factor variables - is order significant ?
Date   Thu, 27 May 2010 08:07:17 -0500

This again raises the issue of why Stata insists that "factor" variables be numeric. If strings were permitted as factor variables, Stata could internally assign whatever numbers it wanted to the levels, thus avoiding this confusion. It would also be a lot less bother for the user.
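
For reference, the usual current workaround is to map the string to a labeled numeric variable with -encode- before using it as a factor variable. A minimal sketch follows; the variable names (group, group_n, y) and the data are made up purely for illustration.

```stata
* Hypothetical toy data: a string grouping variable and a numeric outcome
clear
input str6 group y
"low"  1.2
"mid"  2.5
"high" 3.9
"low"  0.8
"mid"  2.1
"high" 4.4
end

* -encode- assigns integer codes (alphabetical order of the strings)
* and attaches the original strings as value labels
encode group, generate(group_n)

* The numeric copy can now be used with factor-variable notation
regress y i.group_n
```

Note that the user still has to check which integer ended up as which level (for example with -label list-), which is the bother referred to above.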


Al Feiveson



-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Roberto G. Gutierrez, StataCorp
Sent: Wednesday, May 26, 2010 3:17 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Understanding Factor variables - is order significant ?

Jesper Lindhardsen <JESLIN01@geh.regionh.dk> writes:

> I am having a hard time understanding why 2 regression models that differ
> only by the "order" of the included factor variables yield different
> results???  I can't (or am too slow to) find the answer in the
> documentation, but I think it is related to the parsing of the baselevel
> specifiers (see model 1 legend = _b[0o.ra#0b.dm] ???).

> Here are the 2 commands and resulting output - as you can see I've only
> changed b1.ra#b0.dm to b0.dm#b1.ra. Output has been edited, but only left
> out if identical between models.

Jesper (and those others who have contributed on this thread) have discovered
a bug in how factor-variable interactions are being parsed in Stata.  The
specific conditions that trigger this are as follows:

   1. You specify a simple interaction (a single # sign) between two or more
      factor variables.

   2. The first variable in the interaction has the value zero as one of
      its categories.

   3. The first specification in the interaction has a base level that is 
      not the default of zero (the lowest level for the first variable).

   4. At least one of the remaining variables in the interaction has a base
      equal to the lowest-valued category for that variable, whether explicitly
      specified or taken as the default.

   5. Almost all estimation commands are affected by this bug, with -regress-
      being one notable exception.

When the above conditions occur, Stata attempts to omit an extra cell in
the interaction.  Sometimes the cell is omitted altogether; other times
Stata produces a coefficient for that cell, but with missing standard errors
and confidence intervals.  Either way, the model fit is thrown off because the
cell's coefficient is not properly estimated.
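
One way to check whether a given installation is affected is to fit the same
model with the interaction written in both orders and compare the fits.  The
following is only a sketch under stated assumptions: the data are simulated,
the outcome y is hypothetical, and -logit- is chosen merely as an example of
an estimation command other than -regress-; the variable names ra and dm and
the base specifications are taken from Jesper's description.

```stata
* Simulated 0/1 data (assumption: not the original poster's data)
clear
set seed 12345
set obs 500
generate byte ra = runiform() < 0.4
generate byte dm = runiform() < 0.3
generate byte y  = runiform() < invlogit(-1 + 0.5*ra + 0.7*dm + 0.3*ra*dm)

* Order 1: the first variable (ra) has a zero category and a nondefault
* base (1); the second (dm) has its lowest level as base.  This ordering
* meets conditions 1-4.
logit y b1.ra#b0.dm
estimates store ra_first

* Order 2: same cells, but the first variable (dm) keeps its default base,
* so condition 3 is not met.
logit y b0.dm#b1.ra
estimates store dm_first

* Compare coefficients and log likelihoods side by side
estimates table ra_first dm_first, stats(ll N)
```

The two specifications describe the same set of cells with the same base
cell, so their log likelihoods should agree; a difference, or a coefficient
reported without a standard error, would be consistent with the behavior
described above.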

We will fix this in the next executable update, to be made available soon.

--Bobby						--Jeff
rgutierrez@stata.com				jpitblado@stata.com
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


