RE: st: Understanding Factor variables - is order significant ?

From   "Ploutz-Snyder, Robert (JSC-SK)[USRA]" <>
To   "" <>
Subject   RE: st: Understanding Factor variables - is order significant ?
Date   Thu, 27 May 2010 08:42:48 -0500

Nick brings up sorting as a reason not to pursue string variables as factor variables in Stata.

If the factor variable represents an ordinal categorization, the analyst need merely modify his/her labels, just as we now choose which number represents the "first" category, etc. Following Nick's example, if I wanted "low" to be first, I could code the values as A, B, C and get the order that I desire.

Far more common, I think, are factor variables that are nominal instead of ordinal.  Male vs. Female, Trmt vs. Control, Drug vs. Drug+Therapy vs. Therapy vs. Control, Race and/or Ethnicity categories...  Those sorts of factor variables are commonly used and should be allowed as factor vars in Stata (as they are in other highly respected Stats languages).

I receive/import data coded as string variables all the time, and the ability to use string vars as factors would be a very welcome improvement.
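For readers who receive string-coded data today, the usual workaround is to convert the strings to a labeled numeric variable with -encode- and then use factor-variable notation. The following is only a minimal sketch: the toy data and the variable names (group, group_n, y) are hypothetical and do not come from this thread.

```stata
* Hypothetical toy data: a string treatment variable, as often arrives on import
clear
input str12 group y
"Control"      10
"Drug"         12
"Therapy"      11
"Drug+Therapy" 15
"Control"       9
"Drug"         13
end

* encode assigns 1, 2, ... to the distinct strings in alphabetical order
* and attaches the original strings as value labels
encode group, generate(group_n)

* The labeled numeric variable can then be used as a factor variable
regress y i.group_n
```

Because -encode- orders the levels alphabetically by default, this suits nominal categories such as these, where the order carries no meaning.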


-----Original Message-----
From: [] On Behalf Of Nick Cox
Sent: Thursday, May 27, 2010 8:20 AM
Subject: RE: st: Understanding Factor variables - is order significant ?

I just don't think this is going to happen, regardless of the
attractions of Al's proposal. 

Here's one argument, which I guess is far from the least crucial: 

Suppose strings are permitted. Then Stata has to know what order to put
them in, if only for display purposes. Clearly, Stata's idea will be
that alphabetical order will be the obvious default. But then someone
says, "No, that's not what I want, as I have L, M, H meaning low, medium
and high, and clearly I want them in that order." Fair enough, but then
Stata needs string labels, or whatever. Except that the whole argument
can immediately be reversed. You can, long since, have numeric values
with your own text labels, so that there is, from a Stata point of view,
no need to visit this (very fundamental) change.  
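The point about numeric values with your own text labels can be sketched as follows. This is an illustration under stated assumptions, not anything posted in the thread: the variable names (level, level_n) and the label name (levelorder) are hypothetical, and the L, M, H codes follow the example above.

```stata
* Hypothetical toy data using the L, M, H codes from the example
clear
input str1 level
"L"
"M"
"H"
"M"
end

* Define the value label first so the order is low, medium, high
* rather than the alphabetical H, L, M
label define levelorder 1 "L" 2 "M" 3 "H"

* When the named value label already exists, encode uses its mapping
encode level, generate(level_n) label(levelorder)

* Tables and factor-variable output now follow the chosen order
tabulate level_n
```

The numeric codes carry the order and the value labels carry the text, which is the existing mechanism being referred to.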


Feiveson, Alan H. (JSC-SK311)

This again raises the issue of why Stata insists that "factor"
variables be numeric. With strings permitted as factor variables, Stata
could internally assign whatever numbers it wanted to the levels, thus
avoiding this confusion. Also a lot less bother for the user.

*   For searches and help try:

