Nick brings up sorting as a reason not to pursue string variables as factor variables in Stata. If the factor variable represents an ordinal categorization, the analyst need merely modify the labels, just as we now choose which number represents the "first" category, and so on. Following Nick's example, if I wanted "low" to be first, I could code the values as A, B, C and get the order that I desire.

Far more common, I think, are factor variables that are nominal rather than ordinal: Male vs. Female, Trmt vs. Control, Drug vs. Drug+Therapy vs. Therapy vs. Control, Race and/or Ethnicity categories... Those sorts of factor variables are commonly used and should be allowed as factor vars in Stata (as they are in other highly respected Stats languages). I receive/import data coded as string variables all the time, and the ability to use string vars as factors would be a much-welcomed improvement.

Rob

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Thursday, May 27, 2010 8:20 AM
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: Understanding Factor variables - is order significant ?

I just don't think this is going to happen, regardless of the attractions of Al's proposal. Here's one argument, which I guess is far from the least crucial:

Suppose strings are permitted. Then Stata has to know what order to put them in, if only for display purposes. Clearly, Stata's idea will be that alphabetical order will be the obvious default. But then someone says, "No, that's not what I want, as I have L, M, H meaning low, medium and high, and clearly I want them in that order." Fair enough, but then Stata needs string labels, or whatever. Except that the whole argument can immediately be reversed.
You can, long since, have numeric values with your own text labels, so that there is, from a Stata point of view, no need to visit this (very fundamental) change.

Nick
n.j.cox@durham.ac.uk

Feiveson, Alan H. (JSC-SK311)

This again raises the issue of why Stata insists that "factor" variables be numeric. With strings permitted as factor variables, Stata could internally assign whatever numbers it wanted to the levels, thus avoiding this confusion. Also a lot less bother for the user.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

