

From: "Ploutz-Snyder, Robert (JSC-SK)[USRA]" <robert.ploutz-snyder-1@nasa.gov>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: RE: st: Understanding Factor variables - is order significant ?
Date: Thu, 27 May 2010 08:42:48 -0500

Nick brings up sorting as a reason not to pursue string variables as factor variables in Stata. If the factor variable represents an ordinal categorization, the analyst need merely modify his/her labels, just as we now choose which number represents the "first" category, etc. Following Nick's example, if I wanted "low" to be first, I could code the values as A, B, C and get the order that I desire.

Far more common, I think, are factor variables that are nominal instead of ordinal: Male vs. Female, Treatment vs. Control, Drug vs. Drug+Therapy vs. Therapy vs. Control, Race and/or Ethnicity categories... Those sorts of factor variables are commonly used and should be allowed as factor variables in Stata (as they are in other highly respected statistics languages).

I receive/import data coded as string variables all the time, and the ability to use string variables as factors would be a very welcome improvement.

Rob

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Thursday, May 27, 2010 8:20 AM
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: Understanding Factor variables - is order significant ?

I just don't think this is going to happen, regardless of the attractions of Al's proposal.

Here's one argument, which I guess is far from the least crucial:

Suppose strings are permitted. Then Stata has to know what order to put them in, if only for display purposes. Clearly, Stata's idea will be that alphabetical order will be the obvious default. But then someone says, "No, that's not what I want, as I have L, M, H meaning low, medium and high, and clearly I want them in that order." Fair enough, but then Stata needs string labels, or whatever.

Except that the whole argument can immediately be reversed. You can, long since, have numeric values with your own text labels, so that there is, from a Stata point of view, no need to visit this (very fundamental) change.

Nick
n.j.cox@durham.ac.uk

Feiveson, Alan H. (JSC-SK311)

This again raises the issue of why Stata insists that "factor" variables be numeric. With strings permitted as factor variables, Stata could internally assign whatever numbers it wanted to the levels, thus avoiding this confusion. Also a lot less bother for the user.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
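
For readers following the thread, here is a minimal sketch of the approach Nick describes: numeric values carrying your own text labels. It is not taken from any message in the thread. The variable names (severity, y, sev), the label name (sevlbl) and the data are invented for illustration, and it assumes Stata 11 or later for the i. factor-variable notation.

```stata
clear
input str6 severity y
"low"    1.2
"medium" 2.3
"high"   3.9
"low"    0.8
"high"   4.1
"medium" 2.0
end

* Define the value label first so that the order is low < medium < high
* rather than alphabetical (high, low, medium).
label define sevlbl 1 "low" 2 "medium" 3 "high"

* encode maps each string to the code already defined in sevlbl;
* noextend makes encode refuse to add codes for strings not in the label.
encode severity, generate(sev) label(sevlbl) noextend

* sev is numeric with text labels, so it is accepted as a factor variable.
regress y i.sev
```

For a purely nominal string variable, where order does not matter, encode without the label() option should be enough; it assigns codes in alphabetical order of the strings.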

**References**:

- **Re: st: Understanding Factor variables - is order significant ?** *From:* rgutierrez@stata.com (Roberto G. Gutierrez, StataCorp)
- **RE: st: Understanding Factor variables - is order significant ?** *From:* "Feiveson, Alan H. (JSC-SK311)" <alan.h.feiveson@nasa.gov>
- **RE: st: Understanding Factor variables - is order significant ?** *From:* "Nick Cox" <n.j.cox@durham.ac.uk>
