Re: st: find categorical variables

From   David Hoaglin <[email protected]>
To   [email protected]
Subject   Re: st: find categorical variables
Date   Thu, 22 Mar 2012 06:22:38 -0400


In this situation (and in the binary vs. continuous discussion), the
decision should be based, first, on a clear understanding of the
definition of the variable.  That stage does not involve looking at
the data.  It involves understanding the "measurement process."

If a "continuous" variable takes too few values in a particular set of
data, it might be appropriate to treat it as an (ordered) categorical
variable.  In a regression-like model, that choice may depend on
whether the variable is the response or a predictor.
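
Once the definition of each variable is understood, a quick count of
distinct values can show which "continuous" variables take only a few
values in a particular data set.  The following is only a rough sketch
in Python; the variable names (dose_mg, weight_kg) and the data are
made up for illustration, and what counts as "too few" remains a
judgment call that the code does not make.

```python
def distinct_value_counts(rows):
    """Map each variable name to its number of distinct non-missing values."""
    seen = {}
    for row in rows:
        for name, value in row.items():
            if value is not None:  # None stands in for a missing value
                seen.setdefault(name, set()).add(value)
    return {name: len(values) for name, values in seen.items()}


# Hypothetical data: dose is continuous by definition but takes 3 values here.
rows = [
    {"dose_mg": 5.0, "weight_kg": 61.2},
    {"dose_mg": 10.0, "weight_kg": 74.8},
    {"dose_mg": 20.0, "weight_kg": 68.1},
    {"dose_mg": 5.0, "weight_kg": 80.5},
    {"dose_mg": 10.0, "weight_kg": 59.9},
    {"dose_mg": 20.0, "weight_kg": None},
    {"dose_mg": 5.0, "weight_kg": 72.3},
]

counts = distinct_value_counts(rows)
for name, n in sorted(counts.items()):
    print(name, n)
```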

A similar consideration applies when the variable is a count.

Data that are naturally "continuous" or counts are sometimes collected
in categories.  Income is one common example.  Analysts sometimes use
the midpoint of the category, but that distorts the data by not
accounting for variation that would have been present if the data had
not been collected in categories.  Also, an open-ended top category
may require special treatment.
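
A small numerical sketch in Python of the midpoint problem, under
invented assumptions: the incomes are evenly spaced made-up values, the
brackets have an arbitrary width of 20,000, and there is no open-ended
top category.  Replacing each value by the midpoint of its bracket
removes the variation within brackets, so the coded data have a smaller
variance than the original values.

```python
from statistics import pvariance

# Hypothetical incomes: 0, 1000, ..., 99000 (not real data).
incomes = list(range(0, 100_000, 1_000))
width = 20_000  # assumed bracket width


def midpoint_code(value, width):
    """Replace a value by the midpoint of its bracket [lower, lower + width)."""
    lower = (value // width) * width
    return lower + width // 2


coded = [midpoint_code(v, width) for v in incomes]

var_raw = pvariance(incomes)    # variance of the values as measured
var_coded = pvariance(coded)    # variance after midpoint coding
lost = var_raw - var_coded      # within-bracket variation that was dropped

print("raw variance:  ", var_raw)
print("coded variance:", var_coded)
print("variance lost: ", lost)
```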

In building a regression model, when one has enough data, it may be
useful to turn a continuous variable into a detailed set of categories
and fit a separate coefficient for each category, so that the data can
guide the choice of functional form for that variable.
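
As a minimal sketch of that idea (in Python, with invented data): when
the categories are the only predictor and the model has one indicator
per category and no intercept, the least-squares coefficient for each
category is simply the mean of the response in that category.  The
sketch uses that fact instead of calling a regression routine; the cut
points and the quadratic response are assumptions chosen to make the
pattern visible.

```python
from statistics import mean


def bin_means(x, y, edges):
    """Mean of y within each half-open bin [edges[i], edges[i + 1])."""
    groups = [[] for _ in range(len(edges) - 1)]
    for xi, yi in zip(x, y):
        for i in range(len(edges) - 1):
            if edges[i] <= xi < edges[i + 1]:
                groups[i].append(yi)
                break
    return [mean(g) for g in groups]


# Hypothetical predictor and a response that is quadratic in it.
x = [i / 10 for i in range(100)]   # 0.0, 0.1, ..., 9.9
y = [xi ** 2 for xi in x]
edges = [0, 2, 4, 6, 8, 10]        # assumed cut points, 20 values per bin

coefs = bin_means(x, y, edges)     # one "coefficient" per category
steps = [b - a for a, b in zip(coefs, coefs[1:])]

print("category coefficients:", coefs)
print("successive differences:", steps)
```

Equal steps between adjacent categories would be consistent with a
linear term; here the steps grow, which points toward curvature.  With
other predictors in the model, the category coefficients would have to
come from an actual regression fit.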

If the analyst has not understood the nature of all the variables,
what are the results worth?

David Hoaglin