# RE: st: Interval variables as independent variables

From: Maarten Buis
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: Interval variables as independent variables
Date: Mon, 10 Nov 2008 14:54:40 +0000 (GMT)

```
--- "Jessica A. Jakubowski" asked:
>>>> I have data that includes an income variable measured at the
>>>> interval level, e.g.:
>>>> 1 = less than $10,000
>>>> 2 = $10,000-$15,000
<snip>
>>>> I would like to know if Stata has a way to deal with interval
>>>> variables on the right-hand side of the regression equation, that
>>>> is, as an independent variable.

>>> I can see two alternative strategies (there may well be more):
<snip>
>>> 2) Alternatively you can scale the income variable such that it
>>> optimally predicts the outcome variable

>> Usually it is better to make indicators, e.g.
>> tab income, gen(d)
>> reg y d*

--- "Feiveson, Alan H. (JSC-SK311)" answered:
> But suppose one is trying to make a prediction model from actual
> income, not an income range? Then wouldn't some adjustment have to be
> made for the predictor variable being measured with error? If so,
> how?

Austin's answer is a special case of my point 2): you can think of that
model as simultaneously estimating a scale for these categories and an
effect of this scaled income. The scale defines the relative distances
between the categories, such that the linear effect of this scaled
income optimally predicts the outcome.

Let's say we have three categories: poor, middle, and rich. In order to
identify the scale we need to fix the origin and the unit of the scale.
Let's say we fix the origin at poor and the unit at the distance between
poor and rich. In that case our scale would measure the position of
middle relative to poor and rich, and will most likely be a number
between 0 and 1. Let's call this number "a". It is this a, together with
the effect of scaled income, that we want to estimate. If we create
dummies for poor, middle, and rich, then we can say that the scaled
income variable would be:

scaled_inc = 0*poor + a*middle + 1*rich

and the effect of that variable on some dependent variable y is:

y = b + c*scaled_inc
  = b + c*(0*poor + a*middle + 1*rich)
  = b + c*a*middle + c*rich

If we entered income just as a set of dummies (with poor as the
reference category) we would have gotten:

y = b0 + b1*middle + b2*rich

So we can directly derive both the scaling and the effect of scaled
income from this regression model with income dummies:

c = b2
a = b1/b2

Below is an application using 4 categories:

*-------------------- begin example ----------------------
sysuse auto, clear
recode rep78 1=2
tab rep78, gen(d)
reg mpg d2 d3 d4 foreign weight

// the effect of scaled repair status is the effect of d4
// the scale values of the categories are:
// d1 = 0
// d2:
nlcom _b[d2]/_b[d4]
// d3:
nlcom _b[d3]/_b[d4]
// d4 = 1
*---------------------- end example ---------------------

You can see how this can be extended to more than 4 categories. An
interesting extension here would occur if you have an interaction
effect, e.g. with time. In that case you could enforce the constraint
that the scaling of income remains constant over time, but that the
effect changes, and this would be a testable constraint. This idea is
implemented in -propcnsreg- and is discussed in Buis, Maarten L. (2008)
"Scaling levels of education" http://home.fsw.vu.nl/m.buis/ .

-- Maarten

-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

Buitenveldertselaan 3 (Metropolitan), room N515

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------

```
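The identities c = b2 and a = b1/b2 derived in the post can be checked numerically. The following is a minimal sketch rather than part of the original message: it reuses the post's own auto example, builds the scaled variable from the estimated ratios, and refits the model. The variable name `scaled` is an assumption introduced here for illustration.

```stata
sysuse auto, clear
recode rep78 1=2
tab rep78, gen(d)

// dummy-variable model, d1 (rep78 == 2 after the recode) is the reference
reg mpg d2 d3 d4 foreign weight

// scale values implied by the dummy coefficients:
// d1 = 0, d2 = b2/b4, d3 = b3/b4, d4 = 1
// "scaled" is a hypothetical name, not from the original post
gen scaled = 0*d1 + (_b[d2]/_b[d4])*d2 + (_b[d3]/_b[d4])*d3 + 1*d4

// the coefficient on scaled reproduces _b[d4] from the model above,
// and the coefficients on foreign and weight are unchanged
reg mpg scaled foreign weight
```

The point estimates agree because the fitted values of the dummy model are an exact linear function of `scaled`, `foreign`, and `weight`. The standard errors in the second regression treat the scale values as known, so they should not be used for inference; the -nlcom- calls in the post are the appropriate way to get standard errors for the scale values themselves.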