
From: Maarten buis <maartenbuis@yahoo.co.uk>
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: Interval variables as independent variables
Date: Mon, 10 Nov 2008 14:54:40 +0000 (GMT)

--- "Jessica A.Jakubowski" asked:
>>>> I have data that includes an income variable measured at the
>>>> interval level, e.g.:
>>>> 1= less than $10,000
>>>> 2= $10,000-$15,000
<snip>
>>>> I would like to know if Stata has a way to deal with interval
>>>> variables on the right-hand side of the regression equation, that
>>>> is, as an independent variable.

--- Maarten buis answered:
>>> I can see two alternative strategies (there may well be more):
<snip>
>>> 2) Alternatively you can scale the income variable such that it
>>> optimally predicts the outcome variable

--- Austin Nichols answered:
>> Usually it is better to make indicators, e.g.
>> tab income, gen(d)
>> reg y d*

--- "Feiveson, Alan H. (JSC-SK311)" answered:
> But suppose one is trying to make a prediction model from actual
> income, not an income range? Then wouldn't some adjustment have to be
> made for the predictor variable being measured with error? If so, how?

Austin's answer is a special case of my point 2): you can think of that model as simultaneously estimating a scale for these categories and an effect of this scaled income. The scale defines the relative distances between the categories, such that the linear effect of this scaled income optimally predicts the outcome.

Let's say we have three categories: poor, middle, and rich. In order to identify the scale we need to fix the origin and the unit of the scale. Let's say we fix the origin at poor and the unit at the distance between poor and rich. In that case our scale would measure the position of middle relative to poor and rich, and will most likely be a number between 0 and 1. Let's call this number "a". It is this "a", together with the effect of scaled income, that we want to estimate.
If we create dummies for poor, middle, and rich, then the scaled income variable would be:

scaled_inc = 0*poor + a*middle + 1*rich

and the effect of that variable on some dependent variable y is:

y = b + c*scaled_inc
  = b + c*(0*poor + a*middle + 1*rich)
  = b + c*a*middle + c*rich

If we had entered income just as a set of dummies (with poor as the reference category) we would have gotten:

y = b0 + b1*middle + b2*rich

So we can directly derive both the scaling and the effect of scaled income from this regression model with income dummies:

c = b2
a = b1/b2

Below is an application using 4 categories:

*-------------------- begin example ----------------------
sysuse auto, clear
recode rep78 1=2
tab rep78, gen(d)
reg mpg d2 d3 d4 foreign weight

// the effect of scaled repair status is the effect of d4
// the scale values of the categories are:
// d1 = 0
// d2:
nlcom _b[d2]/_b[d4]
// d3:
nlcom _b[d3]/_b[d4]
// d4 = 1
*---------------------- end example ---------------------

You can see how this can be extended to more than 4 categories.

An interesting extension arises if you have an interaction effect, e.g. with time. In that case you could enforce the constraint that the scaling of income remains constant over time while its effect changes, and this would be a testable constraint. This idea is implemented in -propcnsreg- and is discussed in Buis, Maarten L. (2008) "Scaling levels of education" http://home.fsw.vu.nl/m.buis/ .

-- Maarten

-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room N515

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------
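One way to check the claim that the dummy regression already contains both the scale and the effect of scaled income is to build the scaled variable from the estimated ratios and enter it as a single regressor. The sketch below reuses the auto data example from the message; the names scaled, c_hat, a2 and a3 are made up for this illustration and are not part of the original post. If the derivation holds, the coefficient of scaled should equal the coefficient of d4 from the dummy model. Its standard error will differ, because the second regression treats the estimated scale as if it were known.

```stata
*-------------------- begin example ----------------------
sysuse auto, clear
recode rep78 1=2
tab rep78, gen(d)

// dummy-variable model; d1 (the lowest category) is the reference
reg mpg d2 d3 d4 foreign weight
scalar c_hat = _b[d4]        // effect of scaled repair status (hypothetical name)

// both free scale values in one call, with delta-method standard errors
nlcom (a2: _b[d2]/_b[d4]) (a3: _b[d3]/_b[d4])

// build the scaled variable from the estimated scale values
// (nlcom without the post option leaves the regression results in place)
gen scaled = 0*d1 + (_b[d2]/_b[d4])*d2 + (_b[d3]/_b[d4])*d3 + 1*d4

// a single linear effect of the scaled variable
reg mpg scaled foreign weight
display "dummy model: " c_hat "   scaled model: " _b[scaled]
*---------------------- end example ---------------------
```

The two displayed numbers should agree up to rounding, since the second model is just the first one rewritten in terms of the scale.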
