
# Re: st: Independent variable censoring

| Field | Value |
|---|---|
| From | Maarten Buis |
| To | statalist@hsphsun2.harvard.edu |
| Subject | Re: st: Independent variable censoring |
| Date | Mon, 4 Mar 2013 10:25:44 +0100 |

```
> On Sun, Mar 3, 2013 at 9:48 PM, Nikos Kakouros wrote:
>> This may be a silly question (I should prefix all my stata posts with
>> that) but what is the best way to handle a heavily left-censored
>> predictor (independent) variable in a linear regression model?

On Sun, Mar 3, 2013 at 10:26 PM, Maarten Buis wrote:
> The reason for that is that the distribution of the independent
> variables is even less relevant than the distribution of the dependent
> variable. What you need to take care of is whether the effect of your
> independent variable is linear. It may make sense to add your variable
> linearly together with an indicator variable for whether that variable
> was censored or not. In essence this adds a "jump" in your regression
> line at the point of censoring.

Here is an example of how I would do that. Notice that this is not a
censored independent variable, but an independent variable that for
other reasons has one "special" value we want to take into account.
This reinforces the point that there is nothing special about a
censored independent variable.

*------------------ begin example ------------------
sysuse nlsw88, clear

gen byte black = race == 2 if race < 3
label variable black "race"
label define black 0 "white" ///
                   1 "black"
label value black black

// 40 hours per week might be a tiny bit special:
spikeplot hours

gen byte fulltime = hours == 40 if hours < .

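// the options that followed the comma are missing here; link(log) eform
// is an assumption, chosen because the comment below reads the jump as
// a percentage
glm wage black union grade hours fulltime , ///
    link(log) eform
// there is an upwards jump in wage of about 5% at 40 hours a week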

// here is how I would graph the results:
preserve
keep if e(sample)
bys hours : keep if _n == 1
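// fix the other covariates so the curve is for one profile
// (white, non-union); also fixing grade at 12 is an assumption added
// here so that the predictions vary with hours only
replace black = 0
replace union = 0
replace grade = 12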

predict wagehat, mu

replace fulltime = 0
predict empty if hours == 40, mu

twoway line wagehat hours if hours != 40,           ///
           lcolor(black) lpattern(solid)       ||   ///
       rspike empty wagehat hours if hours == 40,   ///
           lcolor(black) lpattern(solid)       ||   ///
       scatter wagehat hours if hours == 40,        ///
           msymbol(O) mcolor(black)            ||   ///
       scatter empty hours if hours == 40,          ///
           msymbol(O) mfcolor(white)                ///
           mlcolor(black) legend(off)
restore
*------------------- end example -------------------
(For more on examples I sent to the Statalist see:
http://www.maartenbuis.nl/example_faq )
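
// A minimal sketch, not part of the original example: the same
// "linear term plus indicator" pattern applied to a predictor that
// really is left-censored. The data are simulated; the names x, xobs
// and censored, the detection limit of 1, and all coefficients are
// assumptions made for illustration only.
*------------------ begin sketch ------------------
clear
set seed 12345
set obs 1000

gen x = exp(rnormal())              // true predictor, not fully observed
gen byte censored = x < 1           // 1 if below the detection limit
gen xobs = cond(censored, 1, x)     // recorded value: the limit if censored
gen y = 2 + 0.5*x + rnormal()       // outcome depends on the true x

// observed predictor entered linearly, plus a "jump" at the limit
regress y xobs censored
*------------------- end sketch -------------------
// The coefficient on censored is the jump in the regression line at
// the censoring point; xobs carries the slope above the limit.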

Hope this helps,
Maarten

---------------------------------
Maarten L. Buis
WZB
Reichpietschufer 50
10785 Berlin
Germany

http://www.maartenbuis.nl
---------------------------------
```