



Re: st: Independent variable censoring


From   Maarten Buis <maartenlbuis@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Independent variable censoring
Date   Mon, 4 Mar 2013 10:25:44 +0100

> On Sun, Mar 3, 2013 at 9:48 PM, Nikos Kakouros wrote:
>> This may be a silly question (I should prefix all my Stata posts with
>> that) but what is the best way to handle a heavily left-censored
>> predictor (independent) variable in a linear regression model?

On Sun, Mar 3, 2013 at 10:26 PM, Maarten Buis wrote:
> The reason for that is that the distribution of the independent
> variables is even less relevant than the distribution of the dependent
> variable. What you need to take care of is whether the effect of your
> independent variable is linear. It may make sense to add your variable
> linearly together with an indicator variable for whether that variable
> was censored or not. In essence this adds a "jump" in your regression
> line at the point of censoring.
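Written out, the suggestion amounts to the regression below. The symbols are introduced here only for illustration: $x$ is the censored predictor, $c$ the censoring point (censored cases are recorded as $x = c$), $d$ an indicator equal to 1 for censored cases and 0 otherwise, and $\mathbf{z}$ any other covariates.

```latex
\begin{align*}
y_i &= \beta_0 + \beta_1 x_i + \beta_2 d_i + \boldsymbol{\gamma}'\mathbf{z}_i + \varepsilon_i,
  \qquad d_i = \begin{cases} 1 & \text{if } x_i \text{ was censored (recorded as } c\text{)} \\ 0 & \text{otherwise} \end{cases} \\
\mathrm{E}(y \mid x, d = 0, \mathbf{z}) &= \beta_0 + \beta_1 x + \boldsymbol{\gamma}'\mathbf{z} \\
\mathrm{E}(y \mid x = c, d = 1, \mathbf{z}) &= \beta_0 + \beta_1 c + \beta_2 + \boldsymbol{\gamma}'\mathbf{z}
\end{align*}
```

At $x = c$ the last two lines differ by exactly $\beta_2$, so $\beta_2$ measures how far the censored cases sit above or below the regression line at the censoring point: the "jump".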

Here is an example of how I would do that. Note that the variable in this
example is not censored; it is an independent variable that, for other
reasons, has one "special" value we want to take into account. This
reinforces the point that there is nothing special about a censored
independent variable.

*------------------ begin example ------------------
sysuse nlsw88, clear

gen byte black = race == 2 if race < 3
label variable black "race"
label define black 0 "white" ///
                   1 "black"
label values black black

// 40 hours per week might be a tiny bit special:
spikeplot hours

gen byte fulltime = hours == 40 if hours < .

glm wage black union grade hours fulltime , ///
    link(log) vce(robust) eform
// there is an upward jump in wage of about 5% at 40 hours a week

// here is how I would graph the results:
preserve
keep if e(sample)
bys hours : keep if _n == 1
replace black = 0
replace union = 0
replace grade = 12

predict wagehat, mu

replace fulltime = 0
predict empty if hours == 40, mu

twoway line wagehat hours if hours != 40,          ///
               lcolor(black) lpattern(solid)  ||   ///
       rspike empty wagehat hours if hours == 40 , ///
               lcolor(black) lpattern(solid) ||    ///
       scatter wagehat hours if hours == 40,       ///
               msymbol(O) mcolor(black) ||         ///
       scatter empty hours if hours == 40,         ///
               msymbol(O) mfcolor(white)           ///
               mlcolor(black) legend(off)
restore
*------------------- end example -------------------
(For more on the examples I have sent to Statalist, see:
http://www.maartenbuis.nl/example_faq )
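In the `glm` example the same idea works on a log scale. Writing $\beta_5$ for the coefficient on `fulltime` (the subscripts are only for illustration), the model and the size of the jump at 40 hours are:

```latex
\begin{align*}
\ln \mathrm{E}(\text{wage}) &= \beta_0 + \beta_1\,\text{black} + \beta_2\,\text{union} + \beta_3\,\text{grade} + \beta_4\,\text{hours} + \beta_5\,\text{fulltime} \\
\frac{\mathrm{E}(\text{wage} \mid \text{hours} = 40,\ \text{fulltime} = 1)}{\mathrm{E}(\text{wage} \mid \text{hours} = 40,\ \text{fulltime} = 0)} &= e^{\beta_5}
  \qquad \text{(other covariates held fixed)} \\
\text{percentage jump} &= 100\,\bigl(e^{\beta_5} - 1\bigr)
\end{align*}
```

The `eform` option reports $e^{\beta_5}$ directly, so a jump of about 5% corresponds to an exponentiated coefficient of about 1.05. In the graph, the `rspike` at 40 hours connects the two predictions in the ratio above: `wagehat` (with `fulltime` = 1) and `empty` (with `fulltime` = 0), both evaluated at black = 0, union = 0, and grade = 12.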

Hope this helps,
Maarten

---------------------------------
Maarten L. Buis
WZB
Reichpietschufer 50
10785 Berlin
Germany

http://www.maartenbuis.nl
---------------------------------
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP