st: Re: st: Re: st: Re: st: RE: Truncated sample or Heckman selection

From	Joerg Luedicke <[email protected]>
To	[email protected]
Subject	st: Re: st: Re: st: Re: st: RE: Truncated sample or Heckman selection
Date	Thu, 4 Oct 2012 21:52:03 -0500

I find it difficult to understand why you would regard a variable as
censored when it actually isn't.

Let's assume the outcome variable (y) is income and the predictor
variable (x) is years of education. We generate some data for which
the expected income for people without education is 500 and, on
average, people earn 300 more for each additional year of education:
*-------------------
clear
set obs 1000
set seed 1234
gen x=rnormal(10,3)
gen e=rnormal(0,20)
gen y=500+300*x+e
*-------------------

Fitting a linear model to these data recovers the parameters we put in
(an intercept close to 500 and a slope close to 300):
*-------------------
reg y x
*-------------------

Now suppose income was only measured exactly for amounts of 3,000 or
more, so in this case y is censored from below at a value of 3,000:
*-------------------
gen cy=y
replace cy=3000 if y<3000
*-------------------
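
To see how much of the sample the censoring affects, one could count
the observations piled up at the limit. This is only a quick check
added for illustration; the helper variable "censored" is a
hypothetical name. Given how x and e were generated, a bit under 30%
of the observations should fall below 3,000 in expectation:
*-------------------
* flag observations whose true income fell below the limit
gen byte censored = y<3000
tab censored
* the censored variable never takes a value below 3,000
sum cy
*-------------------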

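If we now fit the simple linear model to the censored variable, the
results are clearly off: the slope is pulled toward zero and the
intercept is pushed upward, because every income below 3,000 has been
replaced by 3,000: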
*-------------------
reg cy x
*-------------------

However, if we use the Tobit model, we can again recover the correct parameters:
*-------------------
tobit cy x, ll(3000)
*-------------------
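
One way to put the three fits side by side is to store them and
tabulate the coefficients. This is a minimal sketch that assumes the
data generated above are still in memory; the stored names (ols_true,
ols_cens, tobit_cens) are hypothetical, and the row that reports the
Tobit error variance or sigma is labeled differently across Stata
versions:
*-------------------
* refit the three models quietly and store each set of results
quietly reg y x
estimates store ols_true
quietly reg cy x
estimates store ols_cens
quietly tobit cy x, ll(3000)
estimates store tobit_cens
* coefficients and standard errors side by side
estimates table ols_true ols_cens tobit_cens, b(%9.2f) se(%9.2f)
*-------------------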

So the Tobit model makes a lot of sense here and is useful in an
otherwise unpleasant situation: the outcome is censored, and the model
accounts for that. However, if an outcome is simply bounded at zero,
as expenditure data are, for example, then it is not censored: a zero
is just a zero, no more and no less. So why would it be advisable to
use a censored regression model when the outcome is not censored? For
me, that would only make sense if the model had some other hidden
qualities and generally did well when analyzing bounded data. But
this does not even seem to be the case if we consider Austin Nichols'
(2010) simulation results for nonnegative skewed data.
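
As a rough illustration of the difference, here is a sketch with
made-up expenditure data in which the zeros are genuine zeros. All of
it is an assumption for illustration: the variable names (spend,
buyer), the 70% share of buyers, and the log-linear mean with a slope
of 0.3. Because the conditional mean is exp(a + 0.3*x) by
construction, a Poisson regression with robust standard errors (one
commonly suggested alternative for such outcomes) should give a slope
close to 0.3, whereas the Tobit coefficient refers to a latent
variable that does not exist in these data:
*-------------------
clear
set obs 1000
set seed 1234
gen x=rnormal(0,1)
* about 70% of units spend something; the rest are true zeros
gen byte buyer=runiform()<0.7
* skewed positive amounts with mean exp(0.5+0.3*x) among buyers
gen spend=buyer*rgamma(1,exp(0.5+0.3*x))
* models the conditional mean directly; robust SEs since spend is not a count
poisson spend x, vce(robust)
* treats the zeros as censored values of a latent normal variable
tobit spend x, ll(0)
*-------------------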

Joerg


References:

Nichols, A. 2010. Regression for nonnegative skewed dependent
variables. BOS10 Stata Conference 2, Stata Users Group.
URL: http://repec.org/bost10/nichols_boston2010.pdf



On Thu, Oct 4, 2012 at 8:08 PM, Millimet, Daniel <[email protected]> wrote:
> Yes, in my opinion, if you include the zeros, a fractional logit or tobit or censored LAD is appropriate (given the other assumptions implicit in these models).  The only issue is whether some Xs are missing for the zeros.  That you will have to confront yourself if you have Xs you want to include that are missing from some obs.
>
> ****************************************************
> Daniel L. Millimet, Professor
> Department of Economics
> Box 0496
> SMU
> Dallas, TX 75275-0496
> phone: 214.768.3269
> fax: 214.768.1821
> web: http://faculty.smu.edu/millimet
> ****************************************************
>
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Ebru Ozturk
> Sent: Thursday, October 04, 2012 5:32 PM
> To: [email protected]
> Subject: RE: st: Re: st: Re: st: RE: Truncated sample or Heckman selection
>
> Thank you. It will be quite complicated for me to understand this e-mail.
>
> Yes, in my data there is a mass at zero and I include all of them. So you are saying that it is a censoring problem and tobit regression is applicable or a fractional logit model?
>
> The other issue is about the Xs. The Xs that I am interested in have not been observed for non-innovator firms, but the other Xs, which I use as control variables, have been observed for all firms in the sample.
>
> Ebru
>
> ----------------------------------------
>> From: [email protected]
>> To: [email protected]
>> Subject: RE: st: Re: st: Re: st: RE: Truncated sample or Heckman selection
>> Date: Thu, 4 Oct 2012 22:16:57 +0000
>>
>> If you include all firms in a model, with a mass at zero, then it is the standard censoring problem. Labor supply models are the classic example. Labor supply has a "natural" lower bound at zero, but one does not use OLS. Typically, tobit models are used or semiparametric alternatives like censored LAD or symmetric trimmed least squares. See, for example, Wilhelm (OBES, 2008, "Practical Considerations for Choosing Between Tobit and SCLS or CLAD Estimators for Censored Regression Models with an Application to Charitable Giving"). For percentages, even though these variables are by definition between 0 and 1 (or 100), a fractional logit is the most common model, I believe, if there is a mass at either boundary point.
>>
>> So, in your case, if you include the zeros, yes it is a censoring problem.
>>
>> The next issue is what Xs you observe for different observations. If all Xs were observed for all obs (0 and positive values), then a fractional logit is the answer (or a tobit or one of the above alternatives). If SOME of the Xs are missing for the obs at zero, then you can (i) drop the zeros and estimate a selection-corrected OLS model - if you ignore the upper limit of 100 - or you can combine the selection correction with a fractional logit/probit model, as long as you are sure the control function term for the correction is correct (this is what some empirical trade papers do when they drop country pairs with zero trade; although it is not recommended), or (ii) include the zeros, but you need two different equations for the zeros and the non-zeros since it sounded like not all Xs are available for the obs at zero. So, something like a hurdle (zero-inflated) model tailored to your example.
>>
>> ________________________________________
>> From: [email protected] [[email protected]] on behalf of Ebru Ozturk [[email protected]]
>> Sent: Thursday, October 04, 2012 4:53 PM
>> To: [email protected]
>> Subject: RE: st: Re: st: Re: st: RE: Truncated sample or Heckman selection
>>
>> Innovation success is heavily left-censored - many firms do not have any market novelties and thus no sales from this type of innovation (Grimpe & Kaiser, 2010).
>>
>> Is that wrong then?
>>
>> I'm really confused now.
>>
>> Ebru
>>
>> ----------------------------------------
>> > Date: Thu, 4 Oct 2012 16:45:59 -0500
>> > Subject: st: Re: st: Re: st: RE: Truncated sample or Heckman selection
>> > From: [email protected]
>> > To: [email protected]
>> >
>> > On Thu, Oct 4, 2012 at 4:34 PM, Ebru Ozturk <[email protected]> wrote:
>> > > For Tobit regression, the dependent variable is the percent of total firm sales revenues that derived from the sales of new products. Therefore, it is censored as sales of new products can only be zero or positive.
>> > >
>> > This just isn't a censoring problem. Consider having a look at:
>> >
>> > http://en.wikipedia.org/wiki/Censoring_%28statistics%29
>> >
>> > Joerg
>> > *
>> > * For searches and help try:
>> > * http://www.stata.com/help.cgi?search
>> > * http://www.stata.com/support/faqs/resources/statalist-faq/
>> > * http://www.ats.ucla.edu/stat/stata/
