Statalist



Re: st: RE: IV and quantile regression


From   "Stas Kolenikov" <skolenik@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: IV and quantile regression
Date   Wed, 6 Aug 2008 23:42:59 -0500

The problem is not coming up with the code that does this (which is 6
lines of code if my mental programming is OK: two lines for -program-
and -end-, one line to estimate the first stage, one line to predict
something out of it, one line to run the second stage, and one line to
embed this into the -bootstrap- procedure). The problem is with the
approach itself. Those are not "limitations"; they are inconsistencies
of the estimates, not just of the standard errors. What you will get
out of those 6 lines will likely have nothing to do with the true
values of the coefficients.

First, the generic instrumental variables estimators are based on a
very specific assumption that instruments and errors are uncorrelated.
This idea in fact fits very well into the whole geometry of least
squares as projections. You get estimating equations of the form

sum over i of (y_i - x_i beta) * z_i = 0

which you can solve to get the neat matrix expressions for the 2SLS IV
estimators. That the resulting expressions have a two-stage
interpretation is a happy coincidence of the above-mentioned geometry,
and it does not generalize to any other case.
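
To make the "happy coincidence" concrete, here is a small numpy sketch on simulated data (the function names and the data-generating process are made up for illustration). It solves the just-identified estimating equations Z'(y - X beta) = 0 directly and compares the result with the two-stage recipe: regress x on z, predict, then run OLS of y on the prediction. In the linear case the two should agree to rounding error; nothing of the kind is promised once the second stage is a quantile regression.

```python
import numpy as np

def iv_estimating_equations(y, X, Z):
    """Solve sum_i (y_i - x_i beta) z_i = 0, i.e. Z'(y - X beta) = 0.
    Just-identified case only: Z must have as many columns as X."""
    return np.linalg.solve(Z.T @ X, Z.T @ y)

def two_stage(y, X, Z):
    """Stage 1: regress each column of X on Z and keep the fitted values.
    Stage 2: OLS of y on those fitted values."""
    gamma, *_ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ gamma
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta

# Simulated data (illustrative only): u is an unobserved confounder that
# enters both x and y, so x is endogenous; z shifts x but is independent of u.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
u = rng.normal(size=n)
x = 0.8 * z + u + rng.normal(size=n)
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(n), x])   # constant + endogenous regressor
Z = np.column_stack([np.ones(n), z])   # constant + instrument

b_ee = iv_estimating_equations(y, X, Z)
b_2s = two_stage(y, X, Z)
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)  # ignores endogeneity
print("estimating equations:", b_ee)
print("two-stage recipe:    ", b_2s)
print("plain OLS:           ", b_ols)
```

Both IV vectors should land near the simulated slope of 2, while plain OLS is pulled upward by the confounder u. Note that only the point estimates coincide: the standard errors printed by a naive second-stage OLS would still be wrong.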

How can you think about the instrumental variables condition for
quantile regressions? Well, you would probably say that the instruments
and errors are independent (which is quite a bit stronger an assumption
than being uncorrelated). Hence, their functions are independent as well.
But how can you use that in your estimating equations? Even for the
original quantile regression problem, you have something quite nasty:
for the quantile of level r,

sum over i of x_i * [ - r + indicator of the event {y_i - x_i beta < 0} ] = 0

(if my algebra is not failing me). Is there a way to stick instruments
in there? If you put z_i inside (to produce something like
{ (y_i - x_i beta) z_i < 0 }), you no longer have iid terms in the curly
brackets. Even if you had homoskedastic residuals (which linear
regression IV did not need, by the way), the variances would be sigma^2
times z_i^2. In any case, you cannot say much about the
probabilities on which estimation is based. Things kinda fall apart.
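
To see why the quantile regression estimating equation is "quite nasty", here is a brute-force numpy sketch on tiny simulated data (function names are hypothetical; practical solvers use linear programming rather than enumeration). It relies on the standard fact that some optimal quantile regression fit passes exactly through p observations, where p is the number of regressors, so trying every p-subset finds the exact minimizer of the check loss. It then evaluates sum_i x_i [-r + 1{y_i - x_i beta < 0}] at the solution (with x_i = 1 this is the intercept-only form). Because that sum is a step function of beta, it generally cannot be made exactly zero; it is only brought within the contribution of the p interpolated observations.

```python
import itertools
import numpy as np

def check_loss(resid, r):
    """Sum of the check function rho_r(u) = u * (r - 1{u < 0})."""
    return np.sum(resid * (r - (resid < 0)))

def qreg_bruteforce(y, X, r):
    """Exact quantile regression for tiny problems only. Some optimal fit
    passes exactly through p of the n observations, so try every p-subset."""
    n, p = X.shape
    best_beta, best_loss = None, np.inf
    for idx in itertools.combinations(range(n), p):
        rows = list(idx)
        if abs(np.linalg.det(X[rows])) < 1e-12:
            continue  # these p points do not determine a unique fit
        beta = np.linalg.solve(X[rows], y[rows])
        loss = check_loss(y - X @ beta, r)
        if loss < best_loss:
            best_beta, best_loss = beta, loss
    return best_beta

def estimating_equation(y, X, beta, r):
    """sum_i x_i * (-r + 1{y_i - x_i beta < 0}): a step function of beta."""
    return X.T @ (-r + (y - X @ beta < 0))

# Simulated data (illustrative only): heavy-tailed errors, one regressor.
rng = np.random.default_rng(1)
n, r = 40, 0.25
x = rng.uniform(0, 10, size=n)
y = 1.0 + 0.5 * x + rng.standard_t(df=3, size=n)
X = np.column_stack([np.ones(n), x])

beta_hat = qreg_bruteforce(y, X, r)
g = estimating_equation(y, X, beta_hat, r)
print("beta_hat:", beta_hat)
print("estimating equation at beta_hat:", g)  # small, but generally not exactly 0
```

Nudging beta_hat in any direction can only increase the check loss, yet the printed estimating equation is typically nonzero. That gap is the same lack of smoothness that makes the bootstrap questionable here.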

That's why Austin said that there is no particularly good way of going
about this. Dismissing his comments is not cool! There has indeed been
some work on quantile regressions with endogenous variables, but as
far as I recall, you need panel data to conduct the
estimation, and it is based on entirely different notions of
identifiability of your effects through some unwieldy conditional
distributions. I never had an opportunity to figure this out better...
meaning, I did not really have any data where I had an urgent need to
apply it (luckily :)). It might be in Koenker's recent book on
quantile regression. And I think I attended a talk by Andrew Chesher
on this. Those are the minimum I would suggest looking up.

Second, I would have some doubts about the bootstrap even in the absence
of endogeneity, as the estimator you are bootstrapping is not smooth. The
bootstrap method says, "Look, here are two distribution functions
close to one another: the underlying unknown one in the population, and
the sample distribution. Maybe the results of computing some
statistic on those two distributions will be close, too?" You are putting
your hopes on this "maybe" part, but those hopes are not justified
when your estimator deals with jumps (as in the case of the quantile
regression estimator). If things happen in jumps, there is no
guarantee your bootstrap distribution won't hop away from the
underlying one you are interested in. For the results of sampling from
those two distributions to be close, you need your estimator to change
smoothly as you go from one distribution to the other. Quantiles are
standard counterexamples where the simple version of the bootstrap
does not work. What you need to do is resample m << n (m much less
than n) observations, and scale your distribution using m rather than
n in the appropriate places. If you had asymptotic normality of the
kind "sqrt(n) (hat beta - beta_0) converges to N(0, some variance)",
then the spread of your resampled estimates, which reflects samples of
size m, needs to be shrunk by a factor of sqrt(n/m) (that is, multiplied
by sqrt(m/n)). Stata may or may not take care of that. I would expect
it does, but that probably needs to be verified in [R] bootstrap.
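
A minimal numpy sketch of the m-out-of-n idea, applied for simplicity to the median of a single sample rather than to a quantile regression. It assumes root-n asymptotics as described above; the function name is hypothetical, and m = n^(2/3) is an arbitrary illustrative choice, not a recommendation (the appropriate rate for m is problem-specific). Because each replicate uses only m observations, the raw spread of the replicates is roughly sqrt(n/m) times too wide, so it is multiplied by sqrt(m/n).

```python
import numpy as np

def m_out_of_n_bootstrap_se(data, stat, m, reps=500, seed=0):
    """Resample m observations with replacement, compute `stat` on each
    resample, and rescale the spread by sqrt(m/n) so that it refers to a
    sample of size n rather than m. Assumes root-n asymptotics."""
    rng = np.random.default_rng(seed)
    n = len(data)
    draws = np.array([stat(rng.choice(data, size=m, replace=True))
                      for _ in range(reps)])
    raw_sd = draws.std(ddof=1)           # spread on the scale of samples of size m
    return raw_sd * np.sqrt(m / n), raw_sd

# Simulated standard normal data (illustrative only).
rng = np.random.default_rng(42)
n = 2000
sample = rng.normal(size=n)
m = int(n ** (2 / 3))                    # arbitrary choice of m << n
se, raw_sd = m_out_of_n_bootstrap_se(sample, np.median, m)

# For the standard normal, the asymptotic SE of the sample median is
# sqrt(pi/2) / sqrt(n), which gives a benchmark for the rescaled value.
asymptotic_se = np.sqrt(np.pi / 2) / np.sqrt(n)
print(m, raw_sd, se, asymptotic_se)
```

The rescaled value should be close to the benchmark, while the unscaled raw_sd overstates it by roughly sqrt(n/m).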

I might be wrong on this second part, though. What I vaguely remember
being discussed regarding quantile regression models and the lack
of smoothness is that it sorta disappears in high dimensions with many
regressors. There are some geometric considerations of how the
distance-to-origin function (|x| in one dimension) looks in different
dimensions. It is V-shaped in one dimension, but it starts looking
U-shaped in higher dimensions. So it might or might not bite, but again,
this needs to be checked against the appropriate statistical and
econometric literature.

On Wed, Aug 6, 2008 at 11:45 AM, Matteo Cominetta
<m.p.cominetta@sussex.ac.uk> wrote:
> Dear Austin,
> I'm well aware of the limitations of the approach proposed by Sachin
> (i.e. inconsistent estimates of the coefficients' s.e.).
> I got in touch with Stata's technical support and they suggested running
> the 1st stage manually, obtaining the fitted values, and then bootstrapping
> the (quantile regression) 2nd stage with the fitted values. They also sent
> me some examples, which I'm studying at the moment.
> I'll forward them and all the material shortly.
>
>
> Matteo
>
>
> Quoting Austin Nichols <austinnichols@gmail.com>:
>
>> Matteo Cominetta and
>> Sachin Chintawar <sachintalks@gmail.com>:
>>
>> Usually this naive approach (sticking in a predicted value for the
>> endog var) gives biased and inconsistent estimates (e.g. when applied
>> to -probit- or -poisson- or -glm- or what have you). I believe that
>> some work has been done on an IV version of quantile regression, but
>> there is no Stata command that I know of, and there is no consensus in
>> the literature that I am familiar with on how it should be
>> implemented. So I don't think it is possible in Stata as of today, but
>> I would be happy to be proved wrong.
>>
>> On 8/5/08, Sachin Chintawar <sachintalks@gmail.com> wrote:
>> > Hi
>> > Though I did not find a command for a two-stage quantile regression, one
>> > way to get around this problem is to run your first-stage regression and
>> > use the predicted values of the first stage in the second-stage quantile
>> > regression.
>> > I hope this helps. More comments on using such a technique would be useful.
>> >
>> >
>> > Sachin Chintawar
>> > Research Assistant
>> > Dept. of Agricultural Economics
>> > Louisiana State University
>> > -----Original Message-----
>> > From: owner-statalist@hsphsun2.harvard.edu
>> > [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Matteo Cominetta
>> > Sent: Thursday, July 31, 2008 12:34 PM
>> > To: statalist@hsphsun2.harvard.edu
>> > Subject: st: IV and quantile regression
>> >
>> > Dear All,
>> > I need to estimate a quantile regression model in which some
>> > (continuous) regressors are endogenous. Is there a Stata command to
>> > perform a 2SLS procedure in which the second stage is a quantile
>> > regression?
>> > I couldn't find it.
>> > Thanks!
>> >
>> > Matteo Cominetta
>> > Sussex University


-- 
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.



© Copyright 1996–2014 StataCorp LP