# Re: st: Quantile Regression

- From: "JVerkuilen (Gmail)"
- To: statalist@hsphsun2.harvard.edu
- Subject: Re: st: Quantile Regression
- Date: Tue, 2 Oct 2012 22:12:59 -0400

```On Tue, Oct 2, 2012 at 7:31 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
>
> Without details (see FAQ 3.3 first sentence), we can only guess. This
> could happen if 1) you did not set the same random seed before each
> -sqreg- and -bsqreg- command; 2) the number of bootstrap replicates
> differed between -sqreg- and -bsqreg- runs; or 3) -sqreg- does not
> reject replicates in which convergence failed for any quantile.

If the standard errors differ, it's no great surprise when you're
bootstrapping. Everything Steve lists makes sense. Check on a known
dataset (such as auto) and fix the seed.
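
* A minimal sketch of that check, assuming the auto data shipped with
* Stata. The seed (12345) and reps(999) are arbitrary choices; the point
* is only that both commands get the same seed and the same number of
* replications, so neither can explain a difference in standard errors.
sysuse auto, clear
set seed 12345
bsqreg price mpg, quantile(.5) reps(999)
set seed 12345
sqreg price mpg, quantiles(.5) reps(999)
* Compare the Std. Err. columns of the two tables.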

> By the way, the manual states that -sqreg- is faster than -bsqreg-.

I believe there are some computational speedups because the linear
program can be solved for one quantile and simply updated to get the
rest, but I could be mistaken. Roger Koenker's book (Quantile
Regression, Cambridge University Press, 2005) discusses computation in
detail. There are also analytic alternatives to bootstrapping that
might be much faster. -qreg- generates standard errors analytically
using a weighting matrix and a density estimator of the residuals.

. sysuse auto
. qreg price mpg

Median regression                                    Number of obs =        74
Raw sum of deviations   142205 (about 4934)
Min sum of deviations 129521.7                     Pseudo R2     =    0.0892

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -135.6667   67.26576    -2.02   0.047    -269.7585   -1.574816
       _cons |   8088.667   1483.808     5.45   0.000     5130.749    11046.58
------------------------------------------------------------------------------

* Note: -bsqreg- defaults to only 20 replications (!), hence reps(999).
. bsqreg price mpg, reps(999)

Median regression, bootstrap(999) SEs                Number of obs =        74
Raw sum of deviations   142205 (about 4934)
Min sum of deviations 129521.7                     Pseudo R2     =    0.0892

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         mpg |  -135.6667   35.63527    -3.81   0.000    -206.7043   -64.62906
       _cons |   8088.667   889.0486     9.10   0.000     6316.381    9860.953
------------------------------------------------------------------------------

In this case the standard errors are markedly different, and playing
with the different methods in -qreg- gives quite different values, but
I don't know enough to comment on why. I am inclined to trust the
bootstrapped ones because this problem has a rather small N.

I suspect that it is very slow on a huge problem though, given that it
needs to sort the residuals. Koenker did a good deal of work on
alternatives such as inverting a test of some sort; I think the R
implementation of quantile regression has this. Again see his book.

> I've never had the luxury of having so many observations to analyze. I
> imagine that almost every simple model can be rejected, so that model
> building and validation are real challenges.

Randomly subsample and do a real cross validation?
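
* One way to sketch that idea, shown on the auto data purely for
* illustration. The 50/50 split, the seed, and the variable names
* u, train, yhat and absdev are all hypothetical choices, and this is
* a single holdout split rather than full k-fold cross validation.
sysuse auto, clear
set seed 12345
generate double u = runiform()
generate byte train = u < 0.5
* Fit the median regression on the training half only.
qreg price mpg if train
* Linear predictions are computed for every observation.
predict double yhat, xb
generate double absdev = abs(price - yhat)
* Mean absolute error in the training half vs. the holdout half.
summarize absdev if train
summarize absdev if !train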

Jay
--
JVVerkuilen, PhD
jvverkuilen@gmail.com

"Out beyond ideas of wrong-doing and right-doing there is a field.
I'll meet you there. When the soul lies down in that grass the world
is too full to talk about." ---Rumi
```