The Stata listserver

st: RE: Re: How to plot bootstrap CI for the entire kernel estimation of y on x?

From   "Nick Cox"
Subject   st: RE: Re: How to plot bootstrap CI for the entire kernel estimation of y on x?
Date   Sun, 6 Feb 2005 17:21:30 -0000

Eik Leong Swee

> Firstly, thanks so much for the reply. I'm not sure what is 
> the difference between kernreg2 and
> locpoly.

I am not sure why you are not sure what the difference is, 
as a comparison of the files and using the programs 
should make this clear. For example, -kernreg2- is 
a program for Stata 6, while -locpoly- is a program 
for Stata 8, so the associated graphics are quite
different. As earlier indicated, -kernreg2- was intended to 
be a temporary fix by myself to -kernreg-. That fix
was made in March 1999, but the authors of -kernreg- 
have yet to get round to publishing a revised version 
of their program, despite a variety of public and private 
requests. For Stata 8 users, that is now immaterial,
as -locpoly- supersedes -kernreg-. For any Stata 6 and Stata 7 
users, there remains an issue. I have been tempted to 
withdraw -kernreg2-, but that would mean that -kernreg- 
would remain in the public domain, although known to possess
bugs, yet without an alternative. 

> My theoretical understanding of kernel estimation (y on x) is 
> a locally weighted averaging (using
> a prespecified kernel function eg. normal or epanechnikov) 
> method of fit where the bandwidth is
> simply a measure of applying weights to distant observations. 
> The optimal bandwidth is chosen to
> minimise the mean integrated squared error or so-called cross 
> validation (CV).
> Given the above, would you suggest I use kernreg2 or locpoly? 
> Is the optimal bandwidth chosen in
> each case using CV?

I suggest neither. I think you should tell us what kind 
of assumptions you are making about the error around 
whatever smooth curve you are fitting or, more generally,
why you think a binary response is suitable for this 
kind of application. 

Neither program makes any use of cross-validation. If they 
did, that would be clear in the documentation. 
Cross-validation would require some extra programming 
on somebody's part. But any kind of optimisation would seem 
beside the point unless you can justify your application
as appropriate. Optimising a qualitatively incorrect model
would seem a somewhat bizarre exercise. 
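
As a rough indication of what that extra programming might involve, here is a minimal sketch in Python, not Stata, and not part of -kernreg2- or -locpoly-. It assumes a Nadaraya-Watson smoother with a Gaussian kernel and scores each candidate bandwidth by leave-one-out squared prediction error. The function names, the kernel, and the candidate bandwidths are all hypothetical choices made for illustration, and nothing here settles whether such a smooth is appropriate for a binary response.

```python
import math


def nw_smooth(x0, xs, ys, h, skip=None):
    """Nadaraya-Watson estimate at x0: a kernel-weighted mean of ys.

    Assumes a Gaussian kernel with bandwidth h. If skip is an index,
    that observation is left out (needed for cross-validation).
    """
    num = den = 0.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        if i == skip:
            continue
        w = math.exp(-0.5 * ((x - x0) / h) ** 2)
        num += w * y
        den += w
    # If every weight underflows to zero there is nothing to average.
    return num / den if den > 0 else float("nan")


def loo_cv_score(xs, ys, h):
    """Mean squared leave-one-out prediction error for bandwidth h."""
    total = 0.0
    for i in range(len(xs)):
        pred = nw_smooth(xs[i], xs, ys, h, skip=i)
        if math.isnan(pred):
            return math.inf  # treat an unusable bandwidth as worst possible
        total += (ys[i] - pred) ** 2
    return total / len(xs)


def choose_bandwidth(xs, ys, candidates):
    """Return the candidate bandwidth with the smallest leave-one-out score."""
    return min(candidates, key=lambda h: loo_cv_score(xs, ys, h))


if __name__ == "__main__":
    # Toy data (illustrative only): a 0-1 response that switches at x = 2.
    xs = [i / 10 for i in range(40)]
    ys = [1 if x > 2.0 else 0 for x in xs]
    print(choose_bandwidth(xs, ys, [0.1, 0.2, 0.5, 1.0]))
```

The search here is over a short list of candidates only; a finer grid or a proper optimiser would be a further design choice.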

This is not to say that some kind of kernel regression 
might not provide a useful exploratory or heuristic 
approach to smoothing your response as a function of 
your predictor. In practice, it might work quite well. 
But I am not clear that the idea of averaging across a 
binary response is quite the best way to approach your problem. That's 
a lack of clarity on my part, and open to correction
from people with stronger technical grasp of this

> Another question is regarding graphing the kernel estimates 
> and bootstrap confidence intervals. I
> have seen in some journals where kernel regressions (y on x) 
> were used and bootstrap CI were
> plotted around the kernel estimates. I encountered 3 problems 
> here. Firstly, I could not save
> kernreg graphs like I could with scatter plots. Secondly, I 
> know how to calculate bootstrap CI but
> dont know how to plot them on a graph. Lastly, how do I plot 
> both together on one graph?

Your problems here are not indicated precisely. Perhaps 
you should start by stating which version of Stata you 
are using. If you are using Stata 8, -kernreg*- is, 
as stated, superseded. If you are using -kernreg2-, 
you should indicate precisely what you did. If you are using 
-kernreg-, that is against my strong advice, as indicated. 
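
On the mechanics alone, and leaving aside whether the smooth suits a binary response, one common recipe is to resample (x, y) pairs with replacement, recompute the smooth on a fixed grid of x values for each resample, and take percentiles of the replicates at each grid point. The following is a minimal sketch in Python, not Stata. The function names, the Gaussian kernel, the bandwidth, the grid, the seed, and the number of replications are all hypothetical choices made for illustration. The limits it produces are pointwise, not a simultaneous band for the whole curve.

```python
import math
import random


def nw_smooth(x0, xs, ys, h):
    """Nadaraya-Watson estimate at x0 with a Gaussian kernel (assumed)."""
    num = den = 0.0
    for x, y in zip(xs, ys):
        w = math.exp(-0.5 * ((x - x0) / h) ** 2)
        num += w * y
        den += w
    return num / den if den > 0 else float("nan")


def percentile(sorted_vals, p):
    """Percentile p (0-100) of a sorted list, by linear interpolation."""
    k = (len(sorted_vals) - 1) * p / 100.0
    lo, hi = math.floor(k), math.ceil(k)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def bootstrap_band(xs, ys, grid, h, reps=200, level=95, seed=12345):
    """Return (fit, lower, upper) lists, one value per grid point.

    fit is the smooth on the original data; lower and upper are
    pointwise percentile bootstrap limits from resampling (x, y) pairs.
    """
    rng = random.Random(seed)
    n = len(xs)
    fit = [nw_smooth(g, xs, ys, h) for g in grid]
    replicates = [[] for _ in grid]
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]  # resample pairs
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        for j, g in enumerate(grid):
            est = nw_smooth(g, bx, by, h)
            if not math.isnan(est):
                replicates[j].append(est)
    alpha = (100 - level) / 2
    lower, upper = [], []
    for vals in replicates:
        if not vals:  # no usable replicate at this grid point
            lower.append(float("nan"))
            upper.append(float("nan"))
            continue
        vals.sort()
        lower.append(percentile(vals, alpha))
        upper.append(percentile(vals, 100 - alpha))
    return fit, lower, upper


if __name__ == "__main__":
    # Toy data (illustrative only): a 0-1 response that switches at x = 1.5.
    xs = [i / 10 for i in range(30)]
    ys = [1 if x > 1.5 else 0 for x in xs]
    grid = [0.5, 1.5, 2.5]
    for g, f, lo, hi in zip(grid, *bootstrap_band(xs, ys, grid, h=0.3)):
        print(g, round(f, 3), round(lo, 3), round(hi, 3))
```

With the fit and the two limits held as variables alongside the grid, plotting both together in Stata 8 would then be a matter of overlaying something like -twoway rarea- for the limits with -twoway line- for the fit.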

Nick Cox

> > -kernreg2-, of which I am notionally first
> > author, was intended to be a temporary fix
> > of -kernreg-, written by other people.
> >
> > It didn't turn out that way, but no matter:
> > -locpoly- is now the recommended command,
> > in my view. In short, -kernreg2- is history,
> > except that it remains in the archives out
> > of inertia and for people still on earlier
> > versions of Stata.
> >
> > However, both of them stop a long way short
> > of offering this kind of functionality.
> >
> > Having said that, my own personal view is
> > that kernel regression is not obviously
> > the best thing for summarising how a
> > binary response varies with a predictor.
> > I can't offer more positive advice because
> > I am unclear on how far your problem is
> > tractable at all.

Eik Leong Swee

> > > I am trying to do a kernel density estimation of a y (a 0-1
> > > variable) on x1. This generates Graph1. I also did an estimation
> > > of y on x2 and generated graph2. I used kernreg2 for both these
> > > estimations.
> > >
> > > Now, I would also like to bootstrap confidence intervals around
> > > the graph and subsequently test the two distributions from graph 1
> > > and 2 (to see if they are statistically different in the relevant
> > > range). Unfortunately, kernreg2 does not give the non-parametric
> > > standard errors. I tried bootstrapping nevertheless, and this is
> > > the output that I get.
> > > Bootstrap statistics
> > >
> > > Variable |   Reps   Observed       Bias   Std. Err.  [95% Conf. Interval]
> > > ---------+---------------------------------------------------------------
> > >   klnpce |    100   10.69125   .5342394    .9190264   8.867703    12.5148  (N)
> > >          |                                            9.449879    13.2954  (P)
> > >          |                                            9.095177   11.76517  (BC)
> > > -------------------------------------------------------------------------
> > > N = normal, P = percentile, BC = bias-corrected
> > >
> > >
> > > First I would like to draw confidence intervals for the entire
> > > function, and then bootstrap the confidence intervals and am not
> > > sure how to do it. I was wondering if anyone had faced this
> > > problem, and could help me out.
