Dear Nick, thanks for the reply. I am adopting a semi-parametric strategy here. First, the
dependent variable y is binary (0,1). There are a few regressors x; one of them, x1, is of
particular interest and is suspected to be non-linear (theoretically and empirically).
The strategy is then to run a parametric probit model of y on x (excluding x1), then regress the
fitted residuals on x1 using kernel regression. The fitted residuals are not necessarily
binary like y; they make up the component of y that is not explained by x and could
potentially be explained by x1. This is where the kernel regression comes in. I'm not sure if that
is convincing enough for you. I am not assuming anything about the errors in the kernel
regression, as I plan to bootstrap them, so I do not rely on asymptotic theory or a CLT here.
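
To make the second step concrete, here is a minimal sketch in plain Python (not Stata, only so that it is self-contained). Everything in it is an assumption for illustration: the data are simulated stand-ins for x1 and the probit residuals, the smoother is a Nadaraya-Watson (local constant) fit with a Gaussian kernel, the bandwidth is picked by a hand-coded leave-one-out cross-validation over a few candidate values, and the bands come from resampling (x1, residual) pairs rather than from resampling errors. The function names nw_smooth, loo_cv and bootstrap_bands are made up here. Note also that these bands ignore the sampling variation of the first-stage probit; to capture it, the probit would have to be re-fitted inside each bootstrap replicate.

```python
import math
import random

def nw_smooth(x, r, grid, h):
    """Nadaraya-Watson estimate of E[r | x] at each grid point (Gaussian kernel)."""
    fit = []
    for g in grid:
        w = [math.exp(-0.5 * ((xi - g) / h) ** 2) for xi in x]
        total = sum(w)
        fit.append(sum(wi * ri for wi, ri in zip(w, r)) / total if total > 0 else float("nan"))
    return fit

def loo_cv(x, r, h):
    """Leave-one-out CV score: mean squared error predicting each r[i] without point i."""
    sse = 0.0
    for i in range(len(x)):
        num = den = 0.0
        for j in range(len(x)):
            if j != i:
                w = math.exp(-0.5 * ((x[j] - x[i]) / h) ** 2)
                num += w * r[j]
                den += w
        if den == 0.0:
            return float("inf")  # bandwidth too small to predict this point
        sse += (r[i] - num / den) ** 2
    return sse / len(x)

def bootstrap_bands(x, r, grid, h, reps=200, level=0.95, seed=12345):
    """Pointwise percentile bands from resampling (x, r) pairs with replacement."""
    rng = random.Random(seed)
    n = len(x)
    draws = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        draws.append(nw_smooth([x[i] for i in idx], [r[i] for i in idx], grid, h))
    tail = (1.0 - level) / 2.0
    lo_pos = int(math.floor(tail * (reps - 1)))
    hi_pos = int(math.ceil((1.0 - tail) * (reps - 1)))
    lower, upper = [], []
    for j in range(len(grid)):
        column = sorted(d[j] for d in draws)
        lower.append(column[lo_pos])
        upper.append(column[hi_pos])
    return lower, upper

# Simulated stand-ins (hypothetical data): x1 is the regressor of interest,
# res plays the role of the residuals from the first-stage probit.
sim = random.Random(1)
x1 = [sim.uniform(0.0, 1.0) for _ in range(150)]
res = [0.3 * math.sin(2 * math.pi * v) + sim.gauss(0.0, 0.2) for v in x1]

candidates = [0.02, 0.05, 0.1, 0.2, 0.4]          # bandwidths to compare (arbitrary)
h_best = min(candidates, key=lambda h: loo_cv(x1, res, h))
grid = [k / 10 for k in range(1, 10)]             # points at which to evaluate the curve
fit = nw_smooth(x1, res, grid, h_best)
lower, upper = bootstrap_bands(x1, res, grid, h_best)
for g, f, lo, hi in zip(grid, fit, lower, upper):
    print(f"x1={g:.1f}  fit={f: .3f}  band=[{lo: .3f}, {hi: .3f}]")
```

Since fit, lower and upper are all evaluated on the same grid, they can be drawn together as a line with a band around it. In Stata 8 that would presumably be something like -twoway (rarea lower upper grid) (line fit grid)-, assuming the values have been saved as variables with those names.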
One huge problem I had was finding the program locpoly. It is not in my Stata 8, and I could not
find it on the web until a couple of days ago. That is why I could neither compare it with kernreg
nor find out whether CV was included in locpoly. Apologies for making you repeat yourself. Also,
maybe now you have a better idea of how to answer my 2nd question:
> Another question is regarding graphing the kernel estimates and bootstrap
> confidence intervals. I have seen in some journals where kernel regressions
> (y on x) were used and bootstrap CI were plotted around the kernel estimates.
> I encountered 3 problems here. Firstly, I could not save kernreg graphs like
> I could with scatter plots. Secondly, I know how to calculate bootstrap CI but
> don't know how to plot them on a graph. Lastly, how do I plot both together
> on one graph?
Thanks a million!
> "Nick Cox" wrote:
> > > Firstly, thanks so much for the reply. I'm not sure what the difference
> > > is between kernreg2 and locpoly.
> > I am not sure why you are not sure what the difference is,
> > as a comparison of the files and using the programs
> > should make this clear. For example, -kernreg2- is
> > a program for Stata 6, while -locpoly- is a program
> > for Stata 8, so the associated graphics are quite
> > different. As earlier indicated, -kernreg2- was intended to
> > be a temporary fix by myself to -kernreg-. That fix
> > was made in March 1999, but the authors of -kernreg-
> > have yet to get round to publishing a revised version
> > of their program, despite a variety of public and private
> > requests. For Stata 8 users, that is now immaterial,
> > as -locpoly- supersedes -kernreg-. For any Stata 6 and Stata 7
> > users, there remains an issue. I have been tempted to
> > withdraw -kernreg2-, but that would mean that -kernreg-
> > would remain in the public domain, although known to possess
> > bugs, yet without an alternative.
> > > My theoretical understanding of kernel estimation (y on x) is that it is
> > > a locally weighted averaging method of fit (using a prespecified kernel
> > > function, e.g. normal or Epanechnikov) where the bandwidth controls how
> > > much weight distant observations receive. The optimal bandwidth is chosen
> > > to minimise the mean integrated squared error or so-called cross
> > > validation (CV).
> > >
> > > Given the above, would you suggest I use kernreg2 or locpoly?
> > > Is the optimal bandwidth chosen in
> > > each case using CV?
> > I suggest neither. I think you should tell us what kind
> > of assumptions you are making about the error around
> > whatever smooth curve you are fitting or, more generally,
> > why you think a binary response is suitable for this
> > kind of application.
> > Neither program makes any use of cross-validation. If they
> > did, that would be clear in the documentation.
> > Cross-validation would require some extra programming
> > on somebody's part. But any kind of optimisation would seem
> > beside the point unless you can justify your application
> > as appropriate. Optimising a qualitatively incorrect model
> > would seem a somewhat bizarre exercise.
> > This is not to say that some kind of kernel regression
> > might not provide a useful exploratory or heuristic
> > approach to smoothing your response as a function of
> > your predictor. In practice, it might work quite well.
> > But I am not clear that the idea of averaging across a
> > binary response is quite the best way to approach your problem. That's
> > a lack of clarity on my part, and open to correction
> > from people with stronger technical grasp of this
> > area.
> > > Another question is regarding graphing the kernel estimates and bootstrap
> > > confidence intervals. I have seen in some journals where kernel regressions
> > > (y on x) were used and bootstrap CI were plotted around the kernel estimates.
> > > I encountered 3 problems here. Firstly, I could not save kernreg graphs like
> > > I could with scatter plots. Secondly, I know how to calculate bootstrap CI but
> > > don't know how to plot them on a graph. Lastly, how do I plot both together
> > > on one graph?
> > Your problems here are not indicated precisely. Perhaps
> > you should start by stating which version of Stata you
> > are using. If you are using Stata 8, -kernreg*- is,
> > as stated, superseded. If you are using -kernreg2-,
> > you should indicate precisely what you did. If you are using
> > -kernreg-, that is against my strong advice, as indicated.
> > Nick Cox
> > n.j.cox@d...
> > > > -kernreg2-, of which I am notionally first
> > > > author, was intended to be a temporary fix
> > > > of -kernreg-, written by other people.
> > > >
> > > > It didn't turn out that way, but no matter:
> > > > -locpoly- is now the recommended command,
> > > > in my view. In short, -kernreg2- is history,
> > > > except that it remains in the archives out
> > > > of inertia and for people still on earlier
> > > > versions of Stata.
> > > >
> > > > However, both of them stop a long way short
> > > > of offering this kind of functionality.
> > > >
> > > > Having said that, my own personal view is
> > > > that kernel regression is not obviously
> > > > the best thing for summarising how a
> > > > binary response varies with a predictor.
> > > > I can't offer more positive advice because
> > > > I am unclear on how far your problem is
> > > > tractable at all.
> > Eik Leong Swee
> > > > > > I am trying to do a kernel density estimation of y (a 0-1 variable)
> > > > > > on x1. This generates Graph1. I also did an estimation of y on x2 and
> > > > > > generated graph2. I used kernreg2 for both these estimations.
> > > > > >
> > > > > > Now, I would also like to bootstrap confidence intervals around the
> > > > > > graph and subsequently test the two distributions from graph 1 and 2
> > > > > > (to see if they are statistically different in the relevant range).
> > > > > > Unfortunately, kernreg2 does not give the non-parametric standard
> > > > > > errors. I tried bootstrapping nevertheless, and this is the output
> > > > > > that I get.
> > > > > > Bootstrap statistics
> > > > > >
> > > > > > Variable |  Reps   Observed       Bias   Std. Err.   [95% Conf. Interval]
> > > > > > ---------+--------------------------------------------------------------------
> > > > > >   klnpce |   100   10.69125   .5342394    .9190264    8.867703    12.5148  (N)
> > > > > >          |                                            9.449879    13.2954  (P)
> > > > > >          |                                            9.095177   11.76517  (BC)
> > > > > > ------------------------------------------------------------------------------
> > > > > > N = normal, P = percentile, BC = bias-corrected
> > > > >
> > > > >
> > > > > > First I would like to draw confidence intervals for the entire
> > > > > > function, and then bootstrap the confidence intervals, but I am not
> > > > > > sure how to do it. I was wondering if anyone had faced this problem
> > > > > > and could help me out.
> > *
> > * For searches and help try:
> > * http://www.stata.com/support/faqs/res/findit.html
> > * http://www.stata.com/support/statalist/faq
> > * http://www.ats.ucla.edu/stat/stata/