Re: st: RE: -locpoly-
D.Christodoulou <email@example.com> asks:
> I have purchased the Stata Journal 2003, Vol.3, No.4, and I have just
> finished reading the description of -locpoly-. I still have some questions:
> (1) Since -locpoly- implements the Nadaraya (1964) and Watson (1964)
> estimator (the same as the -kernreg- command), shall I assume that a local
> mean (constant) polynomial smooth (of order p=0) will be the same as
> -kernreg-? (assuming the same bandwidth, kernel, and grid points)
Yes, you can assume they are equal, as long as the kernels are precisely equal
across both commands. Kernel function definitions can sometimes differ across
implementations of kernel smoothing due to minor matters of taste. For
example, some researchers like to truncate the Gaussian kernel to the interval
[-1,1] whereas -locpoly- treats the domain of the Gaussian kernel as the
entire real line. For a list of the kernels used by -locpoly-, see [R]
kdensity, and in particular pg. 228 of [R] G-M.
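As a quick sketch of how you might check the equivalence yourself (the dataset, variables, bandwidth, and number of grid points below are purely illustrative, and -kernreg- is a user-written command that must be installed separately):

```stata
* Degree-0 local polynomial smoothing with -locpoly-.  With the
* same kernel, bandwidth, and grid points, this should reproduce
* the Nadaraya-Watson fit from -kernreg-.
sysuse auto, clear
locpoly mpg weight, degree(0) width(500) n(50) generate(xgrid yhat0)

* Running -kernreg- with the matching bandwidth and number of
* grid points, then comparing fitted values grid point by grid
* point, should show agreement -- up to any difference in how
* each command defines its kernel (e.g. truncated vs. full-line
* Gaussian).
list xgrid yhat0 in 1/5
```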
> (2) If -width- is not specified in the -locpoly- options, then the default
> is used, which is described as 'optimal' by the authors of -kdensity- and
> -kernreg-. However, the authors of -locpoly- describe this default bandwidth
> as "entirely inappropriate for local polynomial smoothing". They further
> advise that I should start with the default and adjust it according to my
> needs. However, I do not wish to choose a bandwidth that relies only on
> visual criteria, since the bandwidth essentially defines the consistency of
> the estimation. How can I define (statistically) such an 'appropriate'
> bandwidth for polynomial smoothing? Any advice? Any specific references?
> (The library of my university does not have Fan and Gijbels (1996), and it
> is about 48.00GBP to purchase the textbook.)
There is a vast (and I mean VAST!) literature on the subject of bandwidth
selection for kernel smoothing. Try doing a Google search on "bandwidth
selection kernel smoothing" and you'll see what I mean.
As to which of these methods have been implemented in Stata, I am not aware of
any except for my own private implementation of the "direct rule-of-thumb"
local-linear method of Ruppert, Sheather and Wand (1995 JASA). This method is
specific to local-linear smoothing, but is sufficient for my use since I have
an affinity for local lines.
I have not made my code publicly available, but can do so if you wish, or
if there is general interest.
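To give the flavor of a "direct rule-of-thumb" plug-in bandwidth for local-linear smoothing (this is only a sketch in the spirit of Fan and Gijbels (1996) and Ruppert, Sheather and Wand (1995), not my private implementation; y and x are placeholder variable names, and the constant 0.776 is the kernel factor (R(K)/mu2(K)^2)^(1/5) for the Gaussian kernel):

```stata
* Step 1: fit a global quartic as a pilot, to estimate the error
* variance sigma^2 and the second derivative m''(x).
gen double x2 = x^2
gen double x3 = x^3
gen double x4 = x^4
regress y x x2 x3 x4
scalar sigma2 = e(rmse)^2

* Step 2: plug-in quantities -- m''(x) from the quartic fit, the
* range of x, and the sum of squared second derivatives.
gen double m2 = 2*_b[x2] + 6*_b[x3]*x + 12*_b[x4]*x^2
gen double m2sq = m2^2
summarize x, meanonly
scalar xrange = r(max) - r(min)
summarize m2sq, meanonly
scalar sum_m2sq = r(sum)

* Step 3: rule-of-thumb bandwidth
*   h = C(K) * [ sigma^2 * (b-a) / sum m''(x_i)^2 ]^(1/5)
* with C(K) = 0.776 for the Gaussian kernel.
scalar h_rot = 0.776 * (sigma2*xrange/sum_m2sq)^(1/5)
display "rule-of-thumb bandwidth = " h_rot
```

The resulting h_rot could then be passed to -locpoly- via the -width()- option as a starting point.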
> (3) I understand that there is a strong negative relation between the
> specified number of points (grid points) to be estimated and the computing
> intensity. I have many datasets that vary from 100 to 30,000 observations.
> Since it will take a lot of time to experiment with many combinations of
> -n(#)-, is there a 'rule of thumb' that defines the number of points to be
> specified as a function of the number of observations?
Not really a rule of thumb here, since the number of points at which you
estimate has no bearing on the properties of the estimator at each particular
point. It all comes down to "how many points do you want an estimated y-hat
for?" Remember that you are usually graphing this curve, and so you want
enough points that the graph isn't too sparse.
> (4) Since I am mostly interested in revealing patterns in the middle of the
> distribution (and not so much in the tails), is there a way to specify more
> grid points in the middle? How can I generate such a variable so that I can
> include it in the option -at(var)-? What am I missing here?
The -range- command in Stata will create a variable with a specified number of
equally spaced grid points over a given range. You could then combine -range-
with -append- to create a variable that suits your needs. Alternately, you
could generate a variable using -range- that has the number of points you need
in the middle, and then use -drop- to drop every other observation (say) in
the tails.