
From: rgutierrez@stata.com (Roberto G. Gutierrez, StataCorp)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: -locpoly-
Date: Wed, 02 Jun 2004 10:12:10 -0500

D.Christodoulou <absc11@bangor.ac.uk> asks:

> I have purchased the Stata Journal 2003, Vol.3, No.4, and I have just
> finished reading the description of -locpoly-. I still have some questions:

> (1) Since -locpoly- implements the Nadaraya (1964) and Watson (1964)
> estimator (the same as the -kernreg- command), shall I assume that a local
> mean (constant) polynomial smooth (of order p=0) will be the same as
> -kernreg-? (assuming the same bandwidth, kernel and gridpoints)

Yes, you can assume they are equal as long as the kernels are precisely equal across both commands. Kernel function definitions can differ across implementations of kernel smoothing due to minor matters of taste. For example, some researchers like to truncate the Gaussian kernel to the interval [-1,1], whereas -locpoly- treats the domain of the Gaussian kernel as the entire real line. For a list of the kernels used by -locpoly-, see [R] kdensity, and in particular pg. 228 of [R] G-M.

> (2) If -width- is not specified in the -locpoly- options, then the default is
> used, which is described as 'optimal' by the authors of kdensity and kernreg.
> However, the authors of -locpoly- described this default bandwidth as
> "entirely inappropriate for local polynomial smoothing". They further advise
> that I should start with the default and adjust it according to my needs.
> However, I do not wish to choose a bandwidth that relies only on visual
> criteria, since the bandwidth essentially defines the consistency of the
> estimation. How can I define (statistically) such an 'appropriate' bandwidth
> for polynomial smoothing? Any advice? Any specific references? (the library
> of my university does not have Fan and Gijbels (1996) and the textbook costs
> about 48.00 GBP to purchase)

There is a vast (and I mean VAST!) literature on the subject of bandwidth selection for kernel smoothing. Try doing a Google search on "bandwidth selection kernel smoothing" and you'll see what I mean.
As to which of these methods have been implemented in Stata, I am not aware of any except for my own private implementation of the "direct rule-of-thumb" local-linear method of Ruppert, Sheather and Wand (1995 JASA). This method is specific to local-linear smoothing, but is sufficient for my use since I have an affinity for local lines. I have not made my code publicly available but can do so if you wish, or if there is general interest.

> (3) I understand that there is a great negative relation between the
> specified number of points (grids) to be estimated and the computing
> intensity. I have many datasets that vary from 100 to 30,000 observations.
> Since it will take a lot of time to experiment with many combinations of
> -n(#)-, is there a 'rule of thumb' that defines the number of points to be
> specified as a function of the number of observations?

Not really a rule of thumb here, since the number of points at which you estimate has no bearing on the properties of the estimator at each particular point. It all comes down to "how many points do you want an estimated y-hat for?" Remember that you are usually graphing this curve, so you want enough points that the graph isn't too sparse.

> (4) Since I am mostly interested in revealing patterns in the middle of the
> distribution (and not so much in the tails), is there a way to specify more
> grid points in the middle? How can I generate such a variable so that I
> can include it in the option -at(var)-? What am I missing here?

The -range- command in Stata will create a variable with a specified number of equally spaced grid points over a given range. You could then combine -range- with -append- to create a variable that suits your needs. Alternatively, you could use -range- to generate a variable that has the number of points you need in the middle, and then use -drop- to drop every other observation (say) in the tails.
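To illustrate the grid idea in (4), and the point in (3) that each estimate depends only on its own evaluation point, here is a minimal sketch in Python rather than Stata. It is not -locpoly- itself: the helper names (linspace, dense_middle_grid, local_mean), the toy data, the breakpoints 4 and 6, and the bandwidth 0.5 are all assumptions made for the example. The smoother is a plain local mean (degree 0) with a Gaussian kernel over the whole real line.

```python
import math


def linspace(lo, hi, n):
    """Return n equally spaced points from lo to hi inclusive (n >= 2)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


def dense_middle_grid(lo, mid_lo, mid_hi, hi, n_tail, n_mid):
    """Coarse grid in each tail, fine grid in the middle.

    Assumes lo < mid_lo < mid_hi < hi. The boundary points mid_lo and
    mid_hi are taken from the middle segment only, so nothing is duplicated.
    """
    left = linspace(lo, mid_lo, n_tail + 1)[:-1]    # drop shared point mid_lo
    middle = linspace(mid_lo, mid_hi, n_mid)
    right = linspace(mid_hi, hi, n_tail + 1)[1:]    # drop shared point mid_hi
    return left + middle + right


def local_mean(x, y, x0, h):
    """Local-constant (degree 0) kernel smooth of y on x, evaluated at x0.

    Uses an untruncated Gaussian kernel with bandwidth h. The result
    depends only on the data, x0 and h, not on any other grid point.
    """
    w = [math.exp(-0.5 * ((xi - x0) / h) ** 2) for xi in x]
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


# Toy data (hypothetical): a sine curve observed on [0, 10].
x = [i / 10 for i in range(101)]
y = [math.sin(xi) for xi in x]

# 4 points in each tail, 21 points between 4 and 6.
grid = dense_middle_grid(0.0, 4.0, 6.0, 10.0, n_tail=4, n_mid=21)
smooth = [local_mean(x, y, g, h=0.5) for g in grid]
```

In Stata, the equivalent grid values would be stored in a variable and passed to -locpoly- through -at()-. Adding or removing grid points changes only where the curve is evaluated, not the value of the estimate at any point that stays in the grid.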
--Bobby
rgutierrez@stata.com

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

