In addition to other comments -- and closest in spirit to
Marcello Pagano's comment -- I implemented a log transform,
kernel density and back transform option in -mdensity-,
which is for Stata 6 and has been on SSC since 1999.
There is no public version of -mdensity- for Stata 8
or later. Stata's rewriting of -kdensity- in Stata 8 made it
difficult if not impossible for me to keep the same
structure for -mdensity-, which was just a wrapper for
-kdensity-.
No matter: the logic is quite easy and reduces to a few
lines, for which -mdensity- is not needed at all. There
was a write-up in
Graphing distributions. SJ 4(1):66--88 (2004)
What follows is based on an extract. For the references,
please buy or borrow a copy of the Stata Journal.
----------------------
Some simple devices extend the range of applications of Stata's official
commands for kernel density estimation. First is the idea of
estimating the density function on a transformed scale and then
back-transforming the estimate to one for the raw scale. Two of the most
natural transformations here, as elsewhere, are logarithms for positive
variables and logit-like transformations for proportions and other data
measured on some interval (a,b). The underlying general principle is
that for a differentiable monotone transformation t(x), the density of x
and the density of t(x), written loosely here as f(x) and f(t(x)), are
related by f(x) = f(t(x)) |dt/dx|. This
procedure is mentioned briefly by Silverman (1986, pp.27-30), although
his worked example (p.28) is not very encouraging. Good expositions are
given by Wand and Jones (1995, pp.43-45), Simonoff (1996, pp.61-64) and
Bowman and Azzalini (1997, pp.14-16).
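The same principle can be sketched for a logit-type transformation. What
follows is a minimal do-file sketch, not taken from the article: the
variable p is simulated and purely hypothetical, and the variable names
are assumptions chosen for illustration. For data on (0,1),
d/dp logit(p) = 1/(p(1 - p)), so the estimate on the logit scale is
divided by p(1 - p).

```stata
* Sketch only: simulated data stand in for a real proportion variable.
clear
set seed 2004
set obs 500
* hypothetical proportions strictly inside (0,1)
gen p = rbeta(2, 5)

* For a general interval (a,b), t(x) = log((x - a)/(b - x)) and
* dt/dx = (b - a)/((x - a)*(b - x)); here a = 0 and b = 1.
gen logitp = logit(p)

* Estimate the density on the logit scale at each observed value.
kdensity logitp, at(logitp) gen(densitylogit) nograph

* Back-transform: multiply by |dt/dp| = 1/(p*(1 - p)).
gen density = densitylogit / (p * (1 - p))

line density p, sort
```

Values exactly equal to 0 or 1 would have to be handled separately, as
the logit is undefined there.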
With a logarithmic transformation of x we have
estimate of f(x) = estimate of f(log x) times (1 / x),
given that d/dx (log x) = 1/x. Note in particular that, if the data are
right-skewed, the result of this transformation is more smoothing
in the tail and less near the main part of the distribution than with the
default method. I have found this one of the most valuable ways of going
beyond the default. It fits very well both the common finding that
positive variables are right-skewed, suggesting a transformation such as
the logarithm, and the common attitude that results on the original
scale are of direct scientific or practical interest. To put it another
way, the transformation behaves more like a link function than a
classical transformation, given that end results are on the scale of the
original response. You can get the best of both worlds.
Returning to the wage data, here is an illustrative (and certainly
not definitive) example, in which we just use the default kernel and
width choice.
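. * estimate the density on the log scale at each observed value
. gen logwage = log(wage)
. kdensity logwage, at(logwage) gen(densitylog)
. * back-transform: divide by wage, given that d/dx (log x) = 1/x
. gen density = densitylog/wage
. * ticks at each distinct wage (-levels- was renamed -levelsof- in Stata 9)
. levels wage, local(levels)
. line density wage, sort xtick(`levels', tposition(inside))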
The density function ... is much smoother in the tails
than the equivalent default .... However, the step in the
left-hand tail needs investigation: is this some odd artefact or a
genuine feature of the data?
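One way to make that comparison directly is to overlay the two estimates.
The do-file sketch below is not from the article. It assumes the nlsw88
dataset that ships with Stata as a stand-in (its wage variable may well
differ from the wage data used in the article), and the variable names
are illustrative.

```stata
* Sketch only: nlsw88 is a stand-in dataset, not necessarily the
* article's wage data.
sysuse nlsw88, clear

* Density estimated on the log scale, then back-transformed.
gen logwage = log(wage)
kdensity logwage, at(logwage) gen(densitylog) nograph
gen density = densitylog / wage

* Default estimate on the raw scale, evaluated at the same points.
kdensity wage, at(wage) gen(densityraw) nograph

* Overlay the two estimates for comparison.
line density densityraw wage, sort ///
    legend(order(1 "via log scale" 2 "default"))
```

Both estimates use the default kernel and width on their own scale, so
any difference between the curves reflects the transformation alone.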
<original continues>
For gamma-distributed data, some people use cube roots (cf. the
Wilson-Hilferty transformation).
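The cube-root version follows the same recipe. This is a minimal do-file
sketch under stated assumptions: the variable x is simulated from a gamma
distribution purely for illustration, and the variable names are
hypothetical. Given that d/dx x^(1/3) = 1/(3 x^(2/3)), the estimate on
the cube-root scale is divided by 3 x^(2/3).

```stata
* Sketch only: simulated gamma data (shape 2, scale 1) for illustration.
clear
set seed 2004
set obs 500
gen x = rgamma(2, 1)

* Estimate the density on the cube-root scale at each observed value.
gen cuberoot = x^(1/3)
kdensity cuberoot, at(cuberoot) gen(densitycr) nograph

* Back-transform: multiply by |dt/dx| = 1/(3*x^(2/3)).
gen density = densitycr / (3 * cuberoot^2)

line density x, sort
```

Unlike the logarithm, the cube root is defined at zero, but the
back-transformation divides by zero there, so exact zeros would need
separate treatment.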
Nick
[email protected]
Daniel Schneider wrote:
> I am looking for a way to use asymmetric kernels in kernel density
> estimations, for example a gamma kernel (Chen 2000). Unfortunately, I
> have not been able to locate any implementation in Stata.
>
> Am I missing something or has anyone implemented an asymmetric kernel
> estimator in Stata (my data are more or less from a gamma distribution,
> non-negative by definition)?