RE: st: Pareto v. lognormal
In addition, note other pertinent Stata-based literature and
existing programs. The paper
    SJ-5-3  gr0018 . . . . . . . . . . Speaking Stata: The protean quantile plot
    Q3/05   SJ 5(3):442--460
            discusses quantile and distribution plots as used in
            the analysis of species abundance data in ecology
has various general comments on power laws and alternatives. It
also shows that -qplot- -- itself last updated in
    SJ-6-4  gr42_4 . . . . . . . . . . . . . . . . . . Software update for qplot
    Q4/06   SJ 6(4):597
            better handling of x-axis titling, a new option allowing
            the user to specify an alternative plotting position or
-- is general enough to provide graphical tests of both lognormal
and power law distributions, and indeed other alternatives.
The paper also has more jokes in it than any other Speaking Stata
column, or so it was devised.
A look on SSC reveals -qlognorm-, but only for Stata 7. Why have I not
updated it? I am sheltering a -qlogn- for Stata 8 in my bosom, to
be released when a larger project of which it is a small part and of
which I am a junior author is written up by the senior author. But
that's as may be. Taking logs and firing up -qnorm- is an easy way
to do it in any case.
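
As a minimal sketch of that last route, assuming the auto data shipped
with Stata stand in for real data (price and lnprice are illustrative
names only, not anything from this thread):

```stata
* Illustrative only: price from the shipped auto data stands in for
* whatever strictly positive variable is of interest.
sysuse auto, clear
generate double lnprice = ln(price)
* Points lying close to the reference line are consistent with a
* lognormal distribution for price.
qnorm lnprice
```

Systematic curvature away from the line, especially in the tails, is
the thing to look for.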
For my part, I think power laws have been oversold, for a variety of
reasons. Here's one. Often the data that are shown are actually for
the right-hand tail, and not a complete distribution. In many examples,
only the largest cities, the longest rivers, and so forth are readily
available and the other tail may be omitted. Income data are often
better, or so it seems.
Many other distributions could be played with here. One quite interesting
one is the inverse gamma (a.k.a. inverted or reciprocal gamma, Vinci
or Pearson Type V). Its right tail is essentially power law, but it has
a left tail too. I have a bundle of Stata programs for fitting and graphics.
If anyone is interested, I will accelerate public release.
> Stas, Patrick, et al.--
> The rationale for using ln(f(x)) instead of ln(1-F) is that I can
> write down ln(f(x)) for both the Pareto and lognormal families, and I
> can't write down F for the lognormal. For my own purposes, I am
> interested in tests of Pareto v. lognormal for income distributions,
> for which I think my proposed method works. Patrick Wöhrle Guimarães
> wanted an estimate of the parameters of a Pareto distribution, for
> which Stas' method might be preferred:
> cap ssc install kdens
> use http://www2.bc.edu/~gottscha/mobility.dta, clear
> g lnc2=ln(c2)
> kdens lnc2 [pw=wt], g(fx lnx) norm n(`=_N')
> g lnfx=ln(fx)
> g ln2x=lnx^2
> reg lnfx lnx ln2x, r
> di "significant coef on ln2x rejects Pareto"
> sort lnx
> g F=sum(fx)
> keep if F<.
> replace F=F/F[_N]
> g ln1_F=ln(1-F)
> reg ln1_F lnx
> di "est Pareto param a is " -_b[lnx]
> di "est Pareto param k is " exp(-_b[_cons]/_b[lnx])
> nlcom exp(-_b[_cons]/_b[lnx])
> > On 3/6/07, Stas Kolenikov <firstname.lastname@example.org> wrote:
> > > On 3/6/07, Austin Nichols <email@example.com> wrote:
> > > > The Pareto distribution is typically defined by the cdf F(x;a) = 1 -
> > > > x^(-a) where a>0 for x>=0 and zero elsewhere, and the pdf f(x;a) =
> > > > ax^(-a-1) for x>=0 and zero elsewhere. A version with two parameters
> > > > is given by F(x;a,k) = 1-(x/k)^(-a) and f(x;a,k) =
> > > > a(k)^(a)(x)^(-a-1).
> > >
> > > well I guess it would be easier to take ln(1-F) = -a ln x + a ln k
> > > which is directly estimable by the standard linear regression...
> > > possibly with heteroskedastic standard errors if one wished :)). Note
> > > that the regularity conditions are not satisfied for k, so its
> > > estimate is likely to be quirky. Add ln^2 x if you wish to that
> > > regression to test for Pareto-ness of the distribution.
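
For reference, the log-linear forms behind the two regressions in the
quoted exchange can be written out as follows. This is a sketch using
the two-parameter Pareto with support x >= k; mu and sigma are the usual
lognormal parameters and are notation assumed here, not used in the
thread.

```latex
\begin{align*}
\text{Pareto:}\quad
  1 - F(x;a,k) &= (x/k)^{-a}, \qquad x \ge k,\\
  \ln\bigl(1 - F(x;a,k)\bigr) &= -a \ln x + a \ln k,\\
  \ln f(x;a,k) &= \ln a + a \ln k - (a+1)\ln x,\\[4pt]
\text{lognormal:}\quad
  \ln f(x;\mu,\sigma) &= -\ln x - \ln\bigl(\sigma\sqrt{2\pi}\bigr)
      - \frac{(\ln x - \mu)^2}{2\sigma^2}.
\end{align*}
```

So if a regression of ln(1-F) on ln x gives slope b and intercept c,
then a = -b and k = exp(-c/b), which is what the -di- and -nlcom- lines
in the quoted code compute. The Pareto log density is linear in ln x,
whereas the lognormal log density carries a (ln x)^2 term with
coefficient -1/(2 sigma^2), so a clearly nonzero coefficient on the
squared term counts against the Pareto.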