st: RE: statalist-digest V4 #4425
Sun, 12 Feb 2012 14:37:02 -0000
Date: Fri, 10 Feb 2012 09:52:02 +0100
From: Jen Zhen <firstname.lastname@example.org>
Subject: st: Computing the Gini or another inequality coefficient from a limited number of data points
Dear list members,
I would like to compute a measure of income inequality similar to the
Gini index. I do not know everyone's income, so I need to make an
approximation.
(1) For the 5 most recent years, I know for 6 income brackets how many
individuals there are and their joint income, hence also the average
income in the bracket. For the full-fledged Gini index I would need to
know the area under the curve which shows the cumulative income
against the cumulative number of tax payers (to visualize what I mean,
look e.g. at the 2nd figure here:
http://en.wikipedia.org/wiki/Gini_index).
Now I believe that with the information I have I don't know the entire
curve but I know only 7 points on it (the six points mentioned plus
the origin). So I think I can approximate the said area if I simply
assume that between the 7 points the line is straight, but that will
systematically underestimate the true degree of inequality. So I'm
wondering if there is a sensible way to smooth the curve and hence get
a better approximation?
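
The straight-line approximation described above can be sketched as follows. This is a minimal Python illustration, not Stata code from the thread; the function name and the bracket figures are hypothetical. Joining the points with straight lines amounts to treating everyone in a bracket as receiving the bracket mean, which is why the result is a lower bound on the Gini.

```python
def gini_piecewise_linear(counts, totals):
    """Gini from grouped data, joining the Lorenz points with straight lines.

    counts[k] is the number of people in bracket k and totals[k] their joint
    income; brackets must be ordered from poorest to richest.
    """
    n = sum(counts)
    y = sum(totals)
    area = 0.0        # area under the piecewise-linear Lorenz curve
    cum_income = 0.0  # cumulative income share at the previous point
    for c, t in zip(counts, totals):
        next_income = cum_income + t / y
        # Trapezoid between two neighbouring Lorenz points.
        area += (c / n) * (cum_income + next_income) / 2
        cum_income = next_income
    return 1 - 2 * area


# Hypothetical figures for six brackets (not taken from the original post).
counts = [400, 300, 150, 100, 40, 10]
totals = [2000, 4500, 3750, 4000, 2800, 2000]
print(round(gini_piecewise_linear(counts, totals), 4))
```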
(2) For the 5 earliest years unfortunately I know only the number of
individuals in each bracket but not their joint income. So my idea was
that I would regress the mean income in each bracket on a 3rd-order
function in the year to see how it develops in the 5 latest years and
use this to predict/estimate the mean income for each bracket in the 5
earlier years, then use the procedure described in (1). A simpler
alternative would be to just use the midpoint of each bracket, but I
guess this would be less good.
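
The simpler midpoint alternative might look like the sketch below, again in Python with hypothetical names and figures. The open-ended top bracket has no midpoint, so its mean has to be supplied as an explicit assumption (top_mean here). The imputed totals can then be fed into the same grouped-data calculation used for the later years.

```python
def midpoint_totals(edges, counts, top_mean):
    """Impute each bracket's joint income as head count times bracket midpoint.

    edges[k] is the lower limit of bracket k. The last bracket is open-ended,
    so its mean is supplied as top_mean (an assumption, not data).
    """
    totals = []
    last = len(counts) - 1
    for k, c in enumerate(counts):
        if k == last:
            mean = top_mean
        else:
            mean = (edges[k] + edges[k + 1]) / 2
        totals.append(c * mean)
    return totals


# Hypothetical lower bracket limits and head counts (not from the post).
edges = [0, 10, 20, 40, 80, 160]
counts = [400, 300, 150, 100, 40, 10]
print(midpoint_totals(edges, counts, top_mean=250))
```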
Does this procedure sound sensible? Or is there a better way to
compute inequality from these data?
Thank you so much and best regards,
You have received some useful suggestions. However, also note that there is a well-established literature on non-parametric approaches to estimation of inequality indices from data that are in grouped form.
See e.g. FA Cowell and F Mehta "The estimation and interpolation of inequality measures", Review of Economic Studies, 49(2), April 1982, 273-290. And references therein.
They also refer to derivation of upper and lower bounds.
With the information that you have about your income distribution (heavy grouping, but at least means within categories; but what about the top, unbounded range?), the bounds on your Ginis may be quite wide.
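
To illustrate how wide such bounds can be, here is a minimal Python sketch of one way to bracket the Gini from bracket limits, head counts and bracket means. It is not necessarily the construction used by Cowell and Mehta, and the function name and inputs are hypothetical. The lower bound puts everyone at the bracket mean. The upper bound places incomes at the two bracket limits in the proportions that reproduce the mean, which adds p_k^2 (m_k - a_k)(b_k - m_k) / ((b_k - a_k) mu) for each bracket with population share p_k, limits a_k and b_k, and mean m_k, where mu is the overall mean. For an open top bracket the sketch uses the limit of that term as b_k grows, p_k^2 (m_k - a_k) / mu.

```python
def gini_bounds(edges, counts, means):
    """Lower and upper bounds on the Gini from bracket counts and means.

    edges has len(counts) + 1 entries, poorest bracket first; use
    float("inf") as the last entry for an open-ended top bracket.
    """
    n = sum(counts)
    mu = sum(c * m for c, m in zip(counts, means)) / n  # overall mean

    # Lower bound: everyone in a bracket receives the bracket mean.
    area, cum = 0.0, 0.0
    for c, m in zip(counts, means):
        nxt = cum + c * m / (n * mu)
        area += (c / n) * (cum + nxt) / 2
        cum = nxt
    lower = 1 - 2 * area

    # Upper bound: maximal spread inside each bracket, mean held fixed.
    extra = 0.0
    for k, (c, m) in enumerate(zip(counts, means)):
        a, b = edges[k], edges[k + 1]
        if b == float("inf"):
            spread = m - a  # limit of (m - a)(b - m)/(b - a) as b grows
        else:
            spread = (m - a) * (b - m) / (b - a)
        extra += (c / n) ** 2 * spread / mu
    return lower, lower + extra


# Hypothetical example with an open top bracket (not from the post).
low, high = gini_bounds([0, 10, float("inf")], [3, 1], [5, 30])
print(round(low, 4), round(high, 4))
```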
Professor Stephen P. Jenkins
Department of Social Policy and STICERD/CASE
London School of Economics and Political Science
Houghton Street, London WC2A 2AE, UK
Tel: +44(0) 207 955 6527
Changing Fortunes: Income Mobility and Poverty Dynamics in Britain, OUP 2011, http://ukcatalogue.oup.com/product/9780199226436.do
Survival Analysis Using Stata: http://www.iser.essex.ac.uk/survival-analysis
Downloadable papers and software: http://ideas.repec.org/e/pje7.html