Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# st: Computing the Gini or another inequality coefficient from a limited number of data points

| Field | Value |
|---|---|
| From | |
| To | |
| Subject | st: Computing the Gini or another inequality coefficient from a limited number of data points |
| Date | Tue, 14 Feb 2012 14:26:26 -0000 |

```[Reposted because, unbeknown to me, my institution's webmailer sent non-ASCII text plus a nasty winmail.dat file. The Subject: line is more informative too.]

------------------------------

Date: Fri, 10 Feb 2012 09:52:02 +0100
From: Jen Zhen <jenzhen99@gmail.com>
Subject: st: Computing the Gini or another inequality coefficient from a limited number of data points

Dear list members,

I would like to compute a measure of income inequality similar to the
Gini index. I do not know everyone's income, so I need to make an
approximation.

(1)
For the 5 most recent years, I know for 6 income brackets how many
individuals there are and their joint income, hence also the average
income in the bracket. For the full-fledged Gini index I would need to
know the area under the curve which shows the cumulative income
against the cumulative number of taxpayers (to visualize what I mean,
look e.g. at the 2nd figure here:
http://en.wikipedia.org/wiki/Gini_index).
With the information I have, I believe I do not know the entire
curve but only 7 points on it (the six points mentioned plus
the origin). So I think I can approximate the said area if I simply
assume that the line between the 7 points is straight, but that will
systematically underestimate the true degree of inequality. So I'm
wondering if there is a sensible way to smooth the curve and hence get
a better approximation?

(2)
For the 5 earliest years unfortunately I know only the number of
individuals in each bracket but not their joint income. So my idea was
that I would regress the mean income in each bracket on a 3rd-order
function in the year to see how it develops in the 5 latest years and
use this to predict/estimate the mean income for each bracket in the 5
earlier years, then use the procedure described in (1). A simpler
alternative would be to just use the midpoint of each bracket, but I
guess this would be less accurate.

Does this procedure sound sensible? Or is there a better way to
compute inequality from these data?

Thank you so much and best regards,
JZ
-------------------------------

You have received some useful suggestions. However, also note that there is a well-established literature on non-parametric approaches to estimation of inequality indices from data that are in grouped form.

See e.g. F.A. Cowell and F. Mehta, "The estimation and interpolation of inequality measures", Review of Economic Studies, 49(2), April 1982, 273-290, and references therein.

They also discuss the derivation of upper and lower bounds on the indices.

With the information about your income distribution that you have (much grouping, but at least means within categories ... but what about the top, unbounded range?), the bounds on your Ginis may be quite wide.

Stephen
-----------------------------------
Professor Stephen P. Jenkins
Department of Social Policy and STICERD/CASE
London School of Economics and Political Science
Houghton Street, London WC2A 2AE, UK
Tel: +44(0) 207 955 6527
Changing Fortunes: Income Mobility and Poverty Dynamics in Britain, OUP 2011, http://ukcatalogue.oup.com/product/9780199226436.do
Survival Analysis Using Stata: http://www.iser.essex.ac.uk/survival-analysis

```
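For readers who want to try the straight-line approximation described in point (1) of the question, here is a minimal Python sketch. It is not Stata code and it is not the method of Cowell and Mehta. `gini_lower` applies the trapezoid rule to the known Lorenz points, which amounts to treating everyone in a bracket as receiving the bracket mean, so it gives a lower bound. `gini_upper` adds the largest within-bracket spread that is consistent with each bracket's limits and mean (all mass placed at the two endpoints); this assumes the bracket limits are known and that each bracket mean lies inside its bracket. The function names, the arguments, and the sample figures are hypothetical.

```python
from typing import Optional, Sequence


def gini_lower(counts: Sequence[float], totals: Sequence[float]) -> float:
    """Gini from a piecewise-linear Lorenz curve (trapezoid rule).

    counts[k] is the number of individuals and totals[k] the joint income
    in bracket k, with brackets ordered from poorest to richest.
    """
    n_all = float(sum(counts))
    y_all = float(sum(totals))
    area = 0.0
    cum_prev = 0.0
    cum = 0.0
    for n, y in zip(counts, totals):
        cum += y
        # Trapezoid: width is the population share of the bracket,
        # heights are cumulative income shares at its two ends.
        area += (n / n_all) * (cum_prev + cum) / (2.0 * y_all)
        cum_prev = cum
    return 1.0 - 2.0 * area


def gini_upper(
    counts: Sequence[float],
    totals: Sequence[float],
    lower_limits: Sequence[float],
    top_limit: Optional[float] = None,
) -> float:
    """Upper bound: lower bound plus maximal within-bracket inequality.

    lower_limits[k] is the lower income limit of bracket k (assumed known).
    top_limit is the upper limit of the last bracket, or None if it is
    unbounded. Each bracket mean is assumed to lie within its bracket.
    """
    n_all = float(sum(counts))
    mean_all = float(sum(totals)) / n_all
    last = len(counts) - 1
    extra = 0.0
    for k, (n, y) in enumerate(zip(counts, totals)):
        if n == 0:
            continue
        lo = lower_limits[k]
        hi = lower_limits[k + 1] if k < last else top_limit
        mu = y / n
        if hi is None:
            # Limit of the bounded formula as the upper limit grows.
            delta = 2.0 * (mu - lo)
        else:
            # Largest mean difference with support [lo, hi] and mean mu.
            delta = 2.0 * (mu - lo) * (hi - mu) / (hi - lo)
        extra += (n / n_all) ** 2 * delta / (2.0 * mean_all)
    return gini_lower(counts, totals) + extra


if __name__ == "__main__":
    # Made-up figures for six brackets, for illustration only.
    counts = [400, 300, 150, 100, 40, 10]
    totals = [2.4e6, 4.2e6, 4.2e6, 5.2e6, 3.8e6, 3.0e6]
    lower_limits = [0, 10_000, 20_000, 40_000, 70_000, 150_000]
    print(gini_lower(counts, totals))
    print(gini_upper(counts, totals, lower_limits, top_limit=None))
```

When the top bracket has no upper limit, pass `top_limit=None`; the within-bracket term then takes its limiting value, twice the gap between the bracket mean and the bracket's lower limit, which is one reason the bounds can be wide.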