# Re: st: Computing the Gini or another inequality coefficient from a limited number of data points

From: Joerg Luedicke
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Computing the Gini or another inequality coefficient from a limited number of data points
Date: Fri, 10 Feb 2012 09:32:19 -0500

```
Daniel, do you mean a power law distribution? (Just had a quick glance
at your paper and saw that you were talking about Pareto distributions
there. The Poisson distribution has only one parameter.)

Joerg

On Fri, Feb 10, 2012 at 6:48 AM, Daniel Feenberg <feenberg@nber.org> wrote:
>
> On Fri, 10 Feb 2012, Jen Zhen wrote:
>
>> Dear list members,
>>
>> I would like to compute a measure of income inequality similar to the
>> Gini index. I do not know everyone's income, so need to make an
>> approximation.
>>
>> (1)
>> For the 5 most recent years, I know for 6 income brackets how many
>> individuals there are and their joint income, hence also the average
>> income in the bracket. For the full-fledged Gini index I would need to
>> know the area under the curve which shows the cumulative income
>> against the cumulative number of tax payers (to visualize what I mean,
>> look e.g. at the 2nd figure here:
>> http://en.wikipedia.org/wiki/Gini_index).
>> Now I believe that with the information I have I don't know the entire
>> curve but I know only 7 points on it (the six points mentioned plus
>> the origin). So I think I can approximate the said area if I simply
>> assume that between the 7 points the line is straight, but that will
>> systematically underestimate the true degree of inequality. So I'm
>> wondering if there is a sensible way to smooth the curve and hence get
>> a better approximation?
>>
>> (2)
>> For the 5 earliest years unfortunately I know only the number of
>> individuals in each bracket but not their joint income. So my idea was
>> that I would regress the mean income in each bracket on a 3rd-order
>> function in the year to see how it develops in the 5 latest years and
>> use this to predict/estimate the mean income for each bracket in the 5
>> earlier years, then use the procedure described in (1). A simpler
>> alternative would be to just use the midpoint of each bracket, but I
>> guess this would be less good.
>>
>> Does this procedure sound sensible? Or is there a better way to
>> compute inequality from these data?
>>
>
> You could assume an income distribution function, such as log-normal,
> poisson, or even Gini and solve for the parameters using the available data.
> We do this in "Income Inequality and the Incomes of Very High-Income
> Taxpayers: Evidence from Tax Returns"
>
>  http://www.nber.org/chapters/c10880
>
> with a Poisson. The Poisson is a two parameter distribution, so we solve for
> the parameters within an income bracket using only the 2 breakpoints that
> define the bracket. That way there is no need to extrapolate beyond the
> observed data, or coerce data everywhere in the income distribution to a
> small number of parameters estimated over the whole distribution.
>
> Daniel Feenberg
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

```
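The two ideas in this thread can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the cited paper or in any Stata command. The first function computes the straight-line (piecewise-linear Lorenz curve) Gini from bracket counts and bracket income totals, as described in part (1) of the question; because it treats everyone in a bracket as having the bracket mean, it is a lower bound on the true Gini. The second function assumes, following Joerg's reading, that the two-parameter distribution meant is a Pareto (power law), so that the number of taxpayers above income y is proportional to y^(-alpha), and solves for alpha from the two breakpoints of a single bracket. The function names, argument names, and the numbers in the usage lines are all hypothetical.

```python
import math


def grouped_gini_lower_bound(counts, totals):
    """Gini from grouped data with straight Lorenz segments.

    counts[i] is the number of individuals in bracket i (assumed > 0),
    totals[i] is their joint income. Within-bracket inequality is
    ignored, so the result is a lower bound on the true Gini.
    """
    # Order brackets by mean income so the Lorenz curve is convex.
    brackets = sorted(zip(counts, totals), key=lambda b: b[1] / b[0])
    n_all = sum(c for c, _ in brackets)
    y_all = sum(t for _, t in brackets)

    area = 0.0   # area under the piecewise-linear Lorenz curve
    cum_y = 0.0  # cumulative income share reached so far
    for c, t in brackets:
        prev = cum_y
        cum_y += t / y_all
        # Trapezoid: width is the population share, heights are the
        # cumulative income shares at the two ends of the segment.
        area += (c / n_all) * (prev + cum_y) / 2.0
    return 1.0 - 2.0 * area


def pareto_alpha(lower, upper, n_above_lower, n_above_upper):
    """Pareto exponent implied by one bracket's two breakpoints.

    Assumes N(y) = C * y**(-alpha) between `lower` and `upper`, where
    N(y) is the number of individuals with income above y.
    """
    return math.log(n_above_lower / n_above_upper) / math.log(upper / lower)


# Made-up example: three brackets with means 10, 30 and 80.
gini = grouped_gini_lower_bound([50, 30, 20], [500, 900, 1600])

# Made-up example: 400 people above 100, 100 people above 200.
alpha = pareto_alpha(100, 200, 400, 100)
```

Fitting the exponent bracket by bracket, rather than once for the whole distribution, matches the point made in Daniel's reply: only the two breakpoints that define a bracket are used, so nothing is extrapolated beyond the observed data.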