
From: David Jacobs <jacobs.184@osu.edu>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Imputing Mean of Top-Coded Income Category
Date: Fri, 26 Jul 2002 15:05:15 -0400

I waited to see if you got a response to your query to the list, but I now have some time to respond.

About ten years ago I tried a simple spreadsheet program that used the Pareto formula to compute top-category means for yearly CPS income distributions from 1947 until about 1990. The upshot was that the Pareto estimator didn't seem to work well. I recall that I got plausible means of around $13,000 for the top category (incomes above $10,000, until about 1955) for several years after 1950; then, all of a sudden, the estimator gave me a mean of $17,000 for no apparent reason. After that year, the estimates reverted to plausible numbers. In the end, Ron Helms and I got top-category data from the IRS income tax tables and used them to compute top-category means. There was a systematic bias, but it was probably fairly similar for all years. Collecting all that IRS data was a lot of trouble, however.

Do you still want to meet at the ASA? Let me know if you do.

Dave Jacobs

At 06:20 PM 7/14/02 -0400, you wrote:

Hi,

I have top-coded, continuous CPS data on earnings. I want to impute the mean income of this group of top-coded earners, assuming that the upper tail follows a Pareto distribution. I'm wondering if anyone has suggestions about how to do this in Stata (or even just generally how to do it).

Some notes:

1. The standard method typically involves imputing the mean of top-coded earners given categories of earnings, using the following formula:

Mean income for top-coded category = X * V / (V - 1)

where:

X = the topcode (the lower limit of the open-ended category)
V = (c - d) / (b - a)
a = log of the lower limit of the interval preceding the top-coded/open-ended category
b = log of the lower limit of the top-coded/open-ended category
c = log of the sum of the frequencies in the top-coded category and the category preceding it
d = log of the frequency in the top-coded category

The problem with using this method on continuous earnings data (like the CPS) is that the result is highly dependent on the choice one makes about what interval to define as the "preceding category."

2. Another method uses the mode and median to solve the equation:

median = mode * 2^(1/V)

(using the observed median and mode of the sample to calculate V and solve the equation above). The problem here is that when the median is less than the mode, it gives a value for V less than 1, such that multiplying the top code by V / (V - 1) gives a mean for the top-coded income that is LESS than the top code, much to my consternation.

Any help on this would be much appreciated!

Josh Guetzkow
Princeton University
Dept. of Sociology
Wallace Hall
Princeton, NJ 08544
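The two formulas in the quoted message can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from either poster: the function names are made up here, and the category limits and counts in the example calls are hypothetical numbers chosen to show how the choice of "preceding category" changes the result.

```python
import math

def pareto_shape(lower_prev, lower_top, n_prev, n_top):
    """V = (c - d) / (b - a), with a, b, c, d as defined in note 1."""
    a = math.log(lower_prev)      # log of lower limit of the preceding interval
    b = math.log(lower_top)       # log of lower limit of the top-coded category
    c = math.log(n_prev + n_top)  # log of the combined frequency of both categories
    d = math.log(n_top)           # log of the frequency in the top-coded category
    return (c - d) / (b - a)

def pareto_top_mean(lower_prev, lower_top, n_prev, n_top):
    """Mean of the top-coded category: X * V / (V - 1), with X = lower_top."""
    v = pareto_shape(lower_prev, lower_top, n_prev, n_top)
    if v <= 1:
        # A Pareto distribution has no finite mean when V <= 1, and the
        # formula then returns a negative value or one below the topcode.
        raise ValueError("V <= 1: the Pareto mean formula does not apply")
    return lower_top * v / (v - 1)

def shape_from_median_mode(median, mode):
    """Solve median = mode * 2**(1/V) for V (note 2)."""
    if median == mode:
        raise ValueError("median equals mode: V is undefined")
    return math.log(2) / math.log(median / mode)

# Hypothetical counts: same topcode, two different "preceding" intervals.
print(pareto_top_mean(50_000, 100_000, n_prev=300, n_top=100))
print(pareto_top_mean(75_000, 100_000, n_prev=150, n_top=100))

# A median below the mode yields a negative V, the case described in note 2.
print(shape_from_median_mode(median=90, mode=100))
```

With these made-up inputs the first call gives V of about 2 and a top-category mean of roughly twice the topcode, while narrowing the preceding interval in the second call gives a noticeably lower mean, which is the sensitivity described in note 1.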

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**: **Re: st: Imputing Mean of Top-Coded Income Category** *From:* "Joshua A. Guetzkow" <joshg@Princeton.EDU>

**References**: **st: Imputing Mean of Top-Coded Income Category** *From:* "Joshua A. Guetzkow" <joshg@Princeton.EDU>
