From: "Joshua A. Guetzkow" <joshg@Princeton.EDU>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Imputing Mean of Top-Coded Income Category
Date: Fri, 26 Jul 2002 17:29:42 -0400 (EDT)

David,

Actually, I did get one response privately from Dick Campbell, who
suggested using CNREG (see below). I thank you both for your suggestions
and vigilance.

Date: Mon, 15 Jul 2002 10:05:24 -0500
From: Dick Campbell <dcamp@uic.edu>

It occurs to me that you could still use public data to estimate means by
subgroups using cnreg and that those estimates would be better than Pareto
estimates, which do indeed depend on somewhat arbitrary categorizations.

I suspect there are folks out there who can give you better advice, so if
you get something useful, please let me know. Thanks.

On Fri, 26 Jul 2002, David Jacobs wrote:

> I waited to see if you got a response to your query to the list, but I now
> have some time to respond.
>
> About ten years ago I tried a simple program in a spreadsheet that used
> the Pareto formula to compute top category means for yearly income
> distributions from the CPS from 1947 until about 1990. The upshot was that
> the Pareto estimator didn't seem to work well. I recall that I got
> plausible means around $13,000 for a top category (incomes above $10,000
> until about 1955) for several years after 1950, then all of a sudden the
> estimator gave me a mean of $17,000 for no apparent reason. After that
> year, the estimates again reverted to plausible numbers. In the end Ron
> Helms and I got top category data from the IRS income tax tables and used
> them to compute top category means. There was a systematic bias but it
> probably was fairly similar for all years. But it was a lot of trouble to
> collect all that IRS data.
>
> Do you still want to meet at the ASA? Let me know if you do.
>
> Dave Jacobs
>
> At 06:20 PM 7/14/02 -0400, you wrote:
> >Hi,
> >
> >I have top-coded, continuous CPS data on earnings. I want to impute the
> >mean income of this group of top-coded earners, making the assumption that
> >the upper tail follows a Pareto distribution. I'm wondering if anyone has
> >suggestions about how to do this in STATA (or even just generally how to
> >do it).
> >
> >Some notes:
> >
> >1.
> >The standard method of doing this typically involves imputing the mean of
> >top-coded earners given categories of earnings, using the following
> >formula:
> >
> >Mean Income for top-coded category = X * V/(V - 1)
> >where:
> >X = topcode/open-ended category
> >V = (c - d)/(b - a)
> >where
> >a = log of lower limit of interval preceding top-coded/open-ended category
> >b = log of lower limit of top-coded/open-ended category
> >c = log of the sum of the frequencies in the top-coded category and the
> >category preceding it
> >d = log of the frequencies in the top-coded category
> >
> >The problem with using this method given continuous earnings data (like
> >the CPS) is that the result is highly dependent on the choice one makes
> >about what interval to define as the "preceding category."
> >
> >
> >2.
> >Another method would use the mode and median to solve the equation:
> >
> >median = mode * 2^(1/V)
> >
> >(using the observed median and mode of the sample to calculate V and solve
> >the equation above)
> >
> >The problem here is that when the median is less than the mode, it gives a
> >value for V less than 1, such that multiplying the top code gives a mean
> >for the top-coded income that is LESS than the top code, much to my
> >consternation.
> >
> >
> >Any help on this would be much appreciated!
> >
> >Josh Guetzkow
> >
> >Princeton University
> >Dept. of Sociology
> >Wallace Hall
> >Princeton, NJ 08544

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
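
For readers who want to try the two formulas quoted in notes 1 and 2 of the original question, here is a minimal sketch in Python (not Stata, and not code from anyone in this thread). The function names and the example figures are made up for illustration. It assumes natural logs, although any base gives the same V because V is a ratio of log differences.

```python
import math


def pareto_shape_grouped(lower_prev, lower_top, n_prev, n_top):
    """V = (c - d) / (b - a), as defined in note 1 of the quoted question.

    lower_prev: lower limit of the interval preceding the top-coded category
    lower_top:  lower limit of the top-coded category (the topcode)
    n_prev:     frequency in the preceding interval
    n_top:      frequency in the top-coded category
    """
    a = math.log(lower_prev)
    b = math.log(lower_top)
    c = math.log(n_prev + n_top)
    d = math.log(n_top)
    return (c - d) / (b - a)


def pareto_shape_median_mode(median, mode):
    """Solve median = mode * 2**(1/V) for V (note 2 of the quoted question)."""
    return math.log(2) / math.log(median / mode)


def pareto_top_mean(topcode, v):
    """Mean of the top-coded category, X * V / (V - 1).

    The multiplier only exceeds 1 when V > 1, so anything else is rejected
    rather than returning a mean at or below the topcode.
    """
    if v <= 1:
        raise ValueError("V must exceed 1 for the mean to lie above the topcode")
    return topcode * v / (v - 1)


if __name__ == "__main__":
    # Hypothetical grouped counts: 300 earners in [50000, 100000), 100 top-coded.
    v = pareto_shape_grouped(50_000, 100_000, 300, 100)
    print("grouped V:", v, "imputed mean:", pareto_top_mean(100_000, v))

    # Hypothetical sample where the median falls below the mode.
    v_bad = pareto_shape_median_mode(20_000, 25_000)
    print("median/mode V:", v_bad, "raw multiplier:", v_bad / (v_bad - 1))
```

With the hypothetical grouped counts, V comes out at about 2, so the imputed mean is about twice the topcode. Changing `lower_prev` and `n_prev` changes V, which is the dependence on the "preceding category" described in note 1. In the second case the median is below the mode, V is negative, and the raw multiplier falls between 0 and 1, which reproduces the below-topcode mean described in note 2.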

