

From: <S.Jenkins@lse.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: Re: st: Imputation using ML for a lognormal ordered income variable
Date: Tue, 20 Nov 2012 09:58:42 -0000

Stas makes good points about the steps required for your analysis, including how to fit a distribution to grouped income data, and then how to simulate (multiply impute) values given your estimates.

I would simply add that

(1) you don't have to use a lognormal distribution: there are others out there. Austin Nichols has adapted a number of the -*fit- programs on SSC to handle grouped data. E.g. -findit gbgfit-, -findit dagfit-, -findit smgfit-, and references therein.

(2) Other people have addressed similar issues. See e.g.

Claire Vermaak (2012) Tracking poverty with coarse data: evidence from South Africa. The Journal of Economic Inequality, Volume 10, Issue 2, pp 239-265

R Daniels (2009?), The income distribution with coarsened data, Working Paper 82, University of Cape Town. [Google for it]

Stephen P. Jenkins, Richard V. Burkhauser, Shuaizhang Feng, and Jeff Larrimore (2011). 'Measuring inequality using censored data: a multiple imputation approach', Journal of the Royal Statistical Society, Series A, 174 (1), 63-81.

More generally, have a look at the sections on handling of grouped income data in Measuring Inequality, a book by Frank Cowell.

Stephen
------------------
Stephen P. Jenkins <s.jenkins@lse.ac.uk>
Professor of Economic and Social Policy
Department of Social Policy
London School of Economics and Political Science
Houghton Street, London WC2A 2AE, UK
Tel: +44(0)20 7955 6527
Changing Fortunes: Income Mobility and Poverty Dynamics in Britain, OUP 2011, http://ukcatalogue.oup.com/product/9780199226436.do
Survival Analysis Using Stata: http://www.iser.essex.ac.uk/survival-analysis
Downloadable papers and software: http://ideas.repec.org/e/pje7.html

------------------------------

Date: Mon, 19 Nov 2012 09:48:19 -0600
From: Stas Kolenikov <skolenik@gmail.com>
Subject: Re: st: Imputation using ML for a lognormal ordered income variable

My understanding is that -lognfit- works with the exact data, not the coarsened data that you have.

As you obviously see, imputing the median or the mean or any specific number is plain wrong (although I have to admit to having done just that in the -polychoric- module... which was more than 10 years ago when I was stupid enough to even start the whole project :) ). So what I would do is:

1. estimate the parameters of the lognormal model via -intreg-, using logs of income as the cutoffs between categories. It will give you the mean and the variance of logs (or conditional mean if you put demographic covariates into your regression)
2. figure out the conditional distribution (truncated normal for logs within a given bin)
3. simulate from that conditional distribution, create a new variable
4. repeat a bunch of times, creating say 20 or 50 plausible income variables
5. declare this to be an -mi set wide- data set and analyze the data as multiply imputed

To check the sensitivity at the right tail, you might want to modify the simulated value in 3 for the upper category to be a Pareto distribution that connects smoothly to the lognormal distribution.

I also recall that Stephen Jenkins, the author of lognfit, also worked on other parametric income distribution specifications -- see e.g. http://www.citeulike.org/user/ctacmo/article/4500072.

On Mon, Nov 19, 2012 at 9:34 AM, Tinna Asgeirsdottir <statalist.tla@gmail.com> wrote:
> Thanks for the helpful reply Stas,
>
> I don't think the recommendation referred to interval regression or
> multiple imputation. I think it referred to imputing the probable
> average or median of each category, but without the obviously false
> assumption of a uniform distribution within each category the midpoint
> would suggest.
>
> If I do a ML fit of a lognormal distribution using the lognfit command
> I can get the parameters of the distribution. I guess I should be able
> to work this out by hand from there, but figured that there might be
> an easier way.
>
> Best
> Tinna

Please access the attached hyperlink for an important electronic communications disclaimer: http://lse.ac.uk/emailDisclaimer

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
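
Steps 1 to 4 of Stas's outline, plus the Pareto variant for the top category, can be sketched roughly as follows. This is plain Python rather than Stata so that it is self-contained, and it is not how -intreg- works internally: the grouped-data lognormal likelihood is maximised with a crude compass search, and there are no covariates. All function names (`fit_lognormal`, `draw_income`, `draw_pareto_tail`, `impute`) are hypothetical, and the brackets and the simulated sample are made up for illustration. The Pareto option chooses the tail index so that the conditional density is continuous at the top cutoff, which is only one possible reading of "connects smoothly".

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal, provides pdf(), cdf() and inv_cdf()

def cdf_bounds(lo, hi, mu, sigma):
    """Normal CDF at the log cutoffs of bin [lo, hi); hi=None means open-ended."""
    p_lo = 0.0 if lo <= 0 else STD.cdf((math.log(lo) - mu) / sigma)
    p_hi = 1.0 if hi is None else STD.cdf((math.log(hi) - mu) / sigma)
    return p_lo, p_hi

def loglik(mu, sigma, bins, counts):
    """Grouped-data log-likelihood: sum over bins of n_k * log P(bin k)."""
    total = 0.0
    for (lo, hi), n in zip(bins, counts):
        p_lo, p_hi = cdf_bounds(lo, hi, mu, sigma)
        total += n * math.log(max(p_hi - p_lo, 1e-300))
    return total

def fit_lognormal(bins, counts, tol=1e-6):
    """Step 1: ML estimates of the mean and sd of log income (compass search)."""
    # Starting values from rough bin midpoints on the log scale.
    reps = [math.log(1.5 * lo if hi is None else (lo + hi) / 2) for lo, hi in bins]
    n = sum(counts)
    mu = sum(c * r for c, r in zip(counts, reps)) / n
    var = sum(c * (r - mu) ** 2 for c, r in zip(counts, reps)) / n
    log_s = 0.5 * math.log(max(var, 1e-6))
    best, step = loglik(mu, math.exp(log_s), bins, counts), 0.5
    while step > tol:
        improved = False
        for dm, ds in ((step, 0), (-step, 0), (0, step), (0, -step)):
            cand = loglik(mu + dm, math.exp(log_s + ds), bins, counts)
            if cand > best:
                mu, log_s, best, improved = mu + dm, log_s + ds, cand, True
        if not improved:
            step /= 2  # no direction helps at this scale, so refine
    return mu, math.exp(log_s)

def draw_income(lo, hi, mu, sigma, rng):
    """Steps 2-3: one draw from the lognormal truncated to [lo, hi)."""
    p_lo, p_hi = cdf_bounds(lo, hi, mu, sigma)
    u = p_lo + rng.random() * (p_hi - p_lo)
    u = min(max(u, 1e-12), 1 - 1e-12)  # keep inv_cdf away from 0 and 1
    return math.exp(mu + sigma * STD.inv_cdf(u))

def draw_pareto_tail(lo, mu, sigma, rng):
    """Top category: Pareto above lo, density matched to the lognormal at lo."""
    z = (math.log(lo) - mu) / sigma
    alpha = STD.pdf(z) / (sigma * (1.0 - STD.cdf(z)))  # assumed matching rule
    return lo * (1.0 - rng.random()) ** (-1.0 / alpha)

def impute(binned, bins, mu, sigma, m=20, seed=12345, pareto_top=False):
    """Step 4: m imputed income lists, one value per observation."""
    rng = random.Random(seed)
    out = []
    for _ in range(m):
        draws = []
        for k in binned:
            lo, hi = bins[k]
            if hi is None and pareto_top:
                draws.append(draw_pareto_tail(lo, mu, sigma, rng))
            else:
                draws.append(draw_income(lo, hi, mu, sigma, rng))
        out.append(draws)
    return out

# Illustration on simulated data (made-up brackets and parameters).
bins = [(0, 10000), (10000, 20000), (20000, 35000),
        (35000, 60000), (60000, 100000), (100000, None)]
true_mu, true_sigma = 10.0, 0.8
sim = random.Random(2012)

def bin_of(y):
    return next(k for k, (lo, hi) in enumerate(bins) if hi is None or y < hi)

binned = [bin_of(math.exp(sim.gauss(true_mu, true_sigma))) for _ in range(5000)]
counts = [binned.count(k) for k in range(len(bins))]
mu_hat, sigma_hat = fit_lognormal(bins, counts)
imputations = impute(binned, bins, mu_hat, sigma_hat, m=20)
```

The sketch holds the parameters fixed at their estimates across all imputations. Step 5, declaring the result as -mi set wide- data, has no counterpart here and would be done in Stata.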
