Re: st: Imputation using ML for a lognormal ordered income variable


From   <S.Jenkins@lse.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   Re: st: Imputation using ML for a lognormal ordered income variable
Date   Tue, 20 Nov 2012 09:58:42 -0000

Stas makes good points about the steps required for your analysis, including how to fit a distribution to grouped income data, and then how to simulate (multiply impute) values given your estimates.

I would simply add two points:

(1) You don't have to use a lognormal distribution; there are others available. Austin Nichols has adapted a number of the -*fit- programs on SSC to handle grouped data. See e.g. -findit gbgfit-, -findit dagfit-, -findit smgfit-, and the references therein.

(2) Other people have addressed similar issues. See e.g.

Claire Vermaak (2012). 'Tracking poverty with coarse data: evidence from South Africa', The Journal of Economic Inequality, 10 (2), 239-265.

R. Daniels (2009?). 'The income distribution with coarsened data', Working Paper 82, University of Cape Town. [Google for it]

Stephen P. Jenkins, Richard V. Burkhauser, Shuaizhang Feng, and Jeff Larrimore (2011). 'Measuring inequality using censored data: a multiple imputation approach', Journal of the Royal Statistical Society, Series A, 174 (1), 63-81.

More generally, have a look at the sections on handling of grouped income data in Measuring Inequality, a book by Frank Cowell. 


Stephen
------------------
Stephen P. Jenkins <s.jenkins@lse.ac.uk>
Professor of Economic and Social Policy
Department of Social Policy 
London School of Economics and Political Science
Houghton Street, London WC2A 2AE, UK
Tel: +44(0)20 7955 6527
Changing Fortunes: Income Mobility and Poverty Dynamics in Britain, OUP 2011, http://ukcatalogue.oup.com/product/9780199226436.do
Survival Analysis Using Stata: http://www.iser.essex.ac.uk/survival-analysis
Downloadable papers and software: http://ideas.repec.org/e/pje7.html


------------------------------

Date: Mon, 19 Nov 2012 09:48:19 -0600
From: Stas Kolenikov <skolenik@gmail.com>
Subject: Re: st: Imputation using ML for a lognormal ordered income variable

My understanding is that -lognfit- works with the exact data, not the
coarsened data that you have. As you obviously see, imputing the
median or the mean or any specific number is plain wrong (although I
have to admit to having done just that in the -polychoric- module...
which was more than 10 years ago when I was stupid enough to even
start the whole project :) ). So what I would do is:
1. estimate the parameters of the lognormal model via -intreg-, using
logs of income as the cutoffs between categories. It will give you the
mean and the variance of logs (or conditional mean if you put
demographic covariates into your regression)
2. figure out the conditional distribution (truncated normal for logs
within a given bin)
3. simulate from that conditional distribution, create a new variable
4. repeat a bunch of times, creating say 20 or 50 plausible income variables
5. declare this to be an -mi set wide- data set and analyze the data
as multiply imputed
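
Steps 2 to 4 above can be sketched in code. What follows is a minimal Python sketch (Python rather than Stata, so that it runs on its own) that draws log incomes from a normal distribution truncated to each bin, using inverse-CDF sampling. The bin edges, the values of `mu` and `sigma`, and all function names are hypothetical placeholders; in practice `mu` and `sigma` would come from the -intreg- fit in step 1, and `mu` could differ by person if covariates were included.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for cdf / inv_cdf


def draw_log_income(lower, upper, mu, sigma, rng):
    """Draw one log income from N(mu, sigma^2) truncated to a bin.

    lower/upper are bin edges in income units. lower == 0 means no lower
    bound on the log scale; upper is None for the open top category.
    """
    log_lo = math.log(lower) if lower > 0 else -math.inf
    log_hi = math.log(upper) if upper is not None else math.inf
    p_lo = STD_NORMAL.cdf((log_lo - mu) / sigma) if lower > 0 else 0.0
    p_hi = STD_NORMAL.cdf((log_hi - mu) / sigma) if upper is not None else 1.0
    # Inverse-CDF sampling: uniform on [p_lo, p_hi), mapped back to a normal.
    u = p_lo + rng.random() * (p_hi - p_lo)
    u = min(max(u, 1e-12), 1.0 - 1e-12)  # inv_cdf requires 0 < u < 1
    draw = mu + sigma * STD_NORMAL.inv_cdf(u)
    return min(max(draw, log_lo), log_hi)  # keep rounding error inside the bin


def impute_incomes(categories, bins, mu, sigma, n_imputations, seed=12345):
    """Return n_imputations lists of simulated incomes, one value per person."""
    rng = random.Random(seed)
    return [
        [math.exp(draw_log_income(*bins[c], mu, sigma, rng)) for c in categories]
        for _ in range(n_imputations)
    ]


# Hypothetical income bins and fitted parameters, for illustration only.
bins = [(0, 10000), (10000, 25000), (25000, 50000), (50000, None)]
categories = [0, 1, 1, 2, 3]  # reported income category of each respondent
imputed = impute_incomes(categories, bins, mu=10.2, sigma=0.8, n_imputations=20)
```

Each inner list plays the role of one "plausible income variable" from step 4. Note that this sketch holds `mu` and `sigma` fixed across imputations; a fuller multiple-imputation procedure would usually also reflect the uncertainty in those estimates.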

To check the sensitivity at the right tail, you might want to modify
the simulated value in step 3 for the upper category to come from a Pareto
distribution that connects smoothly to the lognormal distribution. I
recall that Stephen Jenkins, the author of -lognfit-, has also worked
on other parametric income distribution specifications -- see e.g.
http://www.citeulike.org/user/ctacmo/article/4500072.
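
One way to read "connects smoothly" is to pick the Pareto shape so that the Pareto density at the top-category threshold equals the lognormal density conditional on exceeding that threshold. That matching rule is an assumption made here for illustration and is not spelled out in the thread; the function names and parameter values below are likewise hypothetical. For a Pareto distribution starting at threshold c with shape alpha, the density at c is alpha / c, which gives alpha = phi(z) / (sigma * (1 - Phi(z))) with z = (ln c - mu) / sigma.

```python
import math
import random
from statistics import NormalDist


def pareto_shape_matching_lognormal(threshold, mu, sigma):
    """Pareto shape whose density at the threshold matches the lognormal's
    density conditional on income exceeding the threshold (an assumed rule)."""
    std = NormalDist()
    z = (math.log(threshold) - mu) / sigma
    return std.pdf(z) / (sigma * (1.0 - std.cdf(z)))


def draw_pareto_top_income(threshold, alpha, rng):
    """Inverse-CDF draw from a Pareto distribution with the given scale and shape."""
    # rng.random() lies in [0, 1), so 1 - u lies in (0, 1] and the power is finite.
    return threshold * (1.0 - rng.random()) ** (-1.0 / alpha)


# Hypothetical threshold and lognormal parameters, for illustration only.
rng = random.Random(2012)
alpha = pareto_shape_matching_lognormal(threshold=50000, mu=10.2, sigma=0.8)
top_draws = [draw_pareto_top_income(50000, alpha, rng) for _ in range(1000)]
```

Comparing results based on these draws with results based on lognormal draws for the top category gives a rough sense of how much the analysis depends on the assumed shape of the right tail.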

On Mon, Nov 19, 2012 at 9:34 AM, Tinna Asgeirsdottir
<statalist.tla@gmail.com> wrote:
> Thanks for the helpful reply Stas,
>
> I don't think the recommendation referred to interval regression or
> multiple imputation. I think it referred to imputing the probable
> average or median of each category, but without the obviously false
> assumption of a uniform distribution within each category the midpoint
> would suggest.
>
> If I do a ML fit of a lognormal distribution using the lognfit command
> I can get the parameters of the distribution. I guess I should be able
> to work this out by hand from there, but figured that there might be
> an easier way.
>
> Best
> Tinna


