From: Stas Kolenikov <skolenik@gmail.com>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: Imputation using ML for a lognormal ordered income variable

Date: Mon, 19 Nov 2012 09:48:19 -0600

My understanding is that -lognfit- works with exact data, not the coarsened data that you have. As you clearly realize, imputing the median or the mean or any other single number is plain wrong (although I have to admit to having done just that in the -polychoric- module... which was more than 10 years ago, when I was stupid enough to even start the whole project :) ). So what I would do is:

1. Estimate the parameters of the lognormal model via -intreg-, using the logs of income as the cutoffs between categories. This gives you the mean and the variance of log income (or the conditional mean, if you put demographic covariates into your regression).
2. Figure out the conditional distribution (a truncated normal for log income within a given bin).
3. Simulate from that conditional distribution to create a new variable.
4. Repeat this a number of times, creating, say, 20 or 50 plausible income variables.
5. Declare this to be an -mi set wide- data set and analyze the data as multiply imputed.

To check the sensitivity at the right tail, you might want to modify step 3 for the top category so that its values are simulated from a Pareto distribution that connects smoothly to the lognormal distribution.

I also recall that Stephen Jenkins, the author of -lognfit-, has worked on other parametric income distribution specifications; see e.g. http://www.citeulike.org/user/ctacmo/article/4500072.

On Mon, Nov 19, 2012 at 9:34 AM, Tinna Asgeirsdottir <statalist.tla@gmail.com> wrote:
> Thanks for the helpful reply, Stas.
>
> I don't think the recommendation referred to interval regression or multiple imputation. I think it referred to imputing the probable average or median of each category, but without the obviously false assumption of a uniform distribution within each category that the midpoint would imply.
>
> If I do an ML fit of a lognormal distribution using the lognfit command, I can get the parameters of the distribution.
> I guess I should be able to work this out by hand from there, but figured that there might be an easier way.
>
> Best
> Tinna
>
> 2012/11/17 Stas Kolenikov <skolenik@gmail.com>:
>> A lognormal distribution will likely underestimate how heavy the top tail is (although if you are interested in Iceland, you may have a very egalitarian income distribution, so the shape of that tail may not be that terrible). The lognormal distribution is a very cute model to play with and a very dangerous one in real work. In my work on Russian data, changing the assumptions about the top tail moved our Gini index from 0.48 to 0.60... and that's a little bit of a difference, let's put it this way.
>>
>> The recommendation you have heard probably concerns -intreg-, which you can read the help on.
>>
>> Imputing the mean income over a group will lead to a multitude of problems due to artificially compressed variability and values that are simply too low for the top group. If you really need to impute, you would want to go with multiple imputation (-help mi-), although you would want to read the MI manual and a paper (http://www.citeulike.org/user/ctacmo/article/8525275) or two (http://www.jstor.org/stable/2291635) if you are not familiar with the technique. What I did in one of my projects recently was to generate plausible values of the variable of interest a number of times (say, 50... the original suggestion to use 5 imputations dates back to the late 1970s... and your smartphone now has more computing power than a Cray supercomputer of that era) and make Stata believe they were imputed in Stata's mi wide format.
>>
>> --
>> -- Stas Kolenikov, PhD, PStat (SSC) :: http://stas.kolenikov.name
>> -- Senior Survey Statistician, Abt SRBI :: work email kolenikovs at srbi dot com
>> -- Opinions stated in this email are mine only, and do not reflect the position of my employer
>>
>>
>> On Sat, Nov 17, 2012 at 6:12 AM, Tinna Asgeirsdottir <statalist.tla@gmail.com> wrote:
>>> Dear Stata users,
>>>
>>> In my data I have income in 13 groups. The top group is open-ended. I am trying to impute sensible values and would like to use this as a continuous variable. I am especially concerned about the top category. It has been suggested to me that I should use Stata's ML command instead of using each category's midpoint. I am having trouble finding what I need on the internet. Thus I wonder if anyone can tell me how to fit a lognormal distribution to the variable and subsequently infer the average income in the top bracket. If you know how to do this in general for all the categories, that would be great as well, as the distributions over the other brackets are surely not uniform. However, I think finding a good solution for my top category is the most important thing.
>>>
>>> Best regards,
>>> Tinna
>>>
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>> * http://www.ats.ucla.edu/stat/stata/
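Steps 2 to 4 of Stas's recipe can be sketched as follows. This is a minimal illustration in Python, not a Stata implementation, and it assumes step 1 is already done: `mu` and `sigma` stand in for the mean and standard deviation of log income that -intreg- would estimate from the log cutoffs in a model without covariates. For each respondent it draws log income from a normal truncated to that respondent's bracket (by the inverse-CDF method), exponentiates, and repeats the whole draw `m` times. The cutoffs, respondents, parameter values, and function names are all hypothetical.

```python
import math
import random
from statistics import NormalDist


def draw_in_bracket(lo, hi, mu, sigma, rng):
    """Draw one income from a lognormal(mu, sigma) truncated to (lo, hi].

    On the log scale this is a normal(mu, sigma) truncated to (ln lo, ln hi]
    (step 2), sampled by the inverse-CDF method (step 3).
    lo = 0 means no lower bound; hi = math.inf marks the open-ended top bracket.
    """
    log_income = NormalDist(mu, sigma)
    p_lo = log_income.cdf(math.log(lo)) if lo > 0 else 0.0
    p_hi = log_income.cdf(math.log(hi)) if math.isfinite(hi) else 1.0
    u = p_lo + (p_hi - p_lo) * rng.random()
    u = min(max(u, 1e-12), 1 - 1e-12)  # inv_cdf requires 0 < u < 1
    return math.exp(log_income.inv_cdf(u))


def impute_incomes(brackets, cutoffs, mu, sigma, m, seed=2012):
    """Return m imputed income vectors (step 4).

    brackets: 0-based bracket index of each respondent.
    cutoffs:  increasing bounds; bracket k spans (cutoffs[k], cutoffs[k + 1]].
    """
    rng = random.Random(seed)  # fixed seed so the imputations are reproducible
    return [
        [draw_in_bracket(cutoffs[k], cutoffs[k + 1], mu, sigma, rng) for k in brackets]
        for _ in range(m)
    ]


# Hypothetical inputs: 4 brackets, 6 respondents, and mu, sigma standing in
# for -intreg- estimates on log income.
cutoffs = [0.0, 100.0, 200.0, 400.0, math.inf]
brackets = [0, 1, 1, 2, 3, 3]
mu, sigma = 5.0, 0.8
imputations = impute_incomes(brackets, cutoffs, mu, sigma, m=20)
for k, first_draw in zip(brackets, imputations[0]):
    print(f"bracket {k}: {first_draw:.1f}")
```

Each inner list is one plausible income variable. In Stata, the `m` copies would become the wide-format imputations that step 5 declares with -mi-. Two simplifications are worth flagging. With covariates, `mu` would differ by respondent (the linear prediction from -intreg-). A fully "proper" multiple imputation would also redraw `mu` and `sigma` from their estimated sampling distribution for each imputation, rather than holding them fixed as this sketch does.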

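Tinna's original question, the average income in the open-ended top bracket, has a closed form under the lognormal model. Suppose log income is normal with mean mu and standard deviation sigma, c is the top cutoff, and z = (ln c - mu)/sigma. Then E[X | X > c] = exp(mu + sigma^2/2) * Phi(sigma - z) / (1 - Phi(z)). The sketch below computes this and, as a sensitivity check in the spirit of Stas's Pareto suggestion, compares it with the top-bracket mean under a Pareto tail above c. The thread does not spell out what "connects smoothly" means. Here it is assumed to mean that the spliced density is continuous at c, which gives the Pareto shape alpha = c f(c) / (1 - F(c)) for the lognormal density f and CDF F. The parameter values and function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: Phi (cdf) and phi (pdf)


def top_mean_lognormal(c, mu, sigma):
    """E[X | X > c] when log X ~ Normal(mu, sigma).

    Closed form: exp(mu + sigma^2/2) * Phi(sigma - z) / (1 - Phi(z)),
    where z = (ln c - mu) / sigma.
    """
    z = (math.log(c) - mu) / sigma
    return (math.exp(mu + sigma ** 2 / 2)
            * STD_NORMAL.cdf(sigma - z) / (1 - STD_NORMAL.cdf(z)))


def pareto_alpha_matching(c, mu, sigma):
    """Pareto shape whose tail density joins the lognormal density at c.

    ASSUMPTION: "connects smoothly" is read as a continuous density at c,
    i.e. alpha = c * f(c) / (1 - F(c)) = phi(z) / (sigma * (1 - Phi(z))).
    """
    z = (math.log(c) - mu) / sigma
    return STD_NORMAL.pdf(z) / (sigma * (1 - STD_NORMAL.cdf(z)))


def top_mean_pareto(c, alpha):
    """E[X | X > c] for a Pareto tail starting at c (infinite if alpha <= 1)."""
    return math.inf if alpha <= 1 else alpha * c / (alpha - 1)


def draw_pareto_tail(c, alpha, u):
    """Inverse-CDF draw above c; u must lie in (0, 1], e.g. 1 - random.random()."""
    return c * u ** (-1 / alpha)


# Hypothetical values: mu, sigma as if estimated by -intreg- on log income,
# c = lower bound of the open-ended top bracket (same units as income).
mu, sigma, c = 5.0, 0.8, 400.0
alpha = pareto_alpha_matching(c, mu, sigma)
m_ln = top_mean_lognormal(c, mu, sigma)
m_par = top_mean_pareto(c, alpha)
print(f"alpha = {alpha:.2f}")
print(f"top-bracket mean, lognormal:   {m_ln:.1f}")
print(f"top-bracket mean, Pareto tail: {m_par:.1f}")
```

Above c, the lognormal's local Pareto exponent keeps rising, so a Pareto tail matched at c is heavier and its top-bracket mean is larger. If alpha falls to 1 or below, the Pareto mean is infinite, which is itself a warning about how strongly the top-tail assumption can drive results such as the Gini index. `draw_pareto_tail` could replace the truncated-lognormal draw for the top bracket in step 3.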
**Follow-Ups**:

- **Re: st: Imputation using ML for a lognormal ordered income variable**, *From:* Austin Nichols <austinnichols@gmail.com>
- **Re: st: Imputation using ML for a lognormal ordered income variable**, *From:* Nick Cox <njcoxstata@gmail.com>

**References**:

- **st: Imputation using ML for a lognormal ordered income variable**, *From:* Tinna Asgeirsdottir <statalist.tla@gmail.com>
- **Re: st: Imputation using ML for a lognormal ordered income variable**, *From:* Stas Kolenikov <skolenik@gmail.com>
- **Re: st: Imputation using ML for a lognormal ordered income variable**, *From:* Tinna Asgeirsdottir <statalist.tla@gmail.com>
