



Re: st: Imputation using ML for a lognormal ordered income variable


From   Stas Kolenikov <skolenik@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Imputation using ML for a lognormal ordered income variable
Date   Sat, 17 Nov 2012 07:02:33 -0600

Lognormal distribution will likely underestimate how heavy the top
tail is (although if you are interested in Iceland, you may have a
very egalitarian income distribution, so the shape of that tail may
not be that terrible). Lognormal distribution is a very cute model to
play with and very dangerous in real work. In my work on Russian data,
changing the assumptions about the top tail moved our Gini index from
0.48 to 0.60... and that's a little bit of a difference, let's put it
this way.

The recommendation you have heard probably refers to -intreg-
(interval regression); see -help intreg- for the details.
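
To make concrete what an interval-data fit does, here is a minimal
sketch in Python (standard library only) of the grouped-data
likelihood for a lognormal income variable. It is not what -intreg-
runs internally, just an illustration of the same idea: each
observation contributes the probability of its bracket, and the
open-ended top bracket contributes an upper-tail probability. The
brackets, the counts, and the crude compass-search optimizer are all
made up for illustration.

```python
import math
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF


def bracket_prob(lo, hi, mu, sigma):
    """P(lo < income <= hi) when ln(income) ~ N(mu, sigma^2); hi=None is open-ended."""
    lower = 0.0 if lo <= 0 else PHI((math.log(lo) - mu) / sigma)
    upper = 1.0 if hi is None else PHI((math.log(hi) - mu) / sigma)
    return upper - lower


def log_likelihood(brackets, counts, mu, sigma):
    """Grouped-data log-likelihood: sum of count * log(bracket probability)."""
    return sum(
        n * math.log(max(bracket_prob(lo, hi, mu, sigma), 1e-300))
        for (lo, hi), n in zip(brackets, counts)
    )


def fit_lognormal(brackets, counts, mu=0.0, sigma=1.0, step=1.0, tol=1e-5):
    """Maximize the log-likelihood with a simple compass search (illustrative only)."""
    best = log_likelihood(brackets, counts, mu, sigma)
    while step > tol:
        improved = False
        for d_mu, d_sigma in ((step, 0.0), (-step, 0.0), (0.0, step), (0.0, -step)):
            cand_mu, cand_sigma = mu + d_mu, sigma + d_sigma
            if cand_sigma <= 0:
                continue  # sigma must stay positive
            ll = log_likelihood(brackets, counts, cand_mu, cand_sigma)
            if ll > best:
                mu, sigma, best, improved = cand_mu, cand_sigma, ll, True
        if not improved:
            step /= 2  # shrink the search pattern when no neighbor is better
    return mu, sigma


# Hypothetical data: five brackets (arbitrary income units) and made-up counts.
brackets = [(0, 1), (1, 2), (2, 3), (3, 5), (5, None)]
counts = [40, 310, 330, 250, 70]
mu_hat, sigma_hat = fit_lognormal(brackets, counts)
print(f"mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
```

The starting values should sit roughly on the log scale of the
bracket bounds; with incomes recorded in large currency units, the
defaults mu=0, sigma=1 would be a poor start. The caveat about the top
tail applies fully here: the fit only describes the tail as well as
the lognormal assumption does.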

Imputing the mean income over a group will lead to a multitude of
problems due to artificially compressed variability and values that
are simply too low for the top group. If you desperately need to
impute, you would want to go with multiple imputations (-help mi-),
although you would want to read the MI manual and a paper
(http://www.citeulike.org/user/ctacmo/article/8525275) or two
(http://www.jstor.org/stable/2291635) if you are not familiar with the
technique. What I did in one of my recent projects was to
generate plausible values of the variable of interest a bunch of
times (say, 50... the original suggestion to use 5 imputations dates
back to the late 1970s... and your smartphone now has more computing
power than a Cray supercomputer of that era) and make Stata believe
they had been imputed, using the -mi- wide format.
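
As a rough illustration of the difference between plugging in a
bracket mean and generating plausible values, the Python sketch below
(standard library only) computes the mean of a lognormal within a
bracket and draws values from the lognormal truncated to that bracket
by inverting the CDF. The parameter values, the bracket cutoff, and
the function names are hypothetical. A proper multiple imputation
would also redraw mu and sigma for each imputation to reflect
estimation uncertainty, which this sketch skips.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def _cdf_bounds(lo, hi, mu, sigma, shift=0.0):
    """Normal CDF at the log bracket bounds; hi=None means an open-ended top bracket."""
    lower = 0.0 if lo <= 0 else STD_NORMAL.cdf((math.log(lo) - mu - shift) / sigma)
    upper = 1.0 if hi is None else STD_NORMAL.cdf((math.log(hi) - mu - shift) / sigma)
    return lower, upper


def bracket_mean(lo, hi, mu, sigma):
    """E[income | lo < income <= hi] when ln(income) ~ N(mu, sigma^2)."""
    p_lo, p_hi = _cdf_bounds(lo, hi, mu, sigma)
    q_lo, q_hi = _cdf_bounds(lo, hi, mu, sigma, shift=sigma ** 2)
    return math.exp(mu + sigma ** 2 / 2) * (q_hi - q_lo) / (p_hi - p_lo)


def draw_within_bracket(lo, hi, mu, sigma, rng):
    """One draw from the lognormal truncated to the bracket (inverse-CDF method)."""
    p_lo, p_hi = _cdf_bounds(lo, hi, mu, sigma)
    u = rng.uniform(p_lo, p_hi)
    u = min(max(u, 1e-12), 1 - 1e-12)  # keep inv_cdf away from exactly 0 or 1
    return math.exp(mu + sigma * STD_NORMAL.inv_cdf(u))


# Hypothetical fitted parameters and top-bracket cutoff (arbitrary income units).
MU, SIGMA, TOP_CUTOFF = 1.0, 0.5, 5.0
M = 50  # number of imputations, as suggested in the text

rng = random.Random(2012)
top_draws = [draw_within_bracket(TOP_CUTOFF, None, MU, SIGMA, rng) for _ in range(M)]

print("mean imputation for top bracket:", round(bracket_mean(TOP_CUTOFF, None, MU, SIGMA), 3))
print("range of plausible values:", round(min(top_draws), 3), "to", round(max(top_draws), 3))
```

Every respondent in the top bracket gets the same number under mean
imputation, whereas the M draws spread over the tail; stored as M
columns per respondent, they resemble a wide layout with one variable
per imputation.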

-- 
-- Stas Kolenikov, PhD, PStat (SSC)  ::  http://stas.kolenikov.name
-- Senior Survey Statistician, Abt SRBI  ::  work email kolenikovs at
srbi dot com
-- Opinions stated in this email are mine only, and do not reflect the
position of my employer


On Sat, Nov 17, 2012 at 6:12 AM, Tinna Asgeirsdottir
<statalist.tla@gmail.com> wrote:
> Dear Stata users,
>
> In my data I have income in 13 groups. The top group is open ended. I
> am trying to impute sensible values and would like to use this as a
> continuous variable. I am especially concerned about the top category.
> It has been suggested to me that I should use Stata's ML command
> instead of using each category's midpoint. I am having trouble finding
> what I need on the internet. Thus I wonder if anyone can tell me how
> to fit a lognormal distribution to the variable and subsequently infer
> the average income in the top bracket. If you know how to do this in
> general for all the categories, that would be great as well, as the
> distribution within the other brackets is surely not uniform. However,
> I think finding a good solution for my top category is the most
> important thing.
>
> Best regards,
> Tinna
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/


