



Re: st: ladder question for right-skewed variable


From   David Hoaglin <dchoaglin@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: ladder question for right-skewed variable
Date   Fri, 26 Apr 2013 17:35:38 -0400

Gabriel,

I second Nick's advice to abandon -ladder-.  Choosing a transformation
involves a fair amount of judgment, and I would not delegate the
choice to an automated process.  I also have some other comments.

The number of people who reported being displaced by violence is a
count.  Sometimes the square root is a reasonable transformation for
counts, but large counts often need a logarithm.

As Nick suggested, though, a Poisson model may be appropriate, or
perhaps a negative binomial model.  Before I tried such models,
however, I would want to know why your data did not include any zeros.
Is 1010 the total number of municipalities in Colombia, or do your
data include only municipalities in which at least 1 person reported
being displaced?  Either a Poisson distribution or a negative binomial
distribution would have positive probability of producing some zeros.
If zeros have been excluded, the model would have to handle that
feature.
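
To make the point about zeros concrete, here is a minimal sketch in
Python (not Stata).  The mean of 3 is purely illustrative and does not
come from the displacement data.  It computes the probability of a
zero under a Poisson distribution, and the zero-truncated
probabilities obtained by conditioning on a positive count.

```python
import math

def poisson_pmf(y, mu):
    """P(Y = y) for a Poisson distribution with mean mu."""
    return math.exp(-mu) * mu ** y / math.factorial(y)

def zero_truncated_pmf(y, mu):
    """P(Y = y | Y > 0): Poisson probability rescaled to exclude zero."""
    if y < 1:
        return 0.0
    return poisson_pmf(y, mu) / (1.0 - poisson_pmf(0, mu))

mu = 3.0  # hypothetical mean, for illustration only
p_zero = poisson_pmf(0, mu)
# Summing far into the tail; the truncated probabilities should total 1.
truncated_total = sum(zero_truncated_pmf(y, mu) for y in range(1, 60))

print(f"P(Y = 0) under Poisson(mu = {mu}): {p_zero:.4f}")
print(f"Zero-truncated probabilities sum to: {truncated_total:.6f}")
```

A model fitted to data from which zeros were excluded should use the
rescaled probabilities rather than the ordinary ones.  In Stata,
-tpoisson- and -tnbreg- are the commands I would look at for that
purpose, though whether they suit these data is a separate question.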

Another consideration, perhaps important, is that the usual Poisson
and negative binomial models assume that the occurrences are
independent.  The nature of your data suggests that some types of
clustering are likely to be involved.  An episode of violence is
likely to cause a number of people to be displaced simultaneously, and
it might affect nearby municipalities similarly.
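
The effect of simultaneous displacement on the variance can be
sketched with a small simulation in Python (not Stata).  Everything in
it is an assumption made for illustration: the number of episodes per
municipality is Poisson with mean 2, and each episode displaces
exactly 20 people.  Under those assumptions the variance of the count
is about 20 times its mean, whereas a Poisson count has variance equal
to its mean.

```python
import math
import random
import statistics

def poisson_draw(rng, mu):
    """Draw one Poisson(mu) variate using Knuth's multiplication method."""
    limit = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

rng = random.Random(12345)   # fixed seed so the run is repeatable
episodes_mean = 2.0          # hypothetical mean number of episodes
group_size = 20              # hypothetical people displaced per episode

# Each simulated count is (number of episodes) x (people per episode).
counts = [group_size * poisson_draw(rng, episodes_mean) for _ in range(5000)]
mean = statistics.mean(counts)
variance = statistics.pvariance(counts)
ratio = variance / mean

print(f"mean = {mean:.1f}, variance = {variance:.1f}, variance/mean = {ratio:.1f}")
```

A variance-to-mean ratio far above 1 is the kind of overdispersion
that a plain Poisson model does not accommodate.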

Yet another feature of the data is the size of the municipality.  The
number of people displaced might be related to the population of the
municipality.  Do you have data on the populations?
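
A minimal sketch in Python (not Stata) of why population matters.  The
two municipalities and their numbers are made up for illustration.
The same count of 50 corresponds to very different rates, and treating
population as an exposure amounts to adding log(population) to the
linear predictor with its coefficient fixed at 1.

```python
import math

# Hypothetical municipalities: name -> (displaced count, population).
municipalities = {
    "A": (50, 5_000),
    "B": (50, 500_000),
}

# Rate of displacement per 1,000 inhabitants.
rates = {
    name: 1000 * displaced / population
    for name, (displaced, population) in municipalities.items()
}

def expected_count(population, linear_predictor):
    """Expected count with log(population) as an offset:
    log E[count] = log(population) + linear_predictor."""
    return math.exp(math.log(population) + linear_predictor)

for name, rate in rates.items():
    print(f"Municipality {name}: {rate:.1f} displaced per 1,000 inhabitants")
```

In Stata, the exposure() option of -poisson- and -nbreg- handles the
offset in this way.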

You said that the data do not show bimodal structure, but I could
easily imagine that they represent a mixture of distributions, maybe
having several components.  Do you have other variables that might
help to account for structure in the data (geographic and otherwise)?

I am probably making your analysis more complicated, but I hope I am
making it more realistic.

David Hoaglin


On Fri, Apr 26, 2013 at 4:57 PM, Gabriel Nelson
<lgabrielnelson@gmail.com> wrote:
> Thanks very much for your suggestions Nick. It makes sense that the
> problem might lie within -sktest-. I won't worry any more about this
> problem and just proceed with the qnorm command, as you suggested.
> Thanks again.
>
> Gabriel

