Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Troy Payne <paynetc@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Zeros and measures of inequality or concentration
Date: Mon, 13 Feb 2012 11:44:04 -0900

You're right: data reduction is tougher with such a heavily skewed distribution. Here, the mean crime count is 1.9, the standard deviation is 5.7, the median is 0, and 56% of apartment buildings have zero crimes. But even that's not enough information to describe how skewed the distribution is: over 72% of the sample is below the mean. In other words, most apartments are crime-free or nearly so, while a handful are very high-crime.

For purely descriptive purposes, it's usually faster to use a chart, which is how much of the criminological literature describes these concentrations (e.g., Eck, Clarke and Guerette, 2007). The concentration is usually so dramatic that a graphic conveys it much better than any summary measure.

I didn't mention this before, because I tried to keep my question to the list quite narrow. My current research question involves comparing the concentration of two different groupings of apartments. I was looking for a more formal way to do so than comparing graphs visually, and the Gini coefficient (and other measures of inequality/concentration) seemed to fit the bill until I ran into the question of what each measure does with values of zero.

In general, this issue of heavily skewed distributions is a huge one in criminological research... and one that most criminologists (myself included) haven't quite figured out how to handle.

Eck, J.E., Clarke, R.V., and Guerette, R.T. (2007). Risky facilities: Crime concentrations in homogeneous sets of establishments and facilities. Crime Prevention Studies, 21, pp 225-264. Available: http://www.popcenter.org/tools/risky_facilities/PDFs/Eck_etal_press.pdf

--
Troy Payne
Email: paynetc@gmail.com

On Mon, Feb 13, 2012 at 12:08 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>
> I don't see that this necessarily means measures of inequality. The
> usual summary measures, say mean and standard deviation, perhaps
> supplemented by the fraction of zeros, should be helpful.
>
> Nick
>
> On Thu, Feb 9, 2012 at 5:00 PM, Troy Payne <paynetc@gmail.com> wrote:
> > Thanks to Nick Cox and David Hoaglin for the suggestion to use Poisson
> > or zero-inflated models. I've used those in the past when modeling
> > the effect of independent variables on crime. Here, my purpose is
> > more descriptive; I have no predictors to model.
> >
> > Thanks also to Stephen Jenkins and Roger Newson for suggestions to use
> > -ineqdec0- and -scsomersd- packages. I'll do that and read their
> > documentation.
> >
> >
> > On Wed, Feb 8, 2012 at 8:17 PM, Troy Payne <paynetc@gmail.com> wrote:
> >> I have a more statistical question than a Stata-related question: Which measure of inequality or concentration is best for data with a large number of observations with a value of zero?
> >>
> >> While I haven't used them before, it seems that Lorenz curves, Gini coefficients, and other related measures of inequality would be a good way to examine concentrations of crime at addresses. Like income, crime tends to be highly concentrated, with a relative handful of places contributing large proportions to the total crime count. In fact, at the place-level (address or street segment) the most common crime count is often zero.
> >>
> >> I have crime data at apartment buildings in a midwestern city. In my data, 45% of apartments had zero crimes in any given year. If I include only violent crimes, then 74% of apartments have zero crimes in any given year.
> >>
> >> Posts here on Statalist led me to -inequal-, -sgini-, -lorenz-, and -glcurve- (all installed in Stata 12.1, all available via SSC). Judging from the r(N) returned, -inequal- seems to explicitly exclude observations with values of zero, while -sgini- does not. It's difficult for me to tell if -lorenz- and -glcurve- include observations with values of zero, even after reading the help files and other documentation provided.
> >>
> >> Nearly all of what I've read about these various inequality measures so far seems to assume non-zero values, or at least that zero values are rare. I'm unsure what practical impact a large proportion of zeros would have, even for user-written commands that appear to allow them.
> >>
> >> Until two days ago, I had never dug into the details of Gini coefficients. It's possible that the documentation has the answer and I've just missed it. I'd very much appreciate any guidance list members could give.
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
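
The thread's open question, what a Gini coefficient does with zeros, can be illustrated with a small sketch. This is not taken from any of the Stata packages named above; it is plain Python using the population (1/n) form of the Gini coefficient, and the crime counts are made up for illustration rather than drawn from the apartment data. Under that definition, including zeros gives G_all = p0 + (1 - p0) * G_pos, where p0 is the fraction of zeros and G_pos is the Gini among the non-zero observations, so dropping zeros can change the answer a great deal.

```python
def gini(values):
    """Gini coefficient, population (1/n) version, via the sorted-rank formula."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("Gini is undefined for an empty sample or a zero total")
    # G = 2 * sum(rank_i * x_i) / (n * sum(x)) - (n + 1) / n, ranks 1..n
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical crime counts for ten buildings (illustrative only).
counts = [0, 0, 0, 0, 0, 0, 1, 2, 3, 14]

g_all = gini(counts)                      # zeros included
positives = [c for c in counts if c > 0]
g_pos = gini(positives)                   # zeros dropped
p0 = counts.count(0) / len(counts)        # fraction of zeros

# Rebuild the all-observations Gini from the zero share and the positive-only Gini.
g_rebuilt = p0 + (1 - p0) * g_pos

print(f"Gini with zeros:    {g_all:.3f}")
print(f"Gini without zeros: {g_pos:.3f}")
print(f"p0 + (1 - p0) * G_pos = {g_rebuilt:.3f}")
```

With these made-up counts the Gini is 0.8 when zeros are included and 0.5 when they are excluded, so two commands that differ only in how they treat zeros would report very different concentration for the same data. Note that small-sample corrections (for example an n/(n - 1) factor) would alter the numbers slightly; which convention a given package uses is something to check in its help file.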

**Follow-Ups**:
- **Re: st: Zeros and measures of inequality or concentration** *From:* David Hoaglin <dchoaglin@gmail.com>

**References**:
- **st: Zeros and measures of inequality or concentration** *From:* Troy Payne <paynetc@gmail.com>
- **Re: st: Zeros and measures of inequality or concentration** *From:* Troy Payne <paynetc@gmail.com>
- **Re: st: Zeros and measures of inequality or concentration** *From:* Nick Cox <njcoxstata@gmail.com>
