Maarten buis <maartenbuis@yahoo.co.uk>

statalist@hsphsun2.harvard.edu

RE: st: Measure of Variability in a Nominal Variable

Tue, 4 Mar 2008 20:53:39 +0000 (GMT)

An alternative that would fit the description given by Kevin is: Agresti (1996) An Introduction to Categorical Data Analysis. Hoboken NJ: John Wiley. Also, the reference given by Nick is the second edition, which is much expanded from the first edition.

-- Maarten

--- Nick Cox <n.j.cox@durham.ac.uk> wrote:

> A reference, as requested by Steven Samuels in his question to Kevin
> Daley, is
>
> Agresti, A. 2002. Categorical data analysis. Hoboken NJ: John Wiley.
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
> Sent: 04 March 2008 18:04
> To: statalist@hsphsun2.harvard.edu
> Subject: RE: st: Measure of Variability in a Nominal Variable
>
> If p_i is the proportion in category i,
> then SUM p_i^2 is the probability of being in the same category.
> (The sum is over categories, not observations.)
>
> The complement 1 - SUM p_i^2 is
> then the probability of being in different categories.
>
> The reciprocal 1 / SUM p_i^2 has a nice interpretation as the
> equivalent number of equally probable categories.
>
> One or more of these quantities arise under many different names:
>
> Gini index (but NB that many other measures have also been
> called that)
>
> Simpson index in ecology (the same Simpson as Simpson's paradox)
>
> Herfindahl index in economics
>
> heterozygosity in genetics
>
> And no doubt others.
>
> Maarten gave one way to calculate it. Another is through -ineq- on
> SSC.
>
> Nick
> n.j.cox
>
> Maarten buis
>
> > --- Kevin Daley <kevin.daley@mail.mcgill.ca> wrote:
> > > I would like to use a statistic discussed by Agresti in his
> > > categorical data analysis book that gives the probability that two
> > > randomly selected independent observations in a given dataset will
> > > end up in different categories of the given variable. The
> > > statistic has a minimum value of 0 and a maximum value of J-1.
> > --- Maarten buis <maartenbuis@yahoo.co.uk> wrote:
> > If it is a probability then the maximum is 1. In that case you could
> > compute it as follows:
> >
> > *---------- begin example -------------
> > sysuse auto, clear
> > preserve                               // keep a copy of the full data
> > contract rep78, percent(p) nomiss      // one row per category; p = percent
> > gen double psq = (p/100)^2             // squared proportion, p_i^2
> > sum psq, meanonly                      // r(sum) = SUM p_i^2
> > di 1-r(sum)                            // 1 - SUM p_i^2
> > restore                                // return to the original data
> > *--------- end example -----------------
> > (For more on how to use examples I sent to the Statalist, see
> > http://home.fsw.vu.nl/m.buis/stata/exampleFAQ.html )
>
> In the case above the two draws are draws with replacement, in which
> case the maximum is 1-1/_N. The maximum variability is obtained when
> each observation is in its own category, so there are _N categories,
> each with a probability of 1/_N. The probability of drawing one
> particular category twice is (1/_N)^2, and there are _N such
> categories, so the probability of drawing any category twice is
> _N*(1/_N)^2 = 1/_N. The probability of not drawing a category twice
> is therefore 1-1/_N.
>
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------
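Nick's three quantities can be written in display notation as follows. Here p_j is the proportion of observations in category j = 1, ..., J, with J the number of categories as in Kevin's question, and the draws are two independent draws with replacement. The equal-probability case at the end is only an illustrative check of the "equivalent number of categories" reading. It assumes all J categories have p_j = 1/J.

```latex
% p_j: proportion of observations in category j = 1, ..., J, with \sum_j p_j = 1
\begin{align*}
\Pr(\text{two draws fall in the same category})     &= \sum_{j=1}^{J} p_j^2 = S \\
\Pr(\text{two draws fall in different categories})  &= 1 - S \\
\text{equivalent number of equally probable categories} &= 1/S
\end{align*}
% Illustrative check, assuming J equally probable categories (p_j = 1/J):
\[
S = J \left(\tfrac{1}{J}\right)^2 = \tfrac{1}{J},
\qquad 1/S = J .
\]
```

In the Stata example above, S is what `sum psq, meanonly` leaves in r(sum), so 1/r(sum) would give the reciprocal in the same way that 1-r(sum) gives the complement.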

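Maarten's argument for the maximum can be written out step by step. Here N stands for Stata's _N, the number of observations. The argument assumes every observation is in its own category, so each of the N categories has p_j = 1/N:

```latex
\begin{align*}
\Pr(\text{one particular category is drawn twice}) &= \left(\tfrac{1}{N}\right)^2 \\
\Pr(\text{some category is drawn twice})           &= N \left(\tfrac{1}{N}\right)^2 = \tfrac{1}{N} \\
\Pr(\text{no category is drawn twice})             &= 1 - \tfrac{1}{N}
\end{align*}
```

The same steps with J equally probable categories in place of N give 1 - 1/J = (J-1)/J. When the p_j sum to 1, their squares cannot sum to less than 1/J. For that reason, (J-1)/J is the largest value 1 - SUM p_j^2 can reach for a variable with J categories. Both bounds stay below 1 and approach it only as the number of categories grows.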
