


st: RE: Herfindahl, segregation index

From	Nick Cox <[email protected]>
To	"'[email protected]'" <[email protected]>
Subject	st: RE: Herfindahl, segregation index
Date	Wed, 26 Jan 2011 15:47:48 +0000

-seg- is a user-written program on SSC. Please remember to explain where user-written programs you refer to come from, a very longstanding request on this list. 

I haven't looked carefully, but I suspect that various user-written programs in the inequality/segregation/concentration/diversity territory will provide this calculation. 

Measures attributed to Herfindahl (usually in economics) and to Simpson (usually in ecology) are based on the sum of squared probabilities of group occurrence, but watch out: sometimes people subtract from 1 and sometimes people work with the reciprocal, which has a nice interpretation as an equivalent number of equally common classes. People knowing more about genetics may want to recognise heterozygosity under unfamiliar names. 
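
As a sketch of the three variants just described, write p_i for the proportion of observations in group i out of k groups. The symbols H, p_i and k are notation assumed here for illustration, not taken from any particular program's documentation:

```latex
H = \sum_{i=1}^{k} p_i^{2}, \qquad
1 - H = 1 - \sum_{i=1}^{k} p_i^{2}, \qquad
\frac{1}{H} = \frac{1}{\sum_{i=1}^{k} p_i^{2}}
% With k equally common groups, p_i = 1/k, so H = k (1/k)^2 = 1/k and 1/H = k.
```

The equal-proportions case is what gives the reciprocal its reading as an equivalent number of equally common classes.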

The reason for this explosion of terminology is that local re-inventions abound and most people don't read other disciplines' literature. The same basic idea can be traced back to Corrado Gini at least, long before Simpson or Herfindahl, although several different measures are known by his name, so a name like Gini index creates as many uncertainties as it removes. 

Although some people have heard this before, I always enjoy pointing out that the Simpson here is precisely the same person as in Simpson's paradox (which isn't his discovery either). 

Your best bet is to do a calculation yourself on simple data for which you know the answer and then see how it matches with the program of your choice. That way you can be confident that what a program gives is what you seek. 

Mata is a good vehicle for calculator-style checking. 

. mata :

: freq = (100, 100, 100)

: freq / sum(freq)
                 1             2             3
    +-------------------------------------------+
  1 |  .3333333333   .3333333333   .3333333333  |
    +-------------------------------------------+

: (freq / sum(freq)):^2
                 1             2             3
    +-------------------------------------------+
  1 |  .1111111111   .1111111111   .1111111111  |
    +-------------------------------------------+

: sum((freq / sum(freq)):^2)
  .3333333333

: 1 - sum((freq / sum(freq)):^2)
  .6666666667

: 1/sum((freq / sum(freq)):^2)
  3
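
The same steps can be wrapped in a small Mata function so that other test cases are easy to try. This is only a sketch: the function name sumsqshares() is made up for illustration and is not part of Mata or of -seg-, and the frequencies are invented test data.

```mata
mata:
// Hypothetical helper (name assumed for illustration):
// sum of squared shares for a vector of group frequencies
real scalar sumsqshares(real vector freq)
{
    real vector p

    p = freq / sum(freq)        // group proportions
    return(sum(p:^2))           // sum of squared proportions
}

// equal frequencies, as in the session above
sumsqshares((100, 100, 100))

// unequal frequencies: .9^2 + .05^2 + .05^2 = .815
sumsqshares((90, 5, 5))
1 - sumsqshares((90, 5, 5))
1 / sumsqshares((90, 5, 5))
end
```

Comparing these hand-checkable values with what a program reports for the same frequencies should show which variant (sum, complement or reciprocal) the program is computing.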

If you have just one set of data, reading them into Mata will therefore give what you want directly, but much depends on your data structure. 
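
If instead there are several sets of frequencies, one possible layout is a matrix with one row per unit (say, a school or district) and one column per group. The sketch below assumes that layout; the matrix F and its values are invented for illustration, and real data would first have to be brought into Mata in this shape.

```mata
mata:
// Assumed layout: rows are units, columns are groups (made-up frequencies)
F = (100, 100, 100 \ 90, 5, 5 \ 300, 0, 0)

// divide each row by its row total to get proportions within units
P = F :/ rowsum(F)

// sum of squared proportions, one value per unit
H = rowsum(P:^2)

// columns: sum of squares, complement, reciprocal
H, 1 :- H, 1 :/ H
end
```

The third row, in which a single group accounts for every observation, should give a sum of squares of 1, a complement of 0 and a reciprocal of 1.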

However, some existing program probably will do what you want. 

Nick 

Tomeka Davis
 
I would like to compute a racial segregation index for a set of data.  I know -seg- will allow me to do this, but I am not clear on which of the indices computed by -seg- is similar to the Herfindahl.  I would appreciate any advice.

