



RE: st: Herfindahl, segregation index


From   Nick Cox <n.j.cox@durham.ac.uk>
To   "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Herfindahl, segregation index
Date   Wed, 26 Jan 2011 18:03:44 +0000

I just corrected the calculation in my example. Sorry again. 

You're naturally correct about weights. The original poster said nothing about weights, and your advice on that is key. 

And I naturally agree on your last point. 

We are not differing about the best way to do it, which cannot be established from the maximally vague description "a set of data" in the original post. But we are having fun, even if no one else is.

Nick 
n.j.cox@durham.ac.uk 

Austin Nichols

Nick--
Are you using the frequency weights in your example? Also note that
-contract- does not allow pweights used on survey data, or aweights on
summary data.  My example  calculates HHI on the original data; the
-collapse- near the end of the code is only to show the connection to
the user-written program the poster asked about.  If one is intending
to use group- or region-level HHI or the like in a regression, say,
it's much more efficient not to collapse the data and have to merge it
back on.
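
To make the last point concrete, here is a minimal sketch of a weighted HHI computed in place on the unit-record data, so nothing has to be collapsed and merged back. It assumes the same nhanes2 variables used further down (region, smsa, race, finalwgt); the names gp, wgp, wrace, first and whhi are just illustrative.

```stata
webuse nhanes2, clear
egen gp = group(region smsa)
* total weight in each group, and in each race within group
bysort gp: egen double wgp = total(finalwgt)
bysort gp race: egen double wrace = total(finalwgt)
* flag one observation per group-race cell so each share is counted once
egen byte first = tag(gp race)
* weighted HHI = sum over races of squared weighted shares, by group
bysort gp: egen double whhi = total(first * (wrace/wgp)^2)
```

Every observation then carries its group's whhi, which can go straight into a regression on the original data.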

On Wed, Jan 26, 2011 at 12:38 PM, Nick Cox <n.j.cox@durham.ac.uk> wrote:

> Here's a dopey example. I'm going to treat -rep78- in the auto data as a categorical variable and calculate Simpson (= Gini) diversity, which I define just as the sum of squared proportions. To add a bit of a challenge, I'll do that separately by -foreign-.
>
> . sysuse auto, clear
> (1978 Automobile Data)
>
> I am fond of using -contract- to reduce to a dataset of frequencies, rather than -collapse-, but -collapse- would do the job too.
>
> . contract foreign rep78, nomiss
>
> . l
>
>     +--------------------------+
>     | rep78    foreign   _freq |
>     |--------------------------|
>  1. |     1   Domestic       2 |
>  2. |     2   Domestic       8 |
>  3. |     3   Domestic      27 |
>  4. |     4   Domestic       9 |
>  5. |     5   Domestic       2 |
>     |--------------------------|
>  6. |     3    Foreign       3 |
>  7. |     4    Foreign       9 |
>  8. |     5    Foreign       9 |
>     +--------------------------+
>
> . ineq rep78, by(foreign) gensim(simpson)
>
> ----------------------------------------------------------
>  Car type |       freq     Simpson     entropy     dissim.
> ----------+-----------------------------------------------
>  Domestic |          5       0.244       1.490       0.200
>   Foreign |          3       0.347       1.078       0.083
> ----------------------------------------------------------
>
> . l
>
>     +-------------------------------------+
>     | rep78    foreign   _freq    simpson |
>     |-------------------------------------|
>  1. |     1   Domestic       2   .2444445 |
>  2. |     2   Domestic       8   .2444445 |
>  3. |     3   Domestic      27   .2444445 |
>  4. |     4   Domestic       9   .2444445 |
>  5. |     5   Domestic       2   .2444445 |
>     |-------------------------------------|
>  6. |     3    Foreign       3   .3472222 |
>  7. |     4    Foreign       9   .3472222 |
>  8. |     5    Foreign       9   .3472222 |
>     +-------------------------------------+
>
> . collapse simpson, by(foreign)
>
> . l
>
>     +---------------------+
>     |  foreign    simpson |
>     |---------------------|
>  1. | Domestic   .2444445 |
>  2. |  Foreign   .3472222 |
>     +---------------------+
>
> You must install -ineq- from SSC first.
>
> Nick
> n.j.cox@durham.ac.uk
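
As a cross-check, the sum of squared proportions can also be computed by hand from the -contract-ed frequencies, without -ineq-. This is only a sketch using -egen-; the variable names n and simpson are illustrative. Note that the values listed above (.2444445 and .3472222) equal 55/225 and 50/144, which is what you get from squaring shares of the rep78 codes themselves, whereas the counts in _freq give 882/2304 and 171/441.

```stata
sysuse auto, clear
contract foreign rep78, nomiss
* total frequency within each level of foreign
bysort foreign: egen double n = total(_freq)
* Simpson = sum of squared proportions, using _freq as the counts
bysort foreign: egen double simpson = total((_freq/n)^2)
list foreign rep78 _freq simpson, sepby(foreign)
```

With the frequencies shown in the listing, this gives about 0.383 for Domestic and 0.388 for Foreign.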
>
> Nick Cox
>
> -ineq- (SSC) will work on what are here called unit-record data.
>
> You just need to -contract- first.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Austin Nichols
>
> Tomeka Davis <soctmd@langate.gsu.edu>:
> I had a look at -seg- and it does not seem to support weights, and it
> does not operate on unit record data, so you would have to -collapse-
> or otherwise modify an individual-level dataset (probably using
> weights) to prepare it for -seg-. -seg- seems to be designed mostly
> for use on US Census tract- or block-level data. If you had
> tract-level data with shares already defined as variables, the HHI
> would be computed with a single call to -generate- e.g.
> . gen hhi=white^2+black^2+other^2
> so I assume you don't have that simple situation. Here is an example
> that demonstrates the closest parallel of the output of -seg- to HHI:
>
> webuse nhanes2, clear
> * pretend data is unweighted
> ta race
> qui levelsof race, loc(vs)
> qui foreach v of loc vs {
>  egen sh`v'=mean(race==`v'), by(region smsa)
>  replace sh`v'=sh`v'^2
>  la var sh`v' "sq. share race==`v'"
>  }
> su sh*
> egen hhi=rowtotal(sh*)
> bys region smsa:g two=(_n>1)
> li region smsa sh* hhi if two==0, noo sepby(region)
> * stop pretending data is unweighted
> egen gp=group(region smsa)
> qui levelsof gp, loc(gs)
> qui foreach v of loc vs {
>  tempvar vi
>  g `vi'=race==`v'
>  g ws`v'=.
>  la var ws`v' "wtd. sq. share race==`v'"
>  foreach g of loc gs {
>  su `vi' if gp==`g' [aw=finalwgt], mean
>  replace ws`v'=r(mean)^2 if gp==`g'
>  }
>  }
> egen whhi=rowtotal(ws*)
> li region smsa hhi whhi if two==0, noo sepby(region)
> g white=race==1
> collapse white black orace hhi whhi ws? sh? [pw=finalwgt], by(region smsa gp)
> g norm=(1-whhi)*3/2
> qui seg white black orace, by(gp) gen(i indx) p
> li region smsa whhi norm indx, noo sepby(region)
>
> Note that if you need to use -seg- on unit-record data, you will first
> collapse, then run -seg-, then save under a new name, then go back to
> your original data and merge on the output.
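
The collapse-save-merge round trip just described might look roughly like the sketch below. To keep it self-contained, a directly computed weighted HHI stands in for the call to -seg-, which you would run at the marked step instead; the names gp, whhi and grouplevel are illustrative.

```stata
webuse nhanes2, clear
egen gp = group(region smsa)
generate byte white = race == 1
preserve
* reduce to one record per group holding weighted race shares
collapse (mean) white black orace [pw=finalwgt], by(gp)
* group-level step: -seg- would be run here; HHI is used as a stand-in
generate double whhi = white^2 + black^2 + orace^2
tempfile grouplevel
save `grouplevel'
restore
* back in the unit-record data: attach the group-level result
merge m:1 gp using `grouplevel', keepusing(whhi) nogenerate
```

The -preserve- and -restore- pair plays the role of "go back to your original data"; saving under a permanent name works just as well if the group-level file is wanted later.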
>
> On Wed, Jan 26, 2011 at 10:44 AM, Austin Nichols
> <austinnichols@gmail.com> wrote:
>> Tomeka Davis <soctmd@langate.gsu.edu> :
>> If you want the HHI, calculate the sum of squared shares directly,
>> perhaps using -egen- or -by- a couple of times, but if you want to use
>> the user-written -seg- on SSC you should check out its references,
>> particularly the 2002 paper by the same author:
>>
>> James, David R. and Karl E. Taeuber. 1985. "Measures of segregation."
>>      Sociological Methodology 14:1-32.
>> Massey, Douglas S. and Nancy A. Denton. 1988. "The dimensions of racial
>>      segregation." Social Forces 67:281-315.
>> Reardon, Sean F. and Glenn Firebaugh. 2002. "Measures of multigroup
>>      segregation." Sociological Methodology 32:33-67.
>> White, Michael J. 1986. "Segregation and diversity measures in population
>>      distribution." Population Index 52:198-221.
>> Zoloth, Barbara S. 1976. "Alternative measures of school segregation." Land
>>      Economics 52:278-298.
>>
>> On Wed, Jan 26, 2011 at 9:23 AM, Tomeka Davis <soctmd@langate.gsu.edu> wrote:
>>> Hello -
>>>
>>> I would like to compute a racial segregation index for a set of data.  I know -seg- will allow me to do this, but I am not clear on which of the indices computed by -seg- is similar to the Herfindahl.  I would appreciate any advice.
