Andreas sent me a .pdf with a definition. In essence,
this measure approximates, from aggregate data, what the document
he sent (modulo typos) calls a Hirschman-Herfindahl index (hence the
notation HHI). Incidentally, as often
aired on this list, linking Hirschman with Herfindahl
in this way is not especially justified. Hirschman and Herfindahl
used different formulas, and Herfindahl re-invented a measure
already published by Gini, Simpson and others before (and by Turing
in then-classified work), subject to secondary details such
as the use of a complement or reciprocal. Oh well.
As I understand it, the definition starts with variables for
businesses in size classes, which we will call
    S   turnover
    E   employment
    EM  employment threshold
    N   number of businesses
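
For reference, here is the calculation for a single size class as I read it
from the loop in the quoted do-file further down. It is a transcription of
that code, not of the .pdf, so it inherits any slips in the code. T is
shorthand introduced here for the total turnover used as the denominator
(TO_Nace in the quoted code); it is not one of the four variables above.

```latex
% Transcribed from the quoted loop; T (total turnover) is shorthand introduced here.
\[
A = \frac{S}{E}\, EM, \qquad
s_j = A + \frac{2j\,(S - N A)}{N (N - 1)}, \quad j = 1, \dots, N \quad (N > 1)
\]

% Contribution of one size class; the index sums these over size classes.
\[
H_{\text{class}} =
\begin{cases}
  \left( \dfrac{100\, S}{T} \right)^{2}, & N = 1, \\[2ex]
  \displaystyle\sum_{j=1}^{N} \left( \dfrac{100\, s_j}{T} \right)^{2}, & N > 1 .
\end{cases}
\]
```
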
Here is some untested code. The variables should be supplied
in the order given above. I am not totally confident that
this is exactly right, but it is clear that although the formula
is messy, there is absolutely no need for explicit loops.
Thus, this, or a corrected version of it, should go pretty fast.
* Statalist offering 6 Feb 2007
program ave_hhi, rclass
    version 8
    syntax varlist(min=4 max=4 numeric) [if] [in]
    marksample touse
    quietly {
        count if `touse'
        if r(N) == 0 error 2000
        tokenize `varlist'
        args S E EM N
        preserve
        keep if `touse'
        tempvar A B nexpand id j HHI
        tempname Stotal
        * total turnover over the observations used
        su `S', meanonly
        scalar `Stotal' = r(sum)
        * turnover per employee times the employment threshold
        gen double `A' = `S' / `E' * `EM'
        * one copy of each size class for each j = 0, ..., N
        * (NB: the quoted loop below runs j = 1, ..., N instead)
        gen double `nexpand' = `N' + 1
        gen long `id' = _n
        expand `nexpand'
        bysort `id' : gen long `j' = _n - 1
        * increment for business j; S - N * A follows the quoted loop below
        gen double `B' = ///
            (2 * `j' * (`S' - `N' * `A')) / (`N' * (`N' - 1))
        replace `B' = 0 if mi(`B')
        gen double `HHI' = (100 * (`A' + `B') / `Stotal')^2
        su `HHI', meanonly
    }
    di as txt "Schmalensee AVE_HHI " as res r(sum)
    return scalar AVE_HHI = r(sum)
end
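
A sketch of how the program might be called from a do-file. It assumes that
Andreas's variables map onto S, E, EM and N as TO_nsc1, NoEmpl_nsc1, LET and
NoEnt_nsc1, which is only my reading of the quoted code. It also assumes
that country, time and SecCode are numeric; the particular values 1, 2000
and 15 are made up for illustration. The program returns one total for
whatever observations are selected, so each country-time-sector cell needs
its own call with an if qualifier.

```stata
* hypothetical call for a single country-time-sector cell
* (variable order: S E EM N; the cell values are invented)
ave_hhi TO_nsc1 NoEmpl_nsc1 LET NoEnt_nsc1 ///
    if country == 1 & time == 2000 & SecCode == 15

* the result is left behind in r(AVE_HHI)
return list
display r(AVE_HHI)
```

Note that the program divides by the sum of S over the selected observations,
whereas the quoted code divides by TO_Nace. The two agree only if TO_Nace is
the sector total of TO_nsc1 within each cell.
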
Nick
n.j.cox@durham.ac.uk
Nick Cox
> A visceral reaction is that this still looks much more
> complicated than it need be. And the speed sounds horrendous
> for what is presumably a descriptive statistic. Loosely
> similar indices tend to take perhaps 3-7 lines of Stata code
> and to take somewhere between a blink and a wink in terms of time.
>
> No one has suggested an alternative, for which there could be
> several explanations. I don't feel strong enough to try to reverse
> engineer the recipe from your code, and a quick Google did not reveal
> a single obvious source for this index. If you can give
> a precise definition or a precise reference for such, some
> Stata programmers _may_ be better equipped to advise.
Andreas Reinstaller
> > I have found a solution that does not strike me as particularly
> > elegant, but that works. I just use the indices of the variables in
> > Stata, where NoEnt_nsc1 TO_nsc1 TO_Nace NoEmpl_nsc1 LET are the
> > variables I use:
> >
> > In a simple do file I have this code
> > -------------------- begin calc --------------------
> > local size=_N
> >
> > /* Ave_HHI nsc1 */
> >
> > sort country time SecCode sizeclass
> > generate ntx=_n
> > generate newntx=.
> > local i = 1
> > forvalues i = 1(1)`size' {
> >
> >     local nof = NoEnt_nsc1[`i']
> >     if `nof'==. local nof = 0
> >     local szs = 0
> >
> >     if `nof' > 0 {
> >         if `nof' == 1 {
> >             local szs = (NoEnt_nsc1[`i'] ///
> >                 *(100*(TO_nsc1[`i']/NoEnt_nsc1[`i'])/TO_Nace[`i'])^2)
> >         }
> >         else{
> >             local j = 0
> >             forvalues j = 1(1)`nof' {
> >                 local sz = 0
> >                 local sz = ((100*((TO_nsc1[`i']/NoEmpl_nsc1[`i']*LET[`i']) ///
> >                     +(2*`j'*(TO_nsc1[`i'] ///
> >                     -(NoEnt_nsc1[`i']*(TO_nsc1[`i']/NoEmpl_nsc1[`i']*LET[`i']))) ///
> >                     /(NoEnt_nsc1[`i']*(NoEnt_nsc1[`i']-1)))) ///
> >                     /TO_Nace[`i'])^2)
> >                 local szs = `szs' + `sz'
> >             }
> >         }
> >     }
> >     qui by country time SecCode sizeclass: replace newntx = `szs' if ntx==`i'
> > }
> >
> > by country time SecCode: egen Ave_HHI_nsc1=sum(newntx)
> >
> > -------------------- end calc --------------------
> >
> > For 12000 obs it takes about 15 minutes to run on a dual-core
> > Pentium at 2.8 GHz under Windows XP in Stata 8.2.
> >
> > If somebody has ideas about how to improve the speed, I
> > should be grateful.
>