Statalist The Stata Listserver



st: RE: [Re: How to program a loop to calculate the value of an observation]


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: [Re: How to program a loop to calculate the value of an observation]
Date   Tue, 6 Feb 2007 12:40:20 -0000

Andreas sent me a .pdf with a definition. In essence
this measure is an approximation to what the document he sent 
(modulo typos) calls a Hirschman-Herfindahl index (hence the notation 
HHI) when given aggregate data. Incidentally, as often
aired on this list, linking Hirschman with Herfindahl 
in this way is not especially justified. Hirschman and Herfindahl
used different formulas, and Herfindahl re-invented a measure
already published by Gini, Simpson and others before (and by Turing
in then-classified work), subject to secondary details such
as the use of a complement or reciprocal. Oh well. 

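As I understand it, the definition starts with variables for
businesses in size classes. We will call them as follows, with the
names used in Andreas's do-file in parentheses:

S    turnover                 (TO_nsc1)
E    employment               (NoEmpl_nsc1)
EM   employment threshold     (LET)
N    number of businesses     (NoEnt_nsc1)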
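The formula below is read off Andreas's code. The .pdf is not reproduced here, so this is a reading of the code, not of the source definition. For a class with N > 1 businesses, the implied business sizes start at A and rise in equal steps. Indexing them j = 0, ..., N-1 makes them add back to the class turnover S. The j = 1, ..., N used in Andreas's loop gives a total greater than S whenever S > NA. S_total is total turnover over all classes, which is TO_Nace in Andreas's code.

```latex
A = \frac{S}{E}\,EM, \qquad
s_j = A + \frac{2j\,(S - NA)}{N(N-1)}, \qquad j = 0, \dots, N-1

\sum_{j=0}^{N-1} s_j = NA + \frac{2(S - NA)}{N(N-1)} \cdot \frac{N(N-1)}{2} = S

\mathrm{AVE\_HHI} = \sum_{\text{classes}} \sum_{j} \left( \frac{100\, s_j}{S_{\text{total}}} \right)^2,
\qquad s_0 = S \ \text{when } N = 1
```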

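Here is some untested code. The variables should be supplied
in the order given above. I am not totally confident that
this is exactly right, but it is clear that although the formula
is messy, there is absolutely no need for explicit loops.
Expanding each size class to one observation per business turns
the inner loop over businesses into an ordinary -generate-. The
outer loop over observations is what Stata does anyway.
Thus, this, or a corrected version of it, should go pretty fast.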
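To see why no explicit loop is needed, here is a toy run of the -expand- step on made-up data. The variable names S, N, id and j are illustrative only. A class with N businesses becomes N observations, each tagged with j = 0, ..., N-1. After that, any per-business formula is a single -generate-.

```stata
* Toy illustration of the -expand- trick (made-up numbers)
clear
input S N
100 1
300 3
end

* remember which size class each observation came from
gen long id = _n
* one observation per business in the class
expand N
* index the businesses within each class as j = 0, ..., N-1
bysort id : gen long j = _n - 1
list, sepby(id) noobs
```

The second class now occupies three observations with j equal to 0, 1 and 2.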

* Statalist offering 6 Feb 2007 
program ave_hhi, rclass
	version 8 
	syntax varlist(min=4 max=4 numeric) [if] [in] 

	* -marksample- takes a macro name, not a macro reference
	marksample touse
	quietly { 
		count if `touse' 
		if r(N) == 0 error 2000

		tokenize `varlist' 
		args S E EM N 

		preserve 
		keep if `touse' 

		tempvar A B id j HHI 
		tempname Stotal 

		su `S', meanonly 
		scalar `Stotal' = r(sum) 

		* A: implied turnover of the smallest business in each class
		gen double `A' = `S' / `E' * `EM' 

		* one observation per business, indexed j = 0, ..., N-1
		gen long `id' = _n 
		expand `N' 
		bysort `id' : gen long `j' = _n - 1 

		* equal steps chosen so that the N implied sizes add up to S
		gen double `B' = ///
		(2 * `j' * (`S' - `N' * `A')) / (`N' * (`N' - 1)) 
		* a single business (0/0 above) takes the whole class turnover;
		* missing or zero N leaves B missing, so -summarize- ignores it
		replace `B' = `S' - `A' if `N' == 1 

		gen double `HHI' = (100 * (`A' + `B') / `Stotal')^2 
		su `HHI', meanonly 
	}

	di as txt "Schmalensee AVE_HHI " as res r(sum) 
	return scalar AVE_HHI = r(sum) 
end 
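-ave_hhi- handles one sector at a time. Andreas wants results by country, time and sector, and the same -expand- idea gives that in one pass with -by:-. What follows is a sketch under these assumptions:

* The variable names follow his do-file, but the numbers are made up.
* The businesses are indexed j = 0, ..., N-1, as in the program above.
* A class with one business takes its whole turnover, as in his do-file.
* TO_Nace is the denominator, as in his do-file.

None of this has been checked against the .pdf definition.

```stata
* Sketch: AVE_HHI for every country/time/sector cell without loops.
* Variable names follow Andreas's do-file; the numbers are made up.
clear
input str2 country time SecCode sizeclass TO_nsc1 NoEmpl_nsc1 LET NoEnt_nsc1 TO_Nace
"AT" 2004 1 1 100 20 2 4 400
"AT" 2004 1 2 300 30 10 1 400
end

gen long obs = _n
* implied turnover of the smallest business in each size class
gen double A = TO_nsc1 / NoEmpl_nsc1 * LET
* one observation per business; a missing or zero count stays as one
* observation, whose size is then missing and so adds nothing below
expand NoEnt_nsc1
bysort obs : gen long j = _n - 1
gen double size = A + 2 * j * (TO_nsc1 - NoEnt_nsc1 * A) / (NoEnt_nsc1 * (NoEnt_nsc1 - 1))
* a class with a single business: its size is the class turnover
replace size = TO_nsc1 if NoEnt_nsc1 == 1
gen double term = (100 * size / TO_Nace)^2
* total() treats missing as zero (in Stata 8 use egen sum() instead)
bysort country time SecCode : egen double Ave_HHI_nsc1 = total(term)
* back to one observation per size class
bysort obs : keep if _n == 1
drop obs A j size term
list
```

With these numbers, the implied sizes in the first class are 10, 20, 30 and 40, which add back to its turnover of 100.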
	
Nick 
[email protected] 

Nick Cox 

> A visceral reaction is that this still looks much more
> complicated than it need be. And the speed sounds horrendous
> for what is presumably a descriptive statistic. Loosely similar
> indices tend to take perhaps 3-7 lines of Stata code and to take
> somewhere between a blink and a wink in terms of time.
>
> No one has suggested an alternative, for which there could be
> several explanations. I don't feel strong enough to try to reverse-
> engineer the recipe from your code, and a quick Google did not reveal
> a single obvious source for this index. If you can give a precise
> definition or a precise reference for such, some Stata programmers
> _may_ be better equipped to advise.
 
Andreas Reinstaller
 
> > I have found a solution that does not strike me as particularly
> > elegant, but that works. I just use the indices of the variables in
> > Stata, where NoEnt_nsc1 TO_nsc1 TO_Nace NoEmpl_nsc1 LET are the
> > variables I use:
> >
> > In a simple do file I have this code
> > -------------------------------------------- begin calc ----------
> > local size=_N
> >
> > /* Ave_HHI nsc1 */
> >
> > sort country time SecCode sizeclass
> > generate ntx=_n
> > generate newntx=.
> > local i = 1
> > forvalues i = 1(1)`size' {
> >
> >     local nof = NoEnt_nsc1[`i']
> >     if `nof'==. local nof = 0
> >     local szs = 0
> >
> >     if `nof' > 0 {
> >         if `nof' == 1 {
> >         local szs = (NoEnt_nsc1[`i']*(100*(TO_nsc1[`i']/NoEnt_nsc1[`i'])/TO_Nace[`i'])^2)
> >         }
> >         else {
> >             local j = 0
> >             forvalues j = 1(1)`nof' {
> >                 local sz = 0
> >                 local sz = ((100*((TO_nsc1[`i']/NoEmpl_nsc1[`i']*LET[`i'])+(2*`j'*(TO_nsc1[`i']-(NoEnt_nsc1[`i']*(TO_nsc1[`i']/NoEmpl_nsc1[`i']*LET[`i'])))/(NoEnt_nsc1[`i']*(NoEnt_nsc1[`i']-1))))/TO_Nace[`i'])^2)
> >                 local szs = `szs' + `sz'
> >             }
> >         }
> >     }
> >     qui by country time SecCode sizeclass: replace newntx = `szs' if ntx==`i'
> > }
> >
> > by country time SecCode: egen Ave_HHI_nsc1=sum(newntx)
> >
> > -------------------------------------------- end calc ------------
> >
> > For 12000 obs it takes about 15 minutes to run on a dual-core
> > Pentium at 2.8 GHz with Windows XP in Stata 8.2.
> >
> > If somebody has ideas about how to improve the speed, I
> > should be grateful.
> 

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


