Hello all,
I have the following dataset:
DV  v1  v2  v3
1   A1  A2
1   B2  A1  C1
1   B1
1   B1
2   A1
2   A1  A2
3   B1  B2  B3
4   C1  B2  B3
4   B1  C1  C2
5   B1  B2  A1
5   C1  A2
I want to calculate a measure similar to the Herfindahl-Hirschman Index
(HHI). For example, for DV = 1 there are 7 strings across the 3 variables
(v1, v2, v3): A1, A2; B2, A1, C1; B1; B1. The measure is

HHI = sum of (count of each string)^2 / (total count of strings)^2

So for DV = 1, since there are 2 A1's, 1 A2, 2 B1's, 1 B2, and 1 C1,
HHI = (2^2 + 1^2 + 2^2 + 1^2 + 1^2) / (2 + 1 + 2 + 1 + 1)^2 = 11/49.
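A minimal sketch of this calculation, written in Python purely for illustration (it is not the Stata code from the post). It works directly on the wide layout, counting the non-empty strings across v1, v2, and v3 within each DV, so no reshape is needed. Assumptions: empty cells are stored as empty strings, and the names rows and hhi_by_dv are hypothetical.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Wide data from the post; "" marks an empty cell (assumed encoding).
rows = [
    (1, "A1", "A2", ""),
    (1, "B2", "A1", "C1"),
    (1, "B1", "", ""),
    (1, "B1", "", ""),
    (2, "A1", "", ""),
    (2, "A1", "A2", ""),
    (3, "B1", "B2", "B3"),
    (4, "C1", "B2", "B3"),
    (4, "B1", "C1", "C2"),
    (5, "B1", "B2", "A1"),
    (5, "C1", "A2", ""),
]

def hhi_by_dv(rows):
    """Return {DV: HHI}, counting strings across v1-v3 and across rows."""
    counts = defaultdict(Counter)  # DV -> Counter of strings
    for dv, *values in rows:
        counts[dv].update(v for v in values if v)  # skip empty cells
    result = {}
    for dv, c in counts.items():
        total = sum(c.values())
        # sum of squared counts over the squared total count
        result[dv] = Fraction(sum(n * n for n in c.values()), total * total)
    return result

for dv, h in sorted(hhi_by_dv(rows).items()):
    print(dv, h)
```

For DV = 1 this gives 11/49, matching the hand calculation; the other groups come out as 5/9, 1/3, 2/9, and 1/5. Exact fractions are used only to make the comparison with the worked example easy to read.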
What I did was reshape the dataset, stacking v1, v2, and v3 into a single
variable, v_all, as shown below, and then use -gen- and -egen-. Is there a
way to do this without reshaping the dataset, that is, a better way to
calculate the measure across both variables and observations? Thanks in
advance.
Aaron
DV v_all
1 A1
1 B2
1 B1
1 B1
2 A1
2 A1
3 B1
4 C1
4 B1
5 B1
5 C1
1 A2
1 A1
1
1
2
2 A2
3 B2
4 B2
4 C1
5 B2
5 A2
1
1 C1
1
1
2
2
3 B3
4 B3
4 C2
5 A1
5
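For comparison, a Python sketch of the long-format route: stack v1, v2, and v3 into one column (the v_all listing above), drop empty cells, count each string within DV, then combine. This is only an analogue of the reshape plus -gen-/-egen- steps, whose exact commands are not shown in the post; the variable names are hypothetical, and empty cells are assumed to be empty strings.

```python
from collections import Counter
from fractions import Fraction

# Wide data from the post; "" marks an empty cell (assumed encoding).
rows = [
    (1, "A1", "A2", ""),
    (1, "B2", "A1", "C1"),
    (1, "B1", "", ""),
    (1, "B1", "", ""),
    (2, "A1", "", ""),
    (2, "A1", "A2", ""),
    (3, "B1", "B2", "B3"),
    (4, "C1", "B2", "B3"),
    (4, "B1", "C1", "C2"),
    (5, "B1", "B2", "A1"),
    (5, "C1", "A2", ""),
]

# Stack v1, v2, v3 into (DV, v_all) pairs: all v1 first, then v2, then v3,
# matching the order of the listing (33 pairs, blanks included).
v_all = [(r[0], r[k]) for k in (1, 2, 3) for r in rows]

# Drop empty cells.
v_all = [(dv, s) for dv, s in v_all if s]

# Frequency of each string within each DV: (DV, string) -> count.
freq = Counter(v_all)

# Per DV: sum of squared frequencies (numerator) and total count.
sum_sq = Counter()
total = Counter()
for (dv, _), n in freq.items():
    sum_sq[dv] += n * n
    total[dv] += n

hhi = {dv: Fraction(sum_sq[dv], total[dv] ** 2) for dv in total}
for dv in sorted(hhi):
    print(dv, hhi[dv])
```

Both routes count the same (DV, string) pairs, so they give the same values (11/49 for DV = 1); the long version just makes the stacking step explicit.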