
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: RE: More Tab and Matrix
Date: Fri, 14 Jun 2002 10:21:25 +0100

Steven Fraser wrote

> > > I am trying to calculate a focus index (I'm not sure the Simpson was the
> > > same). In any event, it is a simple measure. If I assume a car
> > > manufacture makes 3 cars in 3 separate classes, they would have a focus
> > > measure of 1/3. (1^2 + 1^2 + 1^2)/3^2 = 3/9. On the other hand, if one
> > > manufacturer makes 3 cars all in one class, the focus measure would be
> > > (1+1+1)^2/3^2 = 9/9 or 1.00.
> > >
> > > If I drop all observations except for one manufacturer, the following code
> > > works fine:
> > >
> > > /* This sequence calculates the focus variables */
> > > tab class, matcell(classmat)
> > > mat nummat=catmat'*catmat
> > > gen numerator=nummat[1,1]
> > > gen denominator=(_N)^2
> > > gen focus=numerator/denominator
> > > list focus
> > >
> > > My problem is I have many 'manufacturers'. When I try to implement this
> > > code with the "by:" syntax, I continue to have difficulties. I would like
> > > to keep the data in this format/shape because I have generated several
> > > other variables by 'manufacturer'.
> > >
> > > Any thoughts or suggestions are greatly appreciated. Thx again - SF

and I suggested

> > You don't show us your code, but I guess the key problem is
> > that the -matrix- command can't be
> > used under by: in the way that you want.
> >
> > Let's suppose that your key variables are -manufacturer- and -class-.
> >
> > Try two commands in succession (-save- your data if necessary):
> >
> > contract manufacturer class, nomiss
> > ineq _freq, by(manufacturer) gensim(focus)
> >
> > I think that is what you want.

But this does not keep the same data structure.
To do that in this way requires not -contract-, but explicit
calculation of frequencies:

    bysort manuf class : gen _freq = _N * (_n == 1)

But here we ensure that each frequency is recorded just once
for the first observation in each group, for which _n is 1:
otherwise the frequency is set to 0, and squaring those zeros
will not change the sum to be calculated. (They will be divided
by denominators, which again will effect no change.)

    ineq _freq, by(manuf) gensim(focus)

All -ineq- (on SSC) does is to provide a wrapper for other such
calculations, and doing the focus (Herfindahl-Simpson-Gini) index
-- the sum of the squares of the proportional shares -- from
first principles is also quite possible:

    bysort manuf class : gen _freq = _N * (_n == 1)
    by manuf: gen _totfreq = _N
    by manuf: gen focus = sum((_freq/_totfreq)^2)
    by manuf: replace focus = focus[_N]

That would need further refinement if -class- were ever missing.

Another way of doing it, from the results of -tabulate-, has
already been discussed by Steven and Nick Winter. As Nick
has just set that as an exercise, I will not interfere.

Nick
n.j.cox@durham.ac.uk

P.S. there's a tutorial on -by:-, _n and _N manipulations
in Stata Journal 2(1), 86-102 (2002).

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
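
As a rough check of the first-principles calculation in the message, here is a minimal sketch run on a made-up dataset that mirrors Steven's two cases. The data values and the one-letter manufacturer codes "A" and "B" are assumptions for illustration only; the four calculation lines are the ones given in the message. It assumes -class- is never missing.

```stata
* Hypothetical toy data: manufacturer A makes 3 cars in 3 classes,
* manufacturer B makes 3 cars all in one class.
clear
input str1 manuf class
"A" 1
"A" 2
"A" 3
"B" 1
"B" 1
"B" 1
end

* Class frequency within manufacturer, recorded only on the first
* observation of each manuf-class group and set to 0 elsewhere.
bysort manuf class : gen _freq = _N * (_n == 1)

* Number of observations for each manufacturer.
by manuf: gen _totfreq = _N

* Running sum of squared shares within manufacturer; the value on the
* last observation of each group is the index, so copy it to all rows.
by manuf: gen focus = sum((_freq/_totfreq)^2)
by manuf: replace focus = focus[_N]

list manuf class focus, sepby(manuf)
```

With these data the index should come out as 1/3 on every row for A and 1 on every row for B, matching the arithmetic in Steven's question, and the original observations are all still in memory.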
