st: RE: RE: More Tab and Matrix

From: "Nick Cox"
Date: Fri, 14 Jun 2002 10:21:25 +0100

```
Steven Fraser wrote:

> > > I am trying to calculate a focus index (I'm not sure whether the Simpson
> > > index is the same). In any event, it is a simple measure. If I assume a car
> > > manufacturer makes 3 cars in 3 separate classes, they would have a focus
> > > measure of 1/3: (1^2 + 1^2 + 1^2)/3^2 = 3/9. On the other hand, if one
> > > manufacturer makes 3 cars all in one class, the focus measure would be
> > > (1+1+1)^2/3^2 = 9/9 or 1.00.
> > >
> > > If I drop all observations except for one manufacturer, the following code
> > > works fine:
> > > /*  This sequence calculates the focus variables */
> > > tab class, matcell(classmat)
> > > mat nummat=classmat'*classmat
> > > gen numerator=nummat[1,1]
> > > gen denominator=(_N)^2
> > > gen focus=numerator/denominator
> > > list focus
> > >
> > > My problem is I have many 'manufacturers'. When I try to implement this
> > > code with the "by:" syntax, I continue to have difficulties. I would like
> > > to keep the data in this format/shape because I have generated several
> > > other variables by 'manufacturer'.
> > >
> > > Any thoughts or suggestions are greatly appreciated. Thx again - SF
> > >

and I suggested
>
> You don't show us your code, but I guess the key problem is that the
> -matrix- command can't be used under by: in the way that you want.
>
> Let's suppose that your key variables are -manufacturer- and -class-.
>
> Try two commands in succession (-save- your data if necessary):
>
> contract manufacturer class, nomiss
> ineq _freq, by(manufacturer) gensim(focus)
>
> I think that is what you want.
>

But this does not keep the same data structure: -contract- replaces
the data in memory with one observation per manufacturer-class
combination. Keeping the structure requires not -contract-, but an
explicit calculation of frequencies:

bysort manuf class : gen _freq = _N * (_n == 1)

Here we ensure that each frequency is recorded just once, for the
first observation in each group (the one for which _n is 1).
Otherwise the frequency is set to 0, and squaring those zeros will
not change the sum to be calculated. (Dividing those zeros by a
denominator changes nothing either.)

ineq _freq, by(manuf) gensim(focus)

All that -ineq- (on SSC) does is provide a wrapper for this and
other such calculations. Calculating the focus (Herfindahl-Simpson-Gini)
index, the sum of the squares of the proportional shares, from first
principles is also quite possible:

bysort manuf class : gen _freq = _N * (_n == 1)
by manuf: gen _totfreq = _N
by manuf: gen focus = sum((_freq/_totfreq)^2)
by manuf: replace focus = focus[_N]

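That would need further refinement if -class- were ever missing:
-by:- treats missing values of -class- as categories in their own
right, so those observations would be counted as an extra class and
also included in the total.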

Another way of doing it, from the results of -tabulate-,
has already been discussed by Steven and Nick Winter. As Nick Winter
has just set that as an exercise, I will not interfere.

Nick
n.j.cox@durham.ac.uk

P.S. There's a tutorial on -by:-, _n and _N manipulations in
Stata Journal 2(1), 86-102 (2002).

```
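Steven's definition and the reply's "sum of the squares of the proportional shares" are the same quantity. Writing n_k for the number of cars a manufacturer makes in class k and N for its total (notation introduced here, not in the thread), a short derivation:

$$
F = \frac{\sum_{k=1}^{K} n_k^2}{N^2} = \sum_{k=1}^{K} \left( \frac{n_k}{N} \right)^2, \qquad N = \sum_{k=1}^{K} n_k
$$

$$
\text{three classes: } \frac{1^2 + 1^2 + 1^2}{3^2} = \frac{3}{9} = \frac{1}{3}, \qquad \text{one class: } \frac{3^2}{3^2} = 1
$$

The left-hand form is what Steven's matrix code computes (classmat'*classmat is the sum of squared cell counts, divided by _N^2); the right-hand form is what the first-principles -by:- commands compute.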
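The first-principles commands in the reply can be checked on a small dataset built to match the two toy manufacturers in Steven's question. This is a sketch: the data values below are invented for illustration, and the variable names manuf and class simply follow the reply.

```stata
* Hypothetical toy data mirroring the question:
* manufacturer "A" makes 3 cars in 3 classes, "B" makes 3 cars in 1 class
clear
input str1 manuf str1 class
"A" "x"
"A" "y"
"A" "z"
"B" "x"
"B" "x"
"B" "x"
end

* cell frequency, stored only on the first observation of each cell
bysort manuf class : gen _freq = _N * (_n == 1)

* total number of cars per manufacturer
by manuf: gen _totfreq = _N

* running sum of squared shares, then copy the final sum to every observation
by manuf: gen focus = sum((_freq/_totfreq)^2)
by manuf: replace focus = focus[_N]

list manuf class _freq _totfreq focus, sepby(manuf)
```

Manufacturer A should get 1/3 and B should get 1, matching the hand calculation. Because -gen- creates a float by default, 1/3 is held only to about seven significant digits; -gen double- would keep more.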
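One possible refinement for missing -class-, sketched under the assumption that such observations should be left out of both the numerator and the denominator (other treatments are possible). The indicator variable known and the toy data are hypothetical, introduced only for this illustration.

```stata
* Hypothetical toy data: "C" has one car with missing class,
* "D" has only a missing class
clear
input str1 manuf str1 class
"C" "x"
"C" "x"
"C" ""
"D" ""
end

* assumption: only observations with a known class count
gen byte known = !missing(class)

* cell frequency on the first observation of each cell; 0 for missing class
bysort manuf class : gen _freq = _N * (_n == 1) * known

* denominator: cars per manufacturer with a known class
by manuf: gen _totfreq = sum(known)
by manuf: replace _totfreq = _totfreq[_N]

* sum of squared shares; sum() treats missing terms (here 0/0) as 0
by manuf: gen focus = sum((_freq/_totfreq)^2)
by manuf: replace focus = focus[_N]

* no known classes at all: the index is undefined
replace focus = . if _totfreq == 0
```

Without the refinement, C would get (1/3)^2 + (2/3)^2 = 5/9, because the missing class is treated as a third category of its own; with it, C gets 1.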