# st: RE: RE: how to generate groups based on some characteristics and obtain the mean/median value for each group

 From "Yi, Bingsheng" To "'statalist@hsphsun2.harvard.edu'" Subject st: RE: RE: how to generate groups based on some characteristics and obtain the mean/median value for each group Date Wed, 19 Jun 2002 17:39:44 -0400

```
Special thanks to N Cox and N Winter!

I've tried the code provided by N Winter. It works, but there is still a
problem: the code cannot ensure that there are at least 10 firms within
each final group. I also tried another way, but the results are similar. I
can't figure out the reason and have to ask for your help again.

* generate the number of records in each group; ind4 is the 4-digit industry code
gen str4 ind3=substr(ind4,1,3)
gen str4 ind2=substr(ind4,1,2)
gen str4 ind1=substr(ind4,1,1)
forval i=1/4 {
sort ind`i'
by ind`i': gen num`i'=_N
}

* group the records
gen str4 industry=ind1
drop if num1<10 /* exclude an industry if it contains fewer than 10 firms */
forval i=2/4 {
replace industry=ind`i' if num`i'>=10
}
sort industry
by industry: gen _freq=_N
list ind4 industry _freq if _freq<10

After running the above code, some industries still have fewer than
10 firms:
       ind4   industry   _freq
 41.   1044        104       1
 79.   1321         13       5
 80.   1330         13       5
 81.   1390         13       5
 82.   1320         13       5
 83.   1320         13       5
282.   1610         16       2
283.   1611         16       2
284.   1622        162       9
285.   1623        162       9
286.   1623        162       9
287.   1623        162       9
288.   1623        162       9
289.   1623        162       9
290.   1623        162       9
291.   1629        162       9
292.   1623        162       9

There are 9 firms with the 3-digit industry code 162. If the 2-digit
industry code "16" had been used, there would have been 11 firms within
industry 16. I don't know why the above code didn't do what it is supposed
to do.
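
* A possible explanation, offered as a sketch rather than a tested fix:
* num1-num4 count every firm sharing a code of that length, including firms
* that end up in a finer group, so the firms left behind at a coarser level
* can number fewer than 10. One way around this is to start from ind4 and
* move only the undersized groups up one level at a time, recounting after
* each move. industry2, gsize and _freq2 are hypothetical names chosen so
* that nothing above is overwritten.
gen str4 industry2=ind4
forval i=3(-1)1 {
sort industry2
by industry2: gen gsize=_N
replace industry2=ind`i' if gsize<10
drop gsize
}
sort industry2
by industry2: gen _freq2=_N
* anything still listed here is a 1-digit group with fewer than 10 firms
list ind4 industry2 _freq2 if _freq2<10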

I also tried the following code, but it still does not solve the problem.
* group the records
gen str4 industry=ind1
drop if num1<10 /* the 1-digit industry code should contain the largest
number of firms; if it has fewer than 10, such an industry shouldn't be
considered any more */

replace industry=ind3 if num4<10 & num3>10
replace industry=ind2 if num4<10 & num3<10 & num2>10
replace industry=ind1 if num4<10 & num3<10 & num2<10
tabulate industry /* this shows the number of firms contained in each
industry */

. tabulate industry

   industry |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        234        5.10        5.10
         10 |         22        0.48        5.57
        104 |          1        0.02        5.60
         13 |          5        0.11        5.71
         15 |         11        0.24        5.95
        152 |         18        0.39        6.34
         16 |          2        0.04        6.38
        162 |          9        0.20        6.58
         26 |         20        0.44       19.82
        262 |          2        0.04       19.86
        267 |         19        0.41       20.27
         27 |         37        0.81       21.08
        271 |          1        0.02       21.10
        275 |         17        0.37       21.47

Firms in industry 262 could go to industry 26, so why didn't the code
do this?

More importantly, I wonder whether you could give me some ideas for the
next problem:

Suppose that in the end each industry contains at least 10 firms. I want to
subdivide the firms in each industry into three groups based on size: the
small, middle and large groups.

The small group contains the smallest 30% of firms by size within an
industry, the middle group contains the middle 40% (30% to 70%), and the
large group contains the firms whose size is in the largest 30% in that
industry. How can I subdivide firms according to size within an industry? I
need to get the median or mean value of Tobin's q for each of these groups
within an industry. I also need the size range for each group. The problem
is how to get and record them.
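
* A minimal sketch of the 30/40/30 split described above, not a tested
* solution. size and tobinq are hypothetical variable names for firm size
* and Tobin's q; p30, p70, sizegrp, qmean, qmed, sizemin and sizemax are
* hypothetical helpers. Firms exactly at a cut-off go to the lower group.
egen p30=pctile(size), p(30) by(industry)
egen p70=pctile(size), p(70) by(industry)
gen byte sizegrp=1 if size<=p30
replace sizegrp=2 if size>p30 & size<=p70
replace sizegrp=3 if size>p70 & size!=.
* mean and median of Tobin's q, and the size range, for each group
egen qmean=mean(tobinq), by(industry sizegrp)
egen qmed=median(tobinq), by(industry sizegrp)
egen sizemin=min(size), by(industry sizegrp)
egen sizemax=max(size), by(industry sizegrp)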

Next I need to decide which group a firm belongs to within its industry,
based on its size. If in industry 10 the size range of the small group is 12
to 20 and the size of a firm in industry 10 is 15, then the industry-size
adjusted Tobin's q = the Tobin's q of the firm in industry 10 - the
mean/median value of the small group in industry 10.
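
* The adjustment above could be sketched as follows, assuming a variable
* sizegrp (1=small, 2=middle, 3=large) already marks each firm's size group
* within its industry. tobinq, sizegrp, grpmed and adjq are hypothetical
* names; use mean() in place of median() for a mean adjustment.
egen grpmed=median(tobinq), by(industry sizegrp)
gen adjq=tobinq-grpmed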

I'm sorry to trouble you again. I greatly appreciate your help and am