The Stata listserver

st: RE: RE: how to generate groups based on some characteristics and obtain the mean/median value for each group


From   "Yi, Bingsheng" <[email protected]>
To   "'[email protected]'" <[email protected]>
Subject   st: RE: RE: how to generate groups based on some characteristics and obtain the mean/median value for each group
Date   Wed, 19 Jun 2002 17:39:44 -0400

Special thanks to N. Cox and N. Winter!

I've tried the code provided by N. Winter. It works, but there is still a
problem: it does not ensure that each final group contains at least 10
firms. I also tried another approach, but the results were similar. I
can't figure out why, so I have to ask for your help again.

* generate the number of records in each group; ind4 is the 4-digit industry code
gen str4 ind3=substr(ind4,1,3)
gen str4 ind2=substr(ind4,1,2)
gen str4 ind1=substr(ind4,1,1)
forval i=1/4 {
        sort ind`i'
        by ind`i': gen num`i'=_N
}

* group the records
gen str4 industry=ind1
drop if num1<10   /* exclude an industry if it contains fewer than 10 firms */
forval i=2/4 {
        replace industry=ind`i' if num`i'>=10
}
sort industry
by industry: gen _freq=_N
list ind4 industry _freq if _freq<10

After running the code above, some industries still contain fewer than
10 firms:
          ind4   industry      _freq
  41.      1044        104          1
  79.      1321         13          5
  80.      1330         13          5
  81.      1390         13          5
  82.      1320         13          5
  83.      1320         13          5
 282.      1610         16          2
 283.      1611         16          2
 284.      1622        162          9
 285.      1623        162          9
 286.      1623        162          9
 287.      1623        162          9
 288.      1623        162          9
 289.      1623        162          9
 290.      1623        162          9
 291.      1629        162          9
 292.      1623        162          9
 
There are 9 firms with the 3-digit industry code 162. If the 2-digit code
"16" had been used instead, industry 16 would have contained 11 firms
(9 + 2). I don't know why the code above didn't do this, as it is
supposed to.
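This follows from how num`i' is computed: each count uses the full sample, so firms that move down to a finer group still count toward the coarser group's total, and the firms left behind in the coarser group can number fewer than 10 (industry 162 is assigned only to firms with num3>=10 and num4<10, so the rest of 162 must have gone to 4-digit groups). The Python sketch below reproduces this on made-up codes. finest_group applies the same rule as the Stata loop; residual_groups is one possible bottom-up alternative (an assumption, not code from this thread) that counts only the firms not yet placed in a finer group. Both function names and the toy codes, including a 4-digit group 1620 with 10 firms, are hypothetical.

```python
from collections import Counter

def finest_group(codes, min_size=10):
    """Same rule as the Stata loop: start at the 1-digit code and overwrite it
    with each finer level whose FULL-SAMPLE count is >= min_size."""
    counts = {i: Counter(c[:i] for c in codes) for i in range(1, 5)}
    groups = []
    for c in codes:
        g = c[:1]
        for i in range(2, 5):
            if counts[i][c[:i]] >= min_size:
                g = c[:i]
        groups.append(g)
    return groups

def residual_groups(codes, min_size=10):
    """Hypothetical bottom-up alternative: form 4-digit groups first, then count
    only the firms not yet placed when deciding on 3-, 2- and 1-digit groups."""
    groups = [None] * len(codes)
    for i in range(4, 0, -1):  # finest level first
        # Count only firms that have not been placed in a finer group.
        left = Counter(c[:i] for c, g in zip(codes, groups) if g is None)
        for k, c in enumerate(codes):
            if groups[k] is None and left[c[:i]] >= min_size:
                groups[k] = c[:i]
    return groups  # None: too few unplaced firms even at the 1-digit level

# Hypothetical toy sample shaped like the listing above:
# 10 firms in 1620, 9 in 1623, 2 in 1610.
codes = ["1620"] * 10 + ["1623"] * 9 + ["1610"] * 2
old = Counter(finest_group(codes))     # 162 keeps 9 firms, 16 keeps 2
new = Counter(residual_groups(codes))  # the 9 + 2 leftovers form group 16
```

In residual_groups, firms still marked None after the 1-digit pass have fewer than 10 unplaced peers even at the 1-digit level; whether to drop them or merge them into a neighbouring group is left open here.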

I also tried the following code, but it still does not solve the problem:
* group the records
gen str4 industry=ind1
* the 1-digit industry code should contain the largest number of firms;
* if even it has fewer than 10, the industry is not considered any further
drop if num1<10

replace industry=ind3 if num4<10 & num3>10
replace industry=ind2 if num4<10 & num3<10 & num2>10
replace industry=ind1 if num4<10 & num3<10 & num2<10
tabulate industry   /* shows the number of firms in each industry */


. tabulate industry

   industry |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        234        5.10        5.10
         10 |         22        0.48        5.57
        104 |          1        0.02        5.60
         13 |          5        0.11        5.71
         15 |         11        0.24        5.95
        152 |         18        0.39        6.34
         16 |          2        0.04        6.38
        162 |          9        0.20        6.58
         26 |         20        0.44       19.82
        262 |          2        0.04       19.86
        267 |         19        0.41       20.27
         27 |         37        0.81       21.08
        271 |          1        0.02       21.10
        275 |         17        0.37       21.47
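To see which firms the three replace statements never touch, they can be mirrored as one function. In this hypothetical Python sketch, n4, n3 and n2 stand for a firm's num4, num3 and num2, and ind is an assumed mapping from level (1 to 4) to the firm's code at that level. A firm whose 4-digit group has 10 or more firms matches none of the conditions and keeps the 1-digit code from gen str4 industry=ind1; and because the tests use > 10 and < 10, a 3- or 2-digit group with exactly 10 firms falls through as well. This is also why 262 can end up with only 2 firms: num3>10 counts all 262 firms, but only those with num4<10 are moved there.

```python
def second_approach(n4, n3, n2, ind):
    """Mirror of the three replace statements for one firm.
    n4, n3, n2 are its num4, num3, num2; ind maps level -> code (hypothetical)."""
    industry = ind[1]                               # gen str4 industry=ind1
    if n4 < 10 and n3 > 10:
        industry = ind[3]
    if n4 < 10 and n3 < 10 and n2 > 10:
        industry = ind[2]
    if n4 < 10 and n3 < 10 and n2 < 10:
        industry = ind[1]
    return industry

# Hypothetical firm with 4-digit code 2621, shown at each level.
codes = {1: "2", 2: "26", 3: "262", 4: "2621"}
print(second_approach(12, 30, 50, codes))  # 4-digit group has 12 firms: prints 2
print(second_approach(3, 10, 50, codes))   # 3-digit group has exactly 10: prints 2
print(second_approach(3, 12, 50, codes))   # the intended case: prints 262
```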

The firms in industry 262 could have been moved into industry 26. Why
didn't the code do this?

More importantly, I wonder whether you could give me some ideas on the
follow-up problem:

Suppose that each final industry contains at least 10 firms. I want to
subdivide the firms in each industry into three size groups: small,
middle, and large.

The small group contains the smallest 30% of firms (by size) within an
industry, the middle group contains the middle 40% (the 30th to 70th
percentiles), and the large group contains the largest 30% in that
industry. How can I subdivide the firms in an industry according to size?
I need the median or mean Tobin's q for each of these groups within an
industry. I also need the size range of each group; the problem is how to
obtain and record these ranges.
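The 30/40/30 split can be done by ranking firms on size within each industry. The Python sketch below uses hypothetical firm records (firm_id, industry, size, tobins_q) and labels a firm small if its 0-based size rank k satisfies k < 0.3n, and large if k >= 0.7n, where n is the number of firms in the industry. It then records the mean and median q and the minimum and maximum size of each industry-group cell. Ties in size are broken by sort order, so firms of equal size can land in different groups; a rule based on percentile cutoffs of size would treat ties differently.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical firm records: (firm_id, industry, size, tobins_q).
firms = [
    ("a", "10", 12, 1.1), ("b", "10", 15, 0.9), ("c", "10", 20, 1.3),
    ("d", "10", 25, 1.0), ("e", "10", 30, 1.2), ("f", "10", 41, 0.8),
    ("g", "10", 47, 1.5), ("h", "10", 55, 1.4), ("i", "10", 60, 2.0),
    ("j", "10", 72, 1.7),
]

def size_groups(firms):
    """Label each firm small/middle/large by its size rank in its industry:
    bottom 30% of ranks, middle 40%, top 30% (ties broken by sort order)."""
    by_ind = defaultdict(list)
    for rec in firms:
        by_ind[rec[1]].append(rec)
    label = {}
    for recs in by_ind.values():
        recs.sort(key=lambda r: r[2])  # ascending size
        n = len(recs)
        for k, rec in enumerate(recs):
            # Integer comparisons avoid floating-point trouble at the 30/70 cuts.
            if 10 * k < 3 * n:
                label[rec[0]] = "small"
            elif 10 * k >= 7 * n:
                label[rec[0]] = "large"
            else:
                label[rec[0]] = "middle"
    return label

def group_summary(firms, label):
    """Mean and median q plus the size range for each (industry, group) cell."""
    cells = defaultdict(list)
    for fid, ind, size, q in firms:
        cells[(ind, label[fid])].append((size, q))
    return {
        key: {
            "mean_q": mean(q for _, q in vals),
            "median_q": median(q for _, q in vals),
            "size_min": min(s for s, _ in vals),
            "size_max": max(s for s, _ in vals),
        }
        for key, vals in cells.items()
    }

label = size_groups(firms)
summary = group_summary(firms, label)
```

With these toy records the small group of industry 10 covers sizes 12 to 20, the middle group 25 to 47, and the large group 55 to 72.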

Next, I need to decide which group a firm belongs to within its industry,
based on its size. For example, if the small group in industry 10 has a
size range of 12 to 20 and a firm in industry 10 has size 15, the firm
belongs to the small group, and its industry-size-adjusted Tobin's q =
(Tobin's q of the firm) - (mean/median Tobin's q of the small group in
industry 10).
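Once the size ranges and group statistics are recorded, the adjustment is a lookup followed by a subtraction. In this Python sketch the table groups and the functions find_group and adjusted_q are hypothetical, and the numbers are made up. As in the example above, a firm of size 15 in industry 10 falls in the small group (12 to 20), so its adjusted q is its own q minus that group's mean or median q.

```python
# Hypothetical per-industry group summaries, assumed to be computed beforehand:
# size range and mean/median Tobin's q of each size group.
groups = {
    ("10", "small"):  {"size_range": (12, 20), "mean_q": 1.10, "median_q": 1.10},
    ("10", "middle"): {"size_range": (25, 47), "mean_q": 1.125, "median_q": 1.10},
    ("10", "large"):  {"size_range": (55, 72), "mean_q": 1.70, "median_q": 1.70},
}

def find_group(industry, size):
    """Return the size group whose [min, max] range contains the firm's size."""
    for (ind, grp), info in groups.items():
        lo, hi = info["size_range"]
        if ind == industry and lo <= size <= hi:
            return grp
    return None  # size falls between or outside the recorded ranges

def adjusted_q(industry, size, q, stat="median_q"):
    """Industry-size-adjusted q: the firm's q minus its group's mean or median q."""
    grp = find_group(industry, size)
    if grp is None:
        raise ValueError(f"no size group in industry {industry} covers size {size}")
    return q - groups[(industry, grp)][stat]

adj = adjusted_q("10", 15, 1.25, stat="mean_q")  # about 0.15
```

For firms that were used to form the groups, the label from the ranking step can be used directly; the range lookup mainly matters for sizes that were not in the ranked sample, and this sketch raises an error when such a size falls in a gap between ranges.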

I'm sorry to trouble you again. I greatly appreciate your help and look
forward to your reply!

Bing


