The Stata listserver



From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: new expression: how to generate groups based on some characteristics, and assign a sample into the correct group
Date   Fri, 21 Jun 2002 18:53:48 +0100

Yi, Bingsheng
>
> I need to assign each firm to an industry that contains at least 10
> firms. A firm should belong to the industry with the most digits of
> industry code, so long as that industry contains at least 10 firms. If
> an industry has the code 10, it should contain all the firms whose
> 2-digit industry code is 10. ind`i' is the i-digit industry code,
> i=1,2,3,4; for example, ind4 is the 4-digit industry code. There is
> only one firm in industry 1028 and only one in industry 102, but 12
> firms in industry 10, so the final industry code for firm 12 will be
> 10, and industry 10 should contain all the firms with ind2=10. The
> industry code for the first 10 firms is 1041, not 104, 10 or 1, since
> industry 1041 already contains 10 firms, even though industries 104,
> 10 and 1 contain more than 10 firms. ind3grp contains all the firms
> with the same 3-digit industry code.
>
> obs  q  ind4  ind3  ind2  ind1  industry  size  ind4grp  ind3grp  ind2grp  ind1grp
> 1       1041  104   10    1     1041      1     small    ?        ?
> 2       1041  104   10    1     1041      3     small    ?        ?
> 3       1041  104   10    1     1041      2     small    ?        ?
> 4       1041  104   10    1     1041      5     middle   ?        ?
> 5       1041  104   10    1     1041      4     middle   ?        ?
> 6       1041  104   10    1     1041      9     large    ?        ?
> 7       1041  104   10    1     1041      8     middle   ?        ?
> 8       1041  104   10    1     1041      11    large    ?        ?
> 9       1041  104   10    1     1041      10    large    ?        ?
> 10      1041  104   10    1     1041      7     middle   ?        ?
> 11      1044  104   10    1     104       6     N.A.     ?        ?
> 12      1028  102   10    1     10        18    N.A.     N.A.     ?
> .
> .
> Subsequently I will subdivide the firms in each industry into small,
> middle and large groups according to firm size. The small group
> contains the smallest 30% of firms by size within an industry, the
> middle group contains the middle 40% (30% to 70%), and the large group
> contains the largest 30% in that industry. How can I subdivide firms by
> size within an industry and assign each firm to a group according to
> its size? I also need the median or mean value of q for each of these
> groups within an industry.
>
> Take firm 1 as an example: its ind4 is 1041, and there are already 10
> firms with ind4 = 1041, so firm 1 should be in industry 1041. By its
> size, it belongs to the small group in industry 1041. Firm 1's
> industry-size adjusted q = firm 1's q - the mean q of the small group
> in industry 1041. As for firm 11, I first need to assign it to
> industry 104, which contains all the firms with 3-digit industry code
> 104; then I need to assign it to a size group within industry 104 (a
> firm may belong to different groups under different industry codes);
> then I have to get the mean q of the firms in the same group as
> firm 11.
>

Sorry, I can't keep track of where you are on this,
but the problem splits into three parts:

1. The classification into industry groups.

2. Lowest 30%, middle 40%, highest 30% on size.

3. Group means and medians.

Earlier postings seem to have focused most on 1,
and at least partially solved it.
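For completeness, here is a minimal Stata sketch of step 1. It is not taken from the earlier postings and is only one way to do it. It assumes numeric variables ind4, ind3, ind2 and ind1 as in the question; n1-n4, industry and level are hypothetical new variable names. The 12 firms listed in the question are typed in so the code runs as it stands. The loop goes from the coarsest to the finest level, so each firm ends up with the finest code whose industry has at least 10 firms.

```stata
* Example data: the 12 firms from the question (codes and size only)
clear
input ind4 ind3 ind2 ind1 size
1041 104 10 1 1
1041 104 10 1 3
1041 104 10 1 2
1041 104 10 1 5
1041 104 10 1 4
1041 104 10 1 9
1041 104 10 1 8
1041 104 10 1 11
1041 104 10 1 10
1041 104 10 1 7
1044 104 10 1 6
1028 102 10 1 18
end

* n`i' = number of firms sharing this firm's i-digit code (hypothetical names)
forvalues i = 1/4 {
    bysort ind`i': gen n`i' = _N
}

* Coarsest to finest: a finer code overwrites a coarser one
* whenever its industry has at least 10 firms
gen industry = .
gen byte level = .
forvalues i = 1/4 {
    replace industry = ind`i' if n`i' >= 10
    replace level = `i' if n`i' >= 10
}
```

With these data, firms 1-10 get 1041 (level 4), firm 11 gets 104 (level 3) and firm 12 gets 10 (level 2), matching the industry column in the question. Keeping level alongside industry records which digit level was chosen, which is handy later when picking the size group at that level.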

There was at least one suggestion earlier on 2. Another is to use
-egen, rank()- and -egen, count()- to work out the lowest 30%, etc.
That is, rank divided by count tells you where a value lies in the
distribution, and you can use -by:- to do this separately for each industry.
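The rank-and-count idea can be sketched in Stata as follows, under some assumptions. Size groups are computed separately within each digit level; sgrp1-sgrp4, rnk1-rnk4 and cnt1-cnt4 are hypothetical names. This follows the question's remark that a firm may fall in different groups under different industry codes. -egen, rank()- gives tied values their average rank, so with many ties the split will not be exactly 30/40/30. The test compares 10*rank with 3*count rather than rank/count with 0.3, which avoids float rounding exactly at the cutoffs.

```stata
* Example data: the 12 firms from the question (codes and size only)
clear
input ind4 ind3 ind2 ind1 size
1041 104 10 1 1
1041 104 10 1 3
1041 104 10 1 2
1041 104 10 1 5
1041 104 10 1 4
1041 104 10 1 9
1041 104 10 1 8
1041 104 10 1 11
1041 104 10 1 10
1041 104 10 1 7
1044 104 10 1 6
1028 102 10 1 18
end

label define sgrp 1 "small" 2 "middle" 3 "large"
forvalues i = 1/4 {
    * rank of size, and number of nonmissing sizes, within each i-digit industry
    bysort ind`i': egen rnk`i' = rank(size)
    bysort ind`i': egen cnt`i' = count(size)
    * rank/count <= 0.3 -> small; <= 0.7 -> middle; otherwise large
    * (integer comparison avoids float rounding at the cutoffs)
    gen byte sgrp`i' = cond(10*rnk`i' <= 3*cnt`i', 1, cond(10*rnk`i' <= 7*cnt`i', 2, 3)) if !missing(size)
    label values sgrp`i' sgrp
}
```

For industry 1041, sgrp4 reproduces the small/middle/large labels in the question's table. Within industry 104 (sgrp3), firm 11 falls in the middle group, and within industry 10 (sgrp2), firm 12 falls in the large group.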

On 3, once you get there, the simplest answer is probably
-egen, mean()- or -egen, median()-.
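Here is a sketch of step 3 that runs as it stands on the auto data shipped with Stata. In this purely illustrative mapping, foreign stands in for industry, weight for size and mpg for q; rnk, cnt, sgrp, meanq, medq and adjq are hypothetical names. The group mean and median come from -egen, mean()- and -egen, median()- under -by:-. The adjusted value is the difference from the group mean, as in the question's definition of industry-size adjusted q.

```stata
sysuse auto, clear

* size groups within each "industry" (foreign), by rank and count
bysort foreign: egen rnk = rank(weight)
bysort foreign: egen cnt = count(weight)
gen byte sgrp = cond(10*rnk <= 3*cnt, 1, cond(10*rnk <= 7*cnt, 2, 3)) if !missing(weight)

* mean and median of the outcome within each industry-by-size group
bysort foreign sgrp: egen double meanq = mean(mpg)
bysort foreign sgrp: egen double medq = median(mpg)

* adjusted value: own value minus the mean of its group
gen double adjq = mpg - meanq
```

With real data, the grouping variables would be the chosen industry and the size group at that industry's level.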

Nick
n.j.cox@durham.ac.uk

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP