> -----Original Message-----
> From: Yi, Bingsheng [mailto:[email protected]] 
> Sent: Tuesday, June 18, 2002 7:11 PM
> To: [email protected]
> Subject: st: how to generate groups based on some 
> characteristics and obtain the mean/median value for each group
> 
> 
> Dear Statalisters,
> 
> I wonder whether you could help me figure out the code to solve the
> following problem:
> 
> I have 12 years of panel data containing these four variables: Tobin's q,
> size, 4-digit industry code (ind4), and id. For each year, I want to make
> some adjustments to one variable (Tobin's q) based on the other two
> variables (industry and size). First I need to ensure that there are at
> least 10 firms within each industry. If the number of firms within a
> 4-digit industry code is less than 10, I use the 3-digit industry code,
> generated by gen str4 ind3=substr(ind4,1,3), and check whether the number
> of firms with the same 3-digit industry code is greater than or equal to
> 10; if not, I generate and use the 2-digit industry code. So in the end
> there are at least 10 firms within an industry (classified by 4-digit,
> 3-digit, 2-digit, or 1-digit industry code). The problem is how to get
> and record the number of firms in each industry.
For this piece, try something like this. First, make sure you have four
variables holding the 1-, 2-, 3- and 4-digit industry codes for **ALL**
records, named ind1, ind2, ind3 and ind4 respectively (ind4 already
exists). Then:
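	* count the records in each group, separately for each year
	* (assumes a year variable named "year" and one record per firm-year,
	* so each count is the number of firms in that industry-year)
	forval i=1/4 {
		bysort year ind`i': gen num`i' = _N
	}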
	* group the records
	gen finalgrp = ind1
	forval i=2/4 {
		replace finalgrp = ind`i' if num`i'>=10
	}
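This should create the variable "finalgrp", holding for each record the
most detailed industry code (4, 3 or 2 digits) whose group has at least
10 records, and the 1-digit code otherwise; note that a 1-digit group can
itself have fewer than 10. Then you can calculate whatever statistics you
want within those groups.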
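Two steps are left implicit above, so here is a minimal sketch of both. It assumes ind4 is stored as a four-character string (as the substr() call in the question implies) and that the data contain variables named tobinq and year. Those two names are assumptions, so substitute your own. The first part builds ind1-ind3 and belongs before the counting loops. The second part attaches the number of firms and the mean and median of Tobin's q to every record, within each year and final group:

	* --- run before the counting loops: build 1-, 2- and 3-digit codes ---
	* (assumes ind4 is a string such as "2834")
	forval i=1/3 {
		gen str`i' ind`i' = substr(ind4,1,`i')
	}

	* --- run after finalgrp exists: group statistics by year ---
	* (tobinq and year are assumed variable names)
	* records per group, i.e. firms if there is one record per firm-year
	bysort year finalgrp: gen nfirms = _N
	bysort year finalgrp: egen q_mean = mean(tobinq)
	bysort year finalgrp: egen q_med  = median(tobinq)
	* one possible adjustment (an assumption, not from the question):
	* Tobin's q relative to its industry-year median
	gen q_adj = tobinq - q_med

If you would rather have one row per group than variables attached to every record, collapse (mean) q_mean=tobinq (median) q_med=tobinq, by(year finalgrp) produces a group-level dataset instead. Note that collapse replaces the data in memory, so preserve first if you need the firm-level records afterwards.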
Nick Winter
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/