Nick,

Sorry I did not describe the data. The two vars are part of a huge dataset
that has more than 100,000 observations. What I really want to do is to use
the percentages as weights for regression coefficients. That is, I ran a
regression on logincome with about 70 independent vars, 52 of which are
dummies for industry. I saved the coefficients for these dummies as b1-b52
and then obtained the percentage for each industry as p1-p52. The final
product I want is the standard deviation of the industry effects, calculated
by:

    let i=1/52
    egen mubar=sum(b`i' * p`i')
    egen variance=sum(p`i' * ((b`i'- mubar)^2) )
    gen sd=sqrt(variance)

I can get p`i' by counting the N for the whole sample and then counting N`i'
for each industry, so that p`i'=N`i'/N. But this takes a lot of time because
I need to generate 52 dummy variables. I am wondering if there is a faster
way of doing this.

Thanks very much.

Best,
Zun

On Tue, 3 Dec 2002, Nick Cox wrote:

> Zun
> >
> > I have two vars ind (52 categories) and occ (7 categories), and I want
> > the percentage distribution of ind for each category of occ. Note that
> > not each ind category has cases. For instance:
> >
> > Occ=1
> > ind     pct
> > 1       .0309522
> > 2       .0334331
> > 3       0
> > 4       .0356777
> > 5       .3402772
> > 6       .0294558
> > .       .
> > .       .
> > 52      .3151532
> >
> > Occ=2
> > ind     pct
> > 1       .0036623
> > 2       .0006301
> > 3       0
> > 4       .0064976
> > 5       0
> > 6       .0455619
> > .       .
> > .       .
> > 52      .0953769
> >
> > As shown above, ind=3 is not in both occ=1 and occ=2 while ind=5 is in
> > occ=1 but not in occ=2.
> >
> > My questions are:
> >
> > First, if I use tabulate to get the percentage distribution of any
> > categorical variable, how can I save the percentages in a new dataset
> > that looks like one of the tables above.
> >
> > Second, in the specific example above, is there a way I can create a new
> > dataset that looks like this:
> >
> > ind     pctocc1     pctocc2
> > 1       .0309522    .0036623
> > 2       .0334331    .0006301
> > 3       0           0
> > 4       .0356777    .0064976
> > 5       .3402772    0
> > 6       .0294558    .0455619
> > .       .           .
> > .       .           .
> > 52      .3151532    .0953769
> >
>
> I guess that you have at most 52 * 7 observations.
> Forget -tabulate-: a direct calculation is better.
>
> Typing
>
> . findit percent
>
> does point to lots of things; but one pertinent is -egen-.
>
> . bysort occ : egen pctocc = pc(ind)
>
> followed by a -reshape- may help. You may need
> to -replace- any missings by 0.
>
> Nick
> n.j.cox@durham.ac.uk
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
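
One possible shortcut for the weighted standard deviation described at the
top of this message, offered as a minimal sketch rather than a tested
solution. If every observation carries the coefficient of its own industry,
the plain mean of that variable over the sample equals the sum of p`i' * b`i',
and its variance with divisor N equals the sum of p`i' * (b`i' - mubar)^2, so
neither the 52 dummies nor p1-p52 have to be built. The sketch assumes that
ind is coded 1 to 52 with no missing values, that b1-b52 exist as scalars or
as constant variables, and that every observation belongs to the estimation
sample. The names bind, mubar, sd and p are illustrative only.

```stata
* Sketch only; bind, mubar, sd and p are illustrative names.
* Assumes ind = 1..52 (no missings) and b1-b52 hold the industry coefficients.

* give each observation the coefficient of its own industry
gen double bind = .
forvalues i = 1/52 {
    quietly replace bind = b`i' if ind == `i'
}

* unweighted mean over observations = sum of p_i * b_i
quietly summarize bind
scalar mubar = r(mean)

* r(Var) divides by N - 1; rescale to divisor N so that it matches
* sum of p_i * (b_i - mubar)^2
scalar sd = sqrt(r(Var) * (r(N) - 1) / r(N))

display "mubar = " scalar(mubar) "   sd = " scalar(sd)

* if the shares themselves are still wanted: N_i / N without any dummies
bysort ind: gen double p = _N
replace p = p / _N
```

With this layout the work is 52 -replace- statements and one -summarize-,
whatever the number of observations. The last two lines give the share of
each industry directly from the group size, should p`i' be needed elsewhere.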

