
Re: st: RE: Saving percentage distribution


From   zt22@cornell.edu
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: Saving percentage distribution
Date   Tue, 3 Dec 2002 14:14:57 -0500 (EST)

Nick,

Sorry I did not describe the data. The two variables are part of a large 
dataset with more than 100,000 observations. What I really want to do 
is use the percentages as weights for the regression coefficients. 
That is, I ran a regression of logincome on about 70 independent 
variables, 52 of which are dummies for industry. I saved the 
coefficients on these dummies as b1-b52 and then obtained the share of 
observations in each industry as p1-p52. The final product I want is the 
standard deviation of the industry effects, calculated as:
let i=1/52
egen mubar=sum(b`i' * p`i')
egen variance=sum(p`i' * ((b`i'- mubar)^2) ) 
gen sd=sqrt(variance)

I can get p`i' by counting N for the whole sample and then counting 
N`i' for each industry, so that p`i'=N`i'/N. But this takes a lot of time 
because I need to generate 52 dummy variables. I am wondering if there is 
a faster way of doing this. Thanks very much.
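
As a rough illustration of the calculation described above, the shares can be obtained with -count- rather than with dummy variables, and the weighted mean and standard deviation accumulated in scalars. This is only a sketch under assumptions not stated in the message: that ind is coded 1, 2, ..., K, and that the coefficients are held in scalars b1, b2, ... (a hypothetical storage choice). The toy data and coefficient values are invented so that the block runs on its own.

```stata
* Sketch only. Toy data and coefficient values are hypothetical;
* on the real data, skip this setup, set K to 52, and fill b1-b52
* from the regression results.
clear
input ind
1
1
2
3
end
local K = 3
scalar b1 = 0.10
scalar b2 = -0.20
scalar b3 = 0.30

* Shares p_i = N_i / N from counts, with no dummy variables.
quietly count
local N = r(N)
forvalues i = 1/`K' {
    quietly count if ind == `i'
    scalar p`i' = r(N) / `N'
}

* Weighted mean of the industry effects.
scalar mubar = 0
forvalues i = 1/`K' {
    scalar mubar = mubar + p`i' * b`i'
}

* Weighted variance around that mean, then the standard deviation.
scalar variance = 0
forvalues i = 1/`K' {
    scalar variance = variance + p`i' * (b`i' - mubar)^2
}
scalar sd = sqrt(variance)
display "mubar = " mubar "   sd = " sd
```

Scalars are used here because each quantity is a single number; if the dataset already contains variables named b1, p1, and so on, different scalar names would be needed, since Stata resolves a bare name to a variable before a scalar.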

Best,
Zun 
 

On Tue, 3 Dec 2002, Nick Cox wrote:

> Zun 
> > 
> > I have two vars ind (52 categories) and occ (7 categories), 
> > and I want
> > the percentage distribution of ind for each category of 
> > occ. Note that 
> > not each ind category has cases. For instance:
> > 
> > Occ=1
> > ind     pct
> > 1       .0309522
> > 2       .0334331
> > 3       0
> > 4       .0356777
> > 5       .3402772
> > 6       .0294558
> > .       .
> > .       .
> > 52      .3151532
> > 
> > Occ=2
> > ind     pct
> > 1       .0036623
> > 2       .0006301
> > 3       0
> > 4       .0064976
> > 5       0
> > 6       .0455619
> > .       .
> > .       .
> > 52      .0953769
> > 
> > As shown above, ind=3 appears in neither occ=1 nor occ=2, while 
> > ind=5 appears in
> > occ=1 but not in occ=2. 
> > 
> > My questions are:
> > 
> > First, if I use tabulate to get the percentage distribution of any 
> > categorical variable, how can I save the percentages in a 
> > new dataset 
> > that looks like one of the tables above. 
> >  
> > Second, in the specific example above, is there a way I can 
> > create a new 
> > dataset that looks like this:
> > 
> > ind     pctocc1         pctocc2
> > 1       .0309522        .0036623
> > 2       .0334331        .0006301
> > 3       0               0
> > 4       .0356777        .0064976
> > 5       .3402772        0
> > 6       .0294558        .0455619
> > .       .               .
> > .       .               .
> > 52      .3151532        .0953769
> > 
> 
> I guess that you have at most 52 * 7 observations. 
> Forget -tabulate-: a direct calculation is better. 
> 
> Typing 
> 
> . findit percent
> 
> does point to lots of things; but one pertinent command is -egen-. 
> 
> . bysort occ : egen pctocc = pc(ind)
> 
> followed by a -reshape- may help. You may need 
> to -replace- any missings by 0. 
> 
> 
> Nick 
> n.j.cox@durham.ac.uk 
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 
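
The -egen- plus -reshape- route suggested in the quoted reply can be sketched for raw, one-row-per-person data as follows. Note the swap: because pc() takes percentages of whatever expression it is given, this sketch first uses -contract- to reduce the data to cell counts and then applies pc() to those counts, which is an assumption about what is wanted rather than the exact command quoted. The toy data are hypothetical; the variable names ind, occ and pctocc follow the thread.

```stata
* Sketch only. Toy data are hypothetical stand-ins for the real file.
clear
input ind occ
1 1
1 1
2 1
2 2
3 2
3 2
3 2
1 2
end

* One row per occ-ind cell; zero keeps cells with no observations.
contract occ ind, zero freq(n)

* Share of each ind within each occ (prop gives proportions, not percent).
bysort occ: egen pctocc = pc(n), prop
drop n

* One row per ind, with columns pctocc1, pctocc2, ...
reshape wide pctocc, i(ind) j(occ)
list
```

On the real data this replaces the dataset in memory, so it would normally be wrapped in -preserve- and -restore-, or followed by -save- to a new file. The zero option fills in only combinations of values that occur somewhere in the data, so an industry with no cases in any occupation would still be absent.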
