Zun
>
> I have two vars ind (52 categories) and occ (7 categories),
> and I want
> the percentage distribution of ind for each category of
> occ. Note that
> not each ind category has cases. For instance:
>
> Occ=1
> ind pct
> 1 .0309522
> 2 .0334331
> 3 0
> 4 .0356777
> 5 .3402772
> 6 .0294558
> . .
> . .
> 52 .3151532
>
> Occ=2
> ind pct
> 1 .0036623
> 2 .0006301
> 3 0
> 4 .0064976
> 5 0
> 6 .0455619
> . .
> . .
> 52 .0953769
>
> As shown above, ind=3 appears in neither occ=1 nor occ=2, while
> ind=5 appears in
> occ=1 but not in occ=2.
>
> My questions are:
>
> First, if I use tabulate to get the percentage distribution of any
> categorical variable, how can I save the percentages in a
> new dataset
> that looks like one of the tables above?
>
> Second, in the specific example above, is there a way I can
> create a new
> dataset that looks like this:
>
> ind pctocc1 pctocc2
> 1 .0309522 .0036623
> 2 .0334331 .0006301
> 3 0 0
> 4 .0356777 .0064976
> 5 .3402772 0
> 6 .0294558 .0455619
> . . .
> . . .
> 52 .3151532 .0953769
>
I guess that you have at most 52 * 7 observations.
Forget -tabulate-: a direct calculation is better.
Typing
. findit percent
points to lots of things, but one pertinent command is -egen-.
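Its pc() function scales its argument to a share of the
group total, so it needs frequencies, not ind itself. If you
first reduce the data to one observation per combination of
occ and ind, with a frequency variable (say freq), then
. bysort occ : egen pctocc = pc(freq), prop
followed by a -reshape- may help. (The prop option gives
proportions between 0 and 1 rather than percentages.) You may
need to -replace- any missings by 0.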
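
The steps above can be sketched as follows. This is only a sketch
under assumptions: the toy data are made up, and using -contract-
to get the frequencies is one possible route (it names the
frequency variable _freq by default). With its zero option,
combinations that do not occur are kept with frequency 0, so no
-replace- is needed afterwards. Note that -contract, zero- only
fills in values of ind that occur somewhere in the data; a category
that never occurs at all will not get a row.

```stata
* Toy data (made-up values): one observation per case
clear
input occ ind
1 1
1 1
1 2
1 4
2 2
2 4
2 4
2 4
end

* One observation per occ-ind cell; zero keeps empty cells with _freq = 0
contract occ ind, zero

* Proportion of each ind within each category of occ
bysort occ : egen pctocc = pc(_freq), prop

* _freq varies within ind, so drop it before reshaping
drop _freq

* One row per ind, one column per occ: pctocc1, pctocc2, ...
reshape wide pctocc, i(ind) j(occ)
list
```

For the first question (a single table), stopping before the
-reshape- and keeping the observations for one value of occ gives a
dataset with ind and pctocc only.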
Nick
n.j.cox@durham.ac.uk
