# Re: st: output svymean

 From wgould@stata.com (William Gould, Stata) To statalist@hsphsun2.harvard.edu Subject Re: st: output svymean Date Wed, 23 Jul 2003 11:30:10 -0500

```Rita Luk <Rita_Luk@camh.net> is a SAS user needing to use Stata.  She wants to
create a dataset of tabulations, such as

OBS  NAME1  VALUE1  NAME2  VALUE2  COUNT  PERCENT1  PERCENT2
1    sex  female      .       .     50      0.25         .
2    sex  male        .       .    150      0.75         .
3    sex  female    age     old     25      0.50      0.33
4    sex  female    age   young     25      0.50      0.20
5    sex  male      age     old     50      0.33      0.67
6    sex  male      age   young    100      0.67      0.80

> [...] this dataset would result from running two different tabulations. The
> first two observations [...] come from a single variable tabulation of sex.
> The subsequent 4 observations (3-6) [... result ...]  from a cross
> tabulation of the dichotomous variables age and sex.

> In stata by using [...an inventive but convoluted procedure...] I can get
> the percents, and with a little data manipulation the counts and names, but
> I cannot get the values [...]  There must be a quicker way.

Problem is, Rita is still thinking SAS while using Stata.

Introduction, one-way tabulations
---------------------------------

To obtain a dataset containing the results of a one-way tabulation, here
is one recipe:

. sort <varname>
. by <varname>: gen count = _N
. by <varname>: keep if _n==_N
. keep <varname> count

Let's try it with the auto data:

. sysuse auto, clear
(1978 Automobile Data)

. sort rep78

. by rep78: gen count=_N

. by rep78: keep if _n==_N
(68 observations deleted)

. keep rep78 count

. list

rep78   count
1.       1       2
2.       2       8
3.       3      30
4.       4      18
5.       5      11
6.       .       5

Note that we got the missing value.  If we did not want that, we could
now type

. drop if missing(rep78)

We also do not have the percent.  That's easy enough to get:

. gen sum = sum(count)
. gen percent = count/sum[_N]

Doing that with the dataset above (and leaving in the missings), we would
get:

. list

rep78   count   sum    percent
1.       1       2     2    .027027
2.       2       8    10   .1081081
3.       3      30    40   .4054054
4.       4      18    58   .2432432
5.       5      11    69   .1486486
6.       .       5    74   .0675676

If we did not want variable sum, we could drop it  by typing

. drop sum

So the entire solution is

. sort <varname>
. by <varname>: gen count = _N
. by <varname>: keep if _n==_N
. keep <varname> count
. gen sum = sum(count)
. generate percent = count/sum[_N]
. drop sum

Introduction, two-way tabulations
---------------------------------

Obtaining two-way tabulations is no more difficult that one-way tabulations:

. sort <varname1> <varname2>
. by <varname1> <varname2>: gen count = _N
. by <varname1> <varname2>: keep if _n==_N
. keep <varname1> <varname2> count

The solution is the same, we just -sort- and -by- on more variables.  Getting
the two percentages is a little more work then in the one-way case:

. by <varname1>:  gen sum = sum(count)
. by <varname1>:  gen percent1 = count/sum[_N]
. drop sum

. sort <varname2>
. by <varname2>:  gen sum = sum(count)
. by <varname2>:  gen percent2 = count/sum[_N]
. drop sum

I follow the above procedure using the auto data, substituting rep78 for
<varname1> and foreign for <varname2>.  Before listing the data, I
-sort rep78 foreign-:

. list, clean

rep78    foreign   count   percent1   percent2
1.       1   Domestic       2          1   .0384615
2.       2   Domestic       8          1   .1538462
3.       3   Domestic      27         .9   .5192308
4.       3    Foreign       3         .1   .1363636
5.       4   Domestic       9         .5   .1730769
6.       4    Foreign       9         .5   .4090909
7.       5   Domestic       2   .1818182   .0384615
8.       5    Foreign       9   .8181818   .4090909
9.       .   Domestic       4         .8   .0769231
10.       .    Foreign       1         .2   .0454545

Combining results
-----------------

We now need to rename and append to get the results in the form Rita wants.
Let's pretend I saved the results of each of the above steps as result1.dta
and result2.dta:

. use result1
. gen str name1 = "rep78"
. rename rep78 value1
. rename percent percent1
. save result1, replace

. use result2
. gen str name1="rep78"
. gen str name2="foreign"
. rename rep78 value1
. rename foreign value2
. save result2, replace

. use result1
. append using result2

Oh yes, and Rita wanted the variables in the order name1, value1, name2,
value2, count, percent1, percent2:

. order name1 value1 name2 value2 count percent1 percent2

I did the above using my example and obtained:

. list, clean

name1   value1     name2     value2   count   percent1   percent2
1.   rep78        1                    .       2    .027027          .
2.   rep78        2                    .       8   .1081081          .
3.   rep78        3                    .      30   .4054054          .
4.   rep78        4                    .      18   .2432432          .
5.   rep78        5                    .      11   .1486486          .
6.   rep78        .                    .       5   .0675676          .
7.   rep78        1   foreign   Domestic       2          1   .0384615
8.   rep78        2   foreign   Domestic       8          1   .1538462
9.   rep78        3   foreign   Domestic      27         .9   .5192308
10.   rep78        3   foreign    Foreign       3         .1   .1363636
11.   rep78        4   foreign   Domestic       9         .5   .1730769
12.   rep78        4   foreign    Foreign       9         .5   .4090909
13.   rep78        5   foreign   Domestic       2   .1818182   .0384615
14.   rep78        5   foreign    Foreign       9   .8181818   .4090909
15.   rep78        .   foreign   Domestic       4         .8   .0769231
16.   rep78        .   foreign    Foreign       1         .2   .0454545

That was the desired result.

Automation
----------

Rita wrote,

> I plan on doing A LOT of single tabulations and cross tabulations and want
> to produce an output dataset that might look like:

so now let's automate it:

program forrita
syntax varlist(min=2 max=2)
local v1 : word 1 of `varlist'
local v2 : word 2 of `varlist'

preserve

sort `v1'
by `v1': gen count = _N
by `v1': keep if _n==_N
keep `v1' count
gen sum = sum(count)
gen percent1 = count/sum[_N]
drop sum
save result1, replace

restore, preserve
sort `v1' `v2'
by `v1' `v2': gen count = _N
by `v1' `v2': keep if _n==_N
keep `v1' `v2' count
by `v1': gen sum = sum(count)
by `v1': gen percent1 = count/sum[_N]
drop sum
sort `v2'
by `v2': gen sum = sum(count)
by `v2': gen percent2 = count/sum[_N]
drop sum
sort `v1' `v2'
save result2, replace

use result1
gen str name1 = "`v1'"
rename `v1' value1
save result1, replace

use result2
gen str name1 = "`v1'"
gen str name2 = "`v2'"
rename `v1' value1
rename `v2' value2
save result2, replace

use result1
append using result2
order name1 value1 name2 value2 count percent1 percent1
restore, not

erase result1.dta
erase result2.dta
end

With the above program, I can type

. sysuse auto, clear
. forrita rep78 foreign

and obtain the desired result.

-- Bill
wgould@stat.com
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```