[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

From |
wgould@stata.com (William Gould, Stata) |

To |
statalist@hsphsun2.harvard.edu |

Subject |
Re: st: output svymean |

Date |
Wed, 23 Jul 2003 11:30:10 -0500 |

Rita Luk <Rita_Luk@camh.net> is a SAS user needing to use Stata. She wants to create a dataset of tabulations, such as OBS NAME1 VALUE1 NAME2 VALUE2 COUNT PERCENT1 PERCENT2 1 sex female . . 50 0.25 . 2 sex male . . 150 0.75 . 3 sex female age old 25 0.50 0.33 4 sex female age young 25 0.50 0.20 5 sex male age old 50 0.33 0.67 6 sex male age young 100 0.67 0.80 about which she writes, > [...] this dataset would result from running two different tabulations. The > first two observations [...] come from a single variable tabulation of sex. > The subsequent 4 observations (3-6) [... result ...] from a cross > tabulation of the dichotomous variables age and sex. She adds, > In stata by using [...an inventive but convoluted procedure...] I can get > the percents, and with a little data manipulation the counts and names, but > I cannot get the values [...] There must be a quicker way. Problem is, Rita is still thinking SAS while using Stata. Introduction, one-way tabulations --------------------------------- To obtain a dataset containing the results of a one-way tabulation, here is one recipe: . sort <varname> . by <varname>: gen count = _N . by <varname>: keep if _n==_N . keep <varname> count Let's try it with the auto data: . sysuse auto, clear (1978 Automobile Data) . sort rep78 . by rep78: gen count=_N . by rep78: keep if _n==_N (68 observations deleted) . keep rep78 count . list rep78 count 1. 1 2 2. 2 8 3. 3 30 4. 4 18 5. 5 11 6. . 5 Note that we got the missing value. If we did not want that, we could now type . drop if missing(rep78) We also do not have the percent. That's easy enough to get: . gen sum = sum(count) . gen percent = count/sum[_N] Doing that with the dataset above (and leaving in the missings), we would get: . list rep78 count sum percent 1. 1 2 2 .027027 2. 2 8 10 .1081081 3. 3 30 40 .4054054 4. 4 18 58 .2432432 5. 5 11 69 .1486486 6. . 5 74 .0675676 If we did not want variable sum, we could drop it by typing . drop sum So the entire solution is . sort <varname> . by <varname>: gen count = _N . by <varname>: keep if _n==_N . keep <varname> count . gen sum = sum(count) . generate percent = count/sum[_N] . drop sum Introduction, two-way tabulations --------------------------------- Obtaining two-way tabulations is no more difficult that one-way tabulations: . sort <varname1> <varname2> . by <varname1> <varname2>: gen count = _N . by <varname1> <varname2>: keep if _n==_N . keep <varname1> <varname2> count The solution is the same, we just -sort- and -by- on more variables. Getting the two percentages is a little more work then in the one-way case: . by <varname1>: gen sum = sum(count) . by <varname1>: gen percent1 = count/sum[_N] . drop sum . sort <varname2> . by <varname2>: gen sum = sum(count) . by <varname2>: gen percent2 = count/sum[_N] . drop sum I follow the above procedure using the auto data, substituting rep78 for <varname1> and foreign for <varname2>. Before listing the data, I -sort rep78 foreign-: . list, clean rep78 foreign count percent1 percent2 1. 1 Domestic 2 1 .0384615 2. 2 Domestic 8 1 .1538462 3. 3 Domestic 27 .9 .5192308 4. 3 Foreign 3 .1 .1363636 5. 4 Domestic 9 .5 .1730769 6. 4 Foreign 9 .5 .4090909 7. 5 Domestic 2 .1818182 .0384615 8. 5 Foreign 9 .8181818 .4090909 9. . Domestic 4 .8 .0769231 10. . Foreign 1 .2 .0454545 Combining results ----------------- We now need to rename and append to get the results in the form Rita wants. Let's pretend I saved the results of each of the above steps as result1.dta and result2.dta: . use result1 . gen str name1 = "rep78" . rename rep78 value1 . rename percent percent1 . save result1, replace . use result2 . gen str name1="rep78" . gen str name2="foreign" . rename rep78 value1 . rename foreign value2 . save result2, replace . use result1 . append using result2 Oh yes, and Rita wanted the variables in the order name1, value1, name2, value2, count, percent1, percent2: . order name1 value1 name2 value2 count percent1 percent2 I did the above using my example and obtained: . list, clean name1 value1 name2 value2 count percent1 percent2 1. rep78 1 . 2 .027027 . 2. rep78 2 . 8 .1081081 . 3. rep78 3 . 30 .4054054 . 4. rep78 4 . 18 .2432432 . 5. rep78 5 . 11 .1486486 . 6. rep78 . . 5 .0675676 . 7. rep78 1 foreign Domestic 2 1 .0384615 8. rep78 2 foreign Domestic 8 1 .1538462 9. rep78 3 foreign Domestic 27 .9 .5192308 10. rep78 3 foreign Foreign 3 .1 .1363636 11. rep78 4 foreign Domestic 9 .5 .1730769 12. rep78 4 foreign Foreign 9 .5 .4090909 13. rep78 5 foreign Domestic 2 .1818182 .0384615 14. rep78 5 foreign Foreign 9 .8181818 .4090909 15. rep78 . foreign Domestic 4 .8 .0769231 16. rep78 . foreign Foreign 1 .2 .0454545 That was the desired result. Automation ---------- Rita wrote, > I plan on doing A LOT of single tabulations and cross tabulations and want > to produce an output dataset that might look like: so now let's automate it: program forrita syntax varlist(min=2 max=2) local v1 : word 1 of `varlist' local v2 : word 2 of `varlist' preserve sort `v1' by `v1': gen count = _N by `v1': keep if _n==_N keep `v1' count gen sum = sum(count) gen percent1 = count/sum[_N] drop sum save result1, replace restore, preserve sort `v1' `v2' by `v1' `v2': gen count = _N by `v1' `v2': keep if _n==_N keep `v1' `v2' count by `v1': gen sum = sum(count) by `v1': gen percent1 = count/sum[_N] drop sum sort `v2' by `v2': gen sum = sum(count) by `v2': gen percent2 = count/sum[_N] drop sum sort `v1' `v2' save result2, replace use result1 gen str name1 = "`v1'" rename `v1' value1 save result1, replace use result2 gen str name1 = "`v1'" gen str name2 = "`v2'" rename `v1' value1 rename `v2' value2 save result2, replace use result1 append using result2 order name1 value1 name2 value2 count percent1 percent1 restore, not erase result1.dta erase result2.dta end With the above program, I can type . sysuse auto, clear . forrita rep78 foreign and obtain the desired result. -- Bill wgould@stat.com * * For searches and help try: * http://www.stata.com/support/faqs/res/findit.html * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

- Prev by Date:
**st: RE: new schemes for bargraphs** - Next by Date:
**st: marginal effects** - Previous by thread:
**st: output svymean** - Next by thread:
**st: output svymean** - Index(es):

© Copyright 1996–2014 StataCorp LP | Terms of use | Privacy | Contact us | What's new | Site index |