Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down at the end of May, and its replacement, **statalist.org** is already up and running.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
"Martin Weiss" <martin.weiss1@gmx.de> |

To |
<statalist@hsphsun2.harvard.edu> |

Subject |
st: RE: RE: RE: Sort Two-Way Tab |

Date |
Wed, 26 May 2010 18:54:07 +0200 |

<> To have Stata restrict the output to the five most frequently occuring values of mpg: *********** sysuse auto, clear bys mpg: gen int negcount=-_N egen mygroups=group(negcount mpg) tabsort mpg rep78 if mygroups<=5, nocsort *********** How to deal with the fact that "22" and "25" are tied for fifth place in terms of frequency for "mpg", I am not so sure... HTH Martin -----Original Message----- From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox Sent: Mittwoch, 26. Mai 2010 18:43 To: statalist@hsphsun2.harvard.edu Subject: st: RE: RE: Sort Two-Way Tab -fre- (SSC) is an excellent program, but it is for one-way tables only. This need is met by -tabsort- from -tab_chi- on SSC. . tabsort mpg rep78, nocsort Mileage | Repair Record 1978 (mpg) | 1 2 3 4 5 | Total -----------+-------------------------------------------------------+---- ------ 18 | 1 2 2 2 2 | 9 19 | 0 0 7 0 0 | 7 14 | 0 1 2 2 0 | 5 21 | 0 0 3 2 0 | 5 25 | 0 0 1 3 1 | 5 16 | 0 1 2 1 0 | 4 17 | 0 1 1 0 2 | 4 22 | 0 1 2 1 0 | 4 24 | 1 2 0 1 0 | 4 20 | 0 0 3 0 0 | 3 23 | 0 0 1 2 0 | 3 28 | 0 0 1 2 0 | 3 12 | 0 0 2 0 0 | 2 15 | 0 0 1 1 0 | 2 30 | 0 0 0 1 1 | 2 35 | 0 0 0 0 2 | 2 26 | 0 0 1 0 0 | 1 29 | 0 0 1 0 0 | 1 31 | 0 0 0 0 1 | 1 34 | 0 0 0 0 1 | 1 41 | 0 0 0 0 1 | 1 -----------+-------------------------------------------------------+---- ------ Total | 2 8 30 18 11 | 69 Note that . findit table sort would have pointed to -tab_chi-. That is, you missed this because you didn't search SSC via -findit-. -tabsort- requires Stata 8. Nick n.j.cox@durham.ac.uk Martin Weiss "I have searched specifically through -tab1-, -tab2-, -tabstat-, -table-, and -estout-, with no clear resolution." Suspiciously absent from your list is -ssc d fre- by Ben Jann... Kyle Longest I am using Stata SE 11.0. I have searched through the manuals, help, and statalist and not found a direct solution to my issue, but apologize in advance if there is a straightforward resolution. Problem: I would like to create a cross-tabulation of two variables that is displayed in the sorted (by frequency) of the row variable. I have searched specifically through -tab1-, -tab2-, -tabstat-, -table-, and -estout-, with no clear resolution. Description and Example: using auto.dta. Say I was interested in the relationship between miles per gallon (mpg) and a car's repair record (rep78). There are numerous possible mpg's so I decide to only be concerned with the 5 most common mpg's. [I'd like to set the analytic validity/utility aside for a moment, as in the applied problem the desired technique is more warranted, for various reasons, than it may be in this example.] It is easy to get a frequency distribution of mpg in this sorted order using tab with the sort option: . tab mpg, sort Mileage | (mpg) | Freq. Percent Cum. ------------+----------------------------------- 18 | 9 12.16 12.16 19 | 8 10.81 22.97 14 | 6 8.11 31.08 21 | 5 6.76 37.84 22 | 5 6.76 44.59 25 | 5 6.76 51.35 [omitted] But now I would like to have this same distribution (order) cross-tabulated by the levels of rep78 (across the columns). As far as I can tell there is no way to do this with a current command. -tab- and its derivatives do not allow the sort option. For example, any call to tab will produce the following, which abandons the frequency sort order of mpg: . tab mpg rep78 Mileage | Repair Record 1978 (mpg) | 1 2 3 4 5 | Total -----------+--------------------------------+---------- 12 | 0 0 2 0 0 | 2 14 | 0 1 2 2 0 | 5 15 | 0 0 1 1 0 | 2 16 | 0 1 2 1 0 | 4 17 | 0 1 1 0 2 | 4 18 | 1 2 2 2 2 | 9 19 | 0 0 7 0 0 | 7 20 | 0 0 3 0 0 | 3 21 | 0 0 3 2 0 | 5 [ommitted] What I am hoping is that there is a command or method to produce the following (along with the row/column %s which are omitted for brevity): . command mpg rep78, option Mileage | Repair Record 1978 (mpg) | 1 2 3 4 5 | Total -----------+---------------------------------------+---------- 18 | 1 2 2 2 2 | 9 19 | 0 0 7 0 0 | 7 14 | 0 1 2 2 0 | 5 21 | 0 0 3 2 0 | 5 [ommitted] I have considered some manual options (e.g., using the gen option with -tab- and then compiling a matrix with the appropriate frequencies and percentages) but was hoping there may be a more straightforward solution. [I also realize that this is primarily a convenience issue as I could feasibly copy and paste the given cross tab into the desired order but this task can be extremely time consuming if there sort variable has many categories]. * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**References**:**st: Sort Two-Way Tab***From:*Kyle Longest <klongest4@gmail.com>

**st: RE: Sort Two-Way Tab***From:*"Martin Weiss" <martin.weiss1@gmx.de>

**st: RE: RE: Sort Two-Way Tab***From:*"Nick Cox" <n.j.cox@durham.ac.uk>

- Prev by Date:
**st: RE: RE: Sort Two-Way Tab** - Next by Date:
**Re: st: RE: pattern-fills in stacked bar graph** - Previous by thread:
**st: RE: RE: Sort Two-Way Tab** - Next by thread:
**st: panel data -xttest2 and xtpcse modules** - Index(es):