Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

st: RE: RE: RE: Sort Two-Way Tab


From   "Martin Weiss" <martin.weiss1@gmx.de>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: RE: RE: Sort Two-Way Tab
Date   Wed, 26 May 2010 18:54:07 +0200

<>

To have Stata restrict the output to the five most frequently occuring
values of mpg:


***********
sysuse auto, clear
bys mpg: gen int negcount=-_N
egen mygroups=group(negcount mpg)
tabsort mpg rep78 if mygroups<=5, nocsort
***********

How to deal with the fact that "22" and "25" are tied for fifth place in
terms of frequency for "mpg", I am not so sure...

HTH
Martin


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Mittwoch, 26. Mai 2010 18:43
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: RE: Sort Two-Way Tab

-fre- (SSC) is an excellent program, but it is for one-way tables only. 

This need is met by -tabsort- from -tab_chi- on SSC. 

. tabsort mpg rep78, nocsort

   Mileage |                   Repair Record 1978
     (mpg) |         1          2          3          4          5 |
Total
-----------+-------------------------------------------------------+----
------
        18 |         1          2          2          2          2 |
9 
        19 |         0          0          7          0          0 |
7 
        14 |         0          1          2          2          0 |
5 
        21 |         0          0          3          2          0 |
5 
        25 |         0          0          1          3          1 |
5 
        16 |         0          1          2          1          0 |
4 
        17 |         0          1          1          0          2 |
4 
        22 |         0          1          2          1          0 |
4 
        24 |         1          2          0          1          0 |
4 
        20 |         0          0          3          0          0 |
3 
        23 |         0          0          1          2          0 |
3 
        28 |         0          0          1          2          0 |
3 
        12 |         0          0          2          0          0 |
2 
        15 |         0          0          1          1          0 |
2 
        30 |         0          0          0          1          1 |
2 
        35 |         0          0          0          0          2 |
2 
        26 |         0          0          1          0          0 |
1 
        29 |         0          0          1          0          0 |
1 
        31 |         0          0          0          0          1 |
1 
        34 |         0          0          0          0          1 |
1 
        41 |         0          0          0          0          1 |
1 
-----------+-------------------------------------------------------+----
------
     Total |         2          8         30         18         11 |
69 

Note that 

. findit table sort 

would have pointed to -tab_chi-. That is, you missed this because you
didn't search SSC via -findit-. -tabsort- requires Stata 8. 

Nick 
n.j.cox@durham.ac.uk 

Martin Weiss

"I have searched specifically through -tab1-, -tab2-, -tabstat-,
-table-,
and -estout-, with no clear resolution."

Suspiciously absent from your list is -ssc d fre- by Ben Jann...

Kyle Longest

I am using Stata SE 11.0. I have searched through the manuals, help,
and statalist and not found a direct solution to my issue, but
apologize in advance if there is a straightforward resolution.

Problem: I would like to create a cross-tabulation of two variables
that is displayed in the sorted (by frequency) of the row variable. I
have searched specifically through -tab1-, -tab2-, -tabstat-, -table-,
and -estout-, with no clear resolution.

Description and Example: using auto.dta.

Say I was interested in the relationship between miles per gallon
(mpg) and a car's repair record (rep78). There are numerous possible
mpg's so I decide to only be concerned with the 5 most common mpg's.
[I'd like to set the analytic validity/utility aside for a moment, as
in the applied problem the desired technique is more warranted, for
various reasons, than it may be in this example.] It is easy to get a
frequency distribution of mpg in this sorted order using tab with the
sort option:

. tab mpg, sort

    Mileage |
      (mpg) |      Freq.     Percent        Cum.
------------+-----------------------------------
         18 |          9       12.16       12.16
         19 |          8       10.81       22.97
         14 |          6        8.11       31.08
         21 |          5        6.76       37.84
         22 |          5        6.76       44.59
         25 |          5        6.76       51.35
[omitted]

But now I would like to have this same distribution (order)
cross-tabulated by the levels of rep78 (across the columns). As far as
I can tell there is no way to do this with a current command. -tab-
and its derivatives do not allow the sort option. For example, any
call to tab will produce the following, which abandons the frequency
sort order of mpg:

. tab mpg rep78

Mileage |             Repair Record 1978
  (mpg) |    1         2       3          4      5 |     Total
-----------+--------------------------------+----------
        12 |    0          0       2          0     0 |         2
        14 |    0          1       2          2     0 |         5
        15 |    0          0       1          1     0 |         2
        16 |    0          1       2          1     0 |         4
        17 |    0          1       1          0     2 |         4
        18 |    1          2       2          2     2 |         9
        19 |    0          0       7          0     0 |         7
        20 |    0          0       3          0     0 |         3
        21 |    0          0       3          2     0 |         5
[ommitted]

What I am hoping is that there is a command or method to produce the
following (along with the row/column %s which are omitted for
brevity):

. command mpg rep78, option

 Mileage |             Repair Record 1978
  (mpg) |    1         2          3          4      5     |     Total
-----------+---------------------------------------+----------
       18 |    1          2          2          2          2 |         9
       19 |    0          0          7          0          0 |         7
       14 |    0          1          2          2          0 |         5
       21 |    0          0          3          2          0 |         5
[ommitted]

I have considered some manual options (e.g., using the gen option with
-tab- and then compiling  a matrix with the appropriate frequencies
and percentages) but was hoping there may be a more straightforward
solution. [I also realize that this is primarily a convenience issue
as I could feasibly copy and paste the given cross tab into the
desired order but this task can be extremely time consuming if there
sort variable has many categories].


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index