Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <njcoxstata@gmail.com>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: Combinations of variables
Date: Tue, 4 Jun 2013 17:04:01 +0100

It is perhaps pertinent to point out that basic Stata commands can get you close:

    bysort <varlist> : gen freq = _N
    bysort <varlist> : gen tag = _n == 1
    list <varlist> freq if tag

But then people often want to see percents, etc., to condition on -if- and -in-, etc., and so start to prefer a canned command.

Nick
njcoxstata@gmail.com

On 4 June 2013 16:34, Seliger Florian <seliger@kof.ethz.ch> wrote:
> Thank you Nick, that helped a lot.
>
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
> Sent: Tuesday, 4 June 2013 16:21
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: Combinations of variables
>
> There are several ways to get at this. One I like, for reasons easy to infer, is to use -groups- from SSC. The example here uses just two categorical variables, but having more variables is fine, just messier.
>
> . sysuse auto, clear
> (1978 Automobile Data)
>
> . groups foreign rep78
>
>   +------------------------------------+
>   |  foreign   rep78   Freq.   Percent |
>   |------------------------------------|
>   | Domestic       1       2      2.90 |
>   | Domestic       2       8     11.59 |
>   | Domestic       3      27     39.13 |
>   | Domestic       4       9     13.04 |
>   | Domestic       5       2      2.90 |
>   |------------------------------------|
>   |  Foreign       3       3      4.35 |
>   |  Foreign       4       9     13.04 |
>   |  Foreign       5       9     13.04 |
>   +------------------------------------+
>
> Note that -contract- would give you an easy answer, at the cost of destroying the dataset.
>
> Nick
> njcoxstata@gmail.com
>
> On 4 June 2013 15:14, Seliger Florian <seliger@kof.ethz.ch> wrote:
>
>> I need to find the most frequent combinations of variables in my dataset.
>> There are 12 variables of interest each coded 0/1.
>>
>> Example:
>>
>> ID   var1   var2   var3   ..
>> 1    0      1      0
>> 2    0      0      1
>> 3    0      1      0
>> 4    1      1      1
>> 5    0      1      0
>> .
>> .
>> .
>>
>> In this example, the most frequent combination is var1=0, var2=1, var3=0 (for ID 1, 3, 5).
>>
>> At the moment, I have no idea how to find out the combinations for so many different cases automatically.
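
For readers who want to try the basic-commands route from the reply at the top of this thread, here is a minimal sketch applied to the five example rows from the original question. It is not code posted by anyone in the thread. The variable names freq and tag follow the reply; percent, n and pct are illustrative names chosen here. The second half assumes that wrapping -contract- in -preserve- and -restore- is an acceptable way around the "cost of destroying the dataset".

```stata
* Example data typed in from the original question (three of the 12 indicators)
clear
input id var1 var2 var3
1 0 1 0
2 0 0 1
3 0 1 0
4 1 1 1
5 0 1 0
end

* Size of each observed combination, and its share of all observations
bysort var1 var2 var3 : gen freq = _N
gen percent = 100 * freq / _N

* Flag one observation per combination so each is listed once
bysort var1 var2 var3 : gen byte tag = _n == 1

* Most frequent combinations first
gsort -freq var1 var2 var3
list var1 var2 var3 freq percent if tag, noobs

* Alternative: -contract- reduces the data to one row per combination,
* so -preserve- and -restore- are used here to get the original data back
preserve
contract var1 var2 var3, freq(n) percent(pct)
gsort -n
list, noobs
restore
```

With these five rows the combination var1=0, var2=1, var3=0 should head both listings with a frequency of 3. For the real problem, the three variable names would be replaced by the full list of 12 indicators.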