Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: Combinations of variables

From: Nick Cox
To: "statalist@hsphsun2.harvard.edu"
Subject: Re: st: Combinations of variables
Date: Tue, 4 Jun 2013 17:04:01 +0100

```
It is perhaps pertinent to point out that basic Stata commands can get
you close:

bysort <varlist> : gen freq = _N
bysort <varlist> : gen tag = _n == 1
l <varlist> freq if tag

But then people often want to see percents, etc., to condition on -if-
and -in-, etc., and so start to prefer a canned command.

Nick
njcoxstata@gmail.com

On 4 June 2013 16:34, Seliger  Florian <seliger@kof.ethz.ch> wrote:
> Thank you Nick, that helped a lot.
>
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
> Sent: Dienstag, 4. Juni 2013 16:21
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: Combinations of variables
>
> There are several ways to get at this. One I like, for reasons easy to infer, is to use -groups- from SSC. The example here uses just two categorical variables, but having more variables is fine, just messier.
>
> . sysuse auto, clear
> (1978 Automobile Data)
>
> . groups foreign rep78
>
>   +------------------------------------+
>   |  foreign   rep78   Freq.   Percent |
>   |------------------------------------|
>   | Domestic       1       2      2.90 |
>   | Domestic       2       8     11.59 |
>   | Domestic       3      27     39.13 |
>   | Domestic       4       9     13.04 |
>   | Domestic       5       2      2.90 |
>   |------------------------------------|
>   |  Foreign       3       3      4.35 |
>   |  Foreign       4       9     13.04 |
>   |  Foreign       5       9     13.04 |
>   +------------------------------------+
>
> Note that -contract- would give you an easy answer, at the cost of destroying the dataset.
>
> Nick
> njcoxstata@gmail.com
>
> On 4 June 2013 15:14, Seliger  Florian <seliger@kof.ethz.ch> wrote:
>
>> I need to find the most frequent combinations of variables in my dataset.
>> There are 12 variables of interest each coded 0/1.
>>
>> Example:
>>
>> ID           var1       var2       var3 ..
>> 1             0             1             0
>> 2             0             0             1
>> 3             0             1             0
>> 4             1             1             1
>> 5             0             1             0
>> .
>> .
>> .
>>
>> In this example, the most frequent combination is var1=0, var2=1, var3=0 (for ID 1, 3, 5).
>>
>> At the moment, I have no idea how to find out the combinations for so many different cases automatically.
```
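The two built-in routes mentioned in the thread can be sketched a little further. What follows is a minimal illustration on the auto data, not code from the thread. The variable names `ok`, `freq`, `tag`, `percent`, `n` and `pct` are made up for the example. Leaving out observations with missing values is an assumption, made because the -groups- table above appears to omit them. Part 1 extends the -bysort- approach with percents and sorts the most frequent combinations first. Part 2 wraps -contract- in -preserve- and -restore-, which is one way to avoid losing the original dataset.

```stata
* Part 1: frequencies and percents with basic commands only
sysuse auto, clear

* flag observations with no missing values on the variables of interest
gen byte ok = !missing(foreign, rep78)

* size of each combination, and a tag for one observation per combination
bysort ok foreign rep78 : gen freq = _N
bysort ok foreign rep78 : gen byte tag = _n == 1

* percent of the non-missing observations
count if ok
gen percent = 100 * freq / r(N)
format percent %6.2f

* most frequent combinations first
gsort -freq foreign rep78
list foreign rep78 freq percent if tag & ok, noobs

* Part 2: -contract- without losing the data in memory
sysuse auto, clear
preserve
contract foreign rep78, freq(n) percent(pct) nomiss
gsort -n
list, noobs
restore
```

For the original question, the twelve 0/1 variables would take the place of `foreign rep78` in both parts.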