# Re: st: How to keep if freq of var is >= 100 for both group (Alcohol "1" and non-alcohol "0")?

From: Caliph Omar Moumin
Date: Tue, 25 Sep 2012 01:28:11 -0700 (PDT)

When I apply this command, it keeps a diagnosis if either of the two groups has >= 100 observations. That means there are cases in which one of the groups has 0 observations.
I would like to keep a diagnosis if and only if both groups have >= 100 observations.
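
A minimal sketch of one way to express the "both groups" condition follows. It assumes the variable names `diagnosis` and `group` from the data layout quoted below, with `group` coded 1 for alcohol and 0 for non-alcohol. The names `n_alcohol` and `n_nonalcohol` are hypothetical helper variables introduced only for illustration. The idea is to count, within each diagnosis, the observations in each group separately, and then keep a diagnosis only when both counts reach the threshold.

```stata
* Sketch only: assumes variables named diagnosis and group (1 = alcohol, 0 = non-alcohol).
* n_alcohol and n_nonalcohol are hypothetical helper names.

* Within each diagnosis, count the observations in each group.
bysort diagnosis: egen n_alcohol    = total(group == 1)
bysort diagnosis: egen n_nonalcohol = total(group == 0)

* Keep a diagnosis only if BOTH groups have at least 100 observations.
keep if n_alcohol >= 100 & n_nonalcohol >= 100

* Remove the helper variables.
drop n_alcohol n_nonalcohol
```

Observations with a missing `group` are counted in neither total. Change `>=` to `>` if the requirement is "more than 100" rather than "at least 100".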

Thank you again Nick

Caliph
From: Nick Cox
Subject: Re: st: How to keep if freq of var is >= 100 for both group (Alcohol "1" and non-alcohol "0")?

Your title said ">="; your text varies between ">=" and "more than";
clearly you need to choose between ">=" and ">".

On Tue, Sep 25, 2012 at 8:31 AM, Nick Cox wrote:
> This is a simple application of -by:-, with which all long-term Stata
> users should be familiar.
> bysort diagnosis group : keep if _N > 100
> Note that this procedure just counts observations, and is indifferent
> to missing values. If you have missing values on key variables, -drop-
> them first.
>
> Read the sections on -by:- in [U]. Then for a discursive tutorial on -by:-, see
> SJ-2-1  pr0004  . . . . . . . . . . Speaking Stata:  How to move step by: step
>        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
>        Q1/02  SJ 2(1):86--102                                  (no commands)
>        explains the use of the by varlist : construct to tackle
>        a variety of problems with group structure, ranging from
>        simple calculations for each of several groups to more
>        advanced manipulations that use the built-in _n and _N
> Nick
> On Tue, Sep 25, 2012 at 7:53 AM, Caliph Omar Moumin <sheikmoumin@yahoo.com> wrote:
>> I have a large dataset with more than 500,000 observations and more than 7000 diagnoses, which are grouped into two groups: alcohol, coded as "1", and non-alcohol, coded as "0".
>> The data structure is like this:
>>
>> obs        id       diagnosis   group   ... other variables
>> 1          2338     A120        1
>> 2          3838     m23         0
>> .
>> .
>> .
>> .
>> 500,000    45566    y678        1
>>
>> So I want to keep observations only if there are >= 100 for both groups, alcohol and non-alcohol, based on diagnoses. For example, if diagnosis A120 has more than 100 observations for both alcohol and non-alcohol, keep it; if not, drop it.
