
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: How to keep if freq of var is >= 100 for both group (Alcohol "1" and non-alcohol "0")?


From   Nick Cox <[email protected]>
To   [email protected]
Subject   Re: st: How to keep if freq of var is >= 100 for both group (Alcohol "1" and non-alcohol "0")?
Date   Tue, 25 Sep 2012 08:31:14 +0100

This is a simple application of -by:-, with which all long-term Stata
users should be familiar.

bysort diagnosis group : keep if _N > 100

Note that this procedure just counts observations, and is indifferent
to missing values. If you have missing values on key variables, -drop-
them first.
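
If the aim is to keep a diagnosis only when both groups reach the threshold, and to remove missing values beforehand, a minimal sketch might look like the following. It assumes the variables are named diagnosis and group as in the posted data layout, that group takes only the two values 0 and 1, and that the threshold is >= 100 as in the subject line. The names cellsize, first, ngroups and mincell are hypothetical helper variables introduced here for illustration.

```stata
* Drop observations with missing values on the key variables first
drop if missing(diagnosis, group)

* Size of each diagnosis-by-group cell
bysort diagnosis group : gen long cellsize = _N

* Flag the first observation in each cell, then count cells per diagnosis
bysort diagnosis group : gen byte first = (_n == 1)
bysort diagnosis : egen ngroups = total(first)

* Smallest cell size within each diagnosis
bysort diagnosis (cellsize) : gen long mincell = cellsize[1]

* Keep a diagnosis only if both groups are present and each has >= 100
keep if ngroups == 2 & mincell >= 100
drop cellsize first ngroups mincell
```

A one-line -keep- by diagnosis and group treats each cell separately, so a diagnosis can survive in one group and vanish in the other. The sketch instead decides at the level of the diagnosis. Change the threshold to > 100 if "more than 100" is what is wanted.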

Read the sections on -by:- in [U]. Then for a discursive tutorial on -by:-, see

SJ-2-1  pr0004  . . . . . . . . . . Speaking Stata:  How to move step by: step
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q1/02   SJ 2(1):86--102                                  (no commands)
        explains the use of the by varlist : construct to tackle
        a variety of problems with group structure, ranging from
        simple calculations for each of several groups to more
        advanced manipulations that use the built-in _n and _N


Nick

On Tue, Sep 25, 2012 at 7:53 AM, Caliph Omar Moumin
<[email protected]> wrote:
>
> I have a large dataset with more than 500,000 observations and more than 7000 diagnoses, grouped into two groups: alcohol coded as "1" and non-alcohol coded as "0".
> The data structure is like this:
>
> obs                id                                      diagnosis        group............other variables
>   1               2338                                     A120             1
>
>  2                3838                                     m23              0
> .
> .
> .
> .
>  500,000      45566                                    y678            1
>
>
> So I want to keep observations only if the count is >= 100 for both groups, alcohol and non-alcohol, within each diagnosis. For example, if diagnosis A120 has more than 100 observations for both alcohol and non-alcohol, keep it; if not, drop it.

