Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down at the end of May, and its replacement, **statalist.org**, is already up and running.


From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: How to keep if freq of var is >= 100 for both group (Alcohol "1" and non-alcohol "0")?
Date: Tue, 25 Sep 2012 10:12:06 +0100

You did say that and I overlooked it. The extra code follows in turn from an FAQ and the principles discussed in my paper previously cited.

FAQ . . . . . . Listing observations in a group that differ on a variable
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
11/01 How do I list observations in a group that differ on a variable?
http://www.stata.com/support/faqs/data/diff.html

bysort diagnosis group : keep if _N > 100
by diagnosis : drop if group[1] == group[_N]

If only one -group- is represented for each -diagnosis- then necessarily the first and last are the same.

Nick

On Tue, Sep 25, 2012 at 9:28 AM, Caliph Omar Moumin <sheikmoumin@yahoo.com> wrote:
> Thank Nick for your quick reply
>
> I when i apply this command it is keeping if either of the two group is >= 100 observation. Which means there are cases which one of the groups have 0 observations
> I would like if and only if both groups have >=100 observations.
>
> From: Nick Cox <njcoxstata@gmail.com>
> Your title said ">="; your text varies between ">=" and "more than";
> clearly you need to choose between ">=" and ">".
>
> On Tue, Sep 25, 2012 at 8:31 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>> This is a simple application of -by:-, with which all long-term Stata
>> users should be familiar.
>>
>> bysort diagnosis group : keep if _N > 100
>>
>> Note that this procedure just counts observations, and is indifferent
>> to missing values. If you have missing values on key variables, -drop-
>> them first.
>>
>> Read the sections on -by:- in [U]. Then for a discursive tutorial on -by:-, see
>>
>> SJ-2-1 pr0004 . . . . . . . . . . Speaking Stata: How to move step by: step
>> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
>> Q1/02 SJ 2(1):86--102 (no commands)
>> explains the use of the by varlist : construct to tackle
>> a variety of problems with group structure, ranging from
>> simple calculations for each of several groups to more
>> advanced manipulations that use the built-in _n and _N
>>
>>
>> Nick
>>
>> On Tue, Sep 25, 2012 at 7:53 AM, Caliph Omar Moumin
>> <sheikmoumin@yahoo.com> wrote:
>>>
>>> I have a large dataset which more than 500,000 observations; and more than 7000 diagnoses, which is grouped into two groups alcohol coded as "1" and nonlacloh as "0"
>>> the data structure is like this
>>>
>>> obs      id     diagnosis  group............other variables
>>> 1        2338   A120       1
>>>
>>> 2        3838   m23        0
>>> .
>>> .
>>> .
>>> .
>>> 500,000  45566  y678       1
>>>
>>>
>>> So i want to keep if observations is >= 100 for both groups alcohol and nonalcohol based on daignoses. For example if daignoses A120 has more than 100 observations for both alcohol and nonalcohol keep if not drop it.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
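The two-step filter in the reply above can be tried on a small made-up dataset. This is only a sketch: the diagnosis codes and cell counts are invented for illustration, the helper variable -n- is hypothetical, and the threshold is written as ">= 100" on the assumption that the thread title states the intended rule (the reply itself uses "> 100" and notes that the choice needs to be made).

```stata
* Toy data (invented): one row per diagnosis-group cell, n = cell size
clear
input str4 diagnosis byte group int n
"A120" 1 150
"A120" 0 120
"M23"  1 130
"M23"  0 40
"Y678" 1 200
end

* Turn each cell into n identical observations, then drop the helper
expand n
drop n

* Step 1: keep only diagnosis-group cells with at least 100 observations
bysort diagnosis group : keep if _N >= 100

* Step 2: the data are still sorted by diagnosis and group, so within each
* diagnosis the first and last values of group are equal exactly when only
* one group survived step 1; drop those diagnoses
by diagnosis : drop if group[1] == group[_N]

* Only diagnoses with both groups at or above the threshold remain
tabulate diagnosis group
```

In this toy case M23 loses its group 0 cell in step 1 (40 observations) and is then dropped entirely in step 2, Y678 is dropped because it never had a group 0 cell, and A120 is kept with all 270 observations. As the reply notes, the counts ignore missing values, so observations with missing -diagnosis- or -group- would need to be dropped first.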

