Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org**, is already up and running.


From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: _N in by-groups
Date: Fri, 19 Aug 2011 09:45:52 +0100

The example makes clear why you made your otherwise very puzzling claim.

There is no contradiction. In your second example,

1. you are running a program for each group of -foreign-

2. that program is in effect just "display the number of observations" and makes no reference to the -by:- group.

There is dissociation in this second example, but not in the first example, which behaves as you expect.

The dissociation arises because "_N" is passed as text to the program. _N is evaluated _within_ the program and not within the context of -by:-. So _N, as it were, never sees the -by:- and is not influenced by it. Declaring the program -byable(recall)- makes no difference here.

It's a subtle example!

Nick

On Fri, Aug 19, 2011 at 8:41 AM, Matthew White <mwhite@poverty-action.org> wrote:
> So if I execute:
>
> sysuse auto
> bys foreign: drop if _n == _N
>
> Then two observations are dropped because _N is the number of
> observations in the by-group.
>
> But in this (admittedly silly) example, _N seems to be the number of
> observations in the data set:
>
> program dispby, byable(recall)
>     disp `0'
> end
> sysuse auto
> bys foreign: dispby _N
>
> Both times 74 is displayed, instead of 52 in the first by-group and 22
> in the second.
>
>
> On Fri, Aug 19, 2011 at 10:07 AM, Maarten Buis <maartenlbuis@gmail.com> wrote:
>> On Fri, Aug 19, 2011 at 2:21 AM, Matthew White wrote:
>>> If a Stata command has by-groups, it seems like _N is interpreted
>>> sometimes as the number of observations in the by-group and sometimes
>>> as the number of observations in the data set.
>>
>> If you use the -by :- prefix it is always defined as the number of
>> observations within each by-group. Stata would be a pretty lousy
>> program if such a scalar randomly changed meaning...
>>
>> If you want the total number of observations then I would just do:
>>
>> local Ntot = _N
>>
>> or inside a program that previously used -marksample-:
>>
>> count if `touse'
>> local Ntot = r(N)
>>
>> later you do
>>
>> by <somevar> `touse' : <something using _N and `Ntot'> if `touse'
>>

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
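For anyone who wants a by-able program to report the group size, here is a minimal sketch. It is not from the thread: the program name -countby- is made up for illustration, and it assumes the documented behaviour that -marksample- in a -byable(recall)- program marks only the observations of the current by-group. The idea is to count the marked observations instead of relying on _N, which inside the program refers to the whole dataset.

```stata
* Hypothetical program (name countby is an assumption for illustration).
* Under byable(recall), the program is called once per by-group and
* marksample restricts `touse' to the current group.
capture program drop countby
program define countby, byable(recall)
    syntax [if] [in]
    marksample touse
    * count the observations in the current by-group only
    quietly count if `touse'
    display "observations in this group: " r(N)
    * _N here still refers to all observations in memory
    display "observations in dataset:    " _N
end

sysuse auto, clear
bysort foreign: countby
```

If that assumption holds, the first line of output should differ across the two groups of -foreign- (52 and 22 according to the message above), while the second should show 74 both times.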

**Follow-Ups**:**Re: st: _N in by-groups***From:*mcross@exemail.com.au

**References**:**st: _N in by-groups***From:*Matthew White <mwhite@poverty-action.org>

**Re: st: _N in by-groups***From:*Maarten Buis <maartenlbuis@gmail.com>

**Re: st: _N in by-groups***From:*Matthew White <mwhite@poverty-action.org>
