


Re: st: _N in by-groups

From   Phil Schumm <>
Subject   Re: st: _N in by-groups
Date   Fri, 19 Aug 2011 04:29:39 -0500

On Aug 19, 2011, at 2:41 AM, Matthew White wrote:
> But in this (admittedly silly) example, _N seems to be the number of observations in the data set:
> program dispby, byable(recall)
> disp `0'
> end
> sysuse auto
> bys foreign: dispby _N
> Both times 74 is displayed, instead of 52 in the first by-group and 22 in the second.

As Nick just pointed out, in this example, the string "_N" is being passed to your program, so that within -dispby-, the line

    disp `0'

is being expanded to

    disp _N

which evaluates to (and displays) 74, the number of observations in the whole dataset, in every by-group.  But this minor issue is a red herring here, for I suspect you would be equally surprised by

    capture program drop dispby
    program dispby, byable(recall)
        syntax [if]
        marksample touse
        count if `touse'
    end

    sysuse auto, clear
    bys foreign: dispby if _n == _N

Or, for an example using official commands (also with the auto dataset), compare

    bys foreign: li make if _n<=5

with

    bys foreign: reg mpg weight if _n<=5

The issue you have stumbled across is that merely declaring a program -byable()- does not make Stata interpret _n and _N with respect to the by-groups when you call that program.  If you want that behavior, you have to program it explicitly.
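For the original goal (the number of observations in each by-group), one way to program it explicitly is sketched below. This is a minimal sketch, not code from the thread: -dispN- is a hypothetical name, and it assumes, as described in [P] byable, that under -byable(recall)- the variable created by -marksample- marks only the observations in the current by-group. The group size is therefore obtained by counting `touse' rather than by referring to _N.

```stata
capture program drop dispN
program dispN, rclass byable(recall)
    syntax [if] [in]
    // under byable(recall), `touse' marks only the current by-group
    // (intersected with any if/in the caller supplied)
    marksample touse
    quietly count if `touse'
    display as text "observations in this group: " as result r(N)
    // hand the count back to the caller as r(N)
    return scalar N = r(N)
end

sysuse auto, clear
bysort foreign: dispN
```

With the auto data this should display 52 and then 22, the group sizes you were expecting.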

Now, your original question had to do with efficiency; namely, you were concerned about calling -preserve- and -restore- for each by-group.  While it's true that there may be faster ways to accomplish what you're trying to do, it's usually better to wait until you know you have a problem (e.g., via profiling your command under a range of conditions) before doing a lot of extra work to try to improve performance.  Nonetheless, I suspect that the answer may be to use -byable(onecall)- and to call one or more of Stata's built-in commands to handle the by-group processing.  However, without knowing exactly what you are trying to do, it's not possible to give a specific recommendation.
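A rough sketch of the -byable(onecall)- route follows. It is only an illustration under stated assumptions: -flaglast- and -islast- are hypothetical names, and the program relies on the local macro `_byvars' that Stata fills in for -byable(onecall)- programs (empty when the command is called without -by-). The program runs once, and the by-group work is handed to -by ...: generate-, where _n and _N *are* interpreted within each group. Here it flags the last observation of each group in the current sort order.

```stata
capture program drop flaglast
program flaglast, byable(onecall) sortpreserve
    // hypothetical example: new 0/1 variable = 1 on the last obs of each by-group
    syntax newvarname [if] [in]
    // novarlist: the new variable does not exist yet, so do not mark out on it
    marksample touse, novarlist
    // `obs' records the incoming order so that -sort- (not stable) cannot
    // reshuffle observations within a group
    tempvar obs
    generate long `obs' = _n
    // `_byvars' holds the -by- variables; under -by-, _n and _N are within-group
    by `touse' `_byvars' (`obs'), sort: generate byte `varlist' = (_n == _N) if `touse'
end

sysuse auto, clear
bysort foreign: flaglast islast
list foreign make if islast == 1
```

The -sortpreserve- option restores the caller's sort order when the program exits, so the internal sort is invisible to the user.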

-- Phil
