



Re: st: _N in by-groups


From: Phil Schumm <pschumm@uchicago.edu>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: _N in by-groups
Date: Fri, 19 Aug 2011 04:29:39 -0500

On Aug 19, 2011, at 2:41 AM, Matthew White wrote:
> But in this (admittedly silly) example, _N seems to be the number of observations in the data set:
> program dispby, byable(recall)
> disp `0'
> end
> sysuse auto
> bys foreign: dispby _N
> Both times 74 is displayed, instead of 52 in the first by-group and 22 in the second.


As Nick just pointed out, in this example, the string "_N" is being passed to your program, so that within -dispby-, the line

    disp `0'

is being expanded to

    disp _N

which evaluates to (and displays) 74 in all cases.  But this minor issue is a red herring here, for I suspect you would be equally surprised by


    program dispby, byable(recall)
        syntax [if]
        marksample touse
        count if `touse'
    end

    sysuse auto
    bys foreign: dispby if _n == _N


Or, for an example using official commands (also with the auto dataset), compare

    bys foreign: li make if _n<=5

to

    bys foreign: reg mpg weight if _n<=5

The issue you have stumbled across is that merely using -byable()- when you define a program does not automatically mean that Stata interprets _n and _N with respect to the by-groups when you call the program.  If you want that type of behavior, then you have to program it explicitly.
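
As a minimal sketch of what "programming it explicitly" might look like (not necessarily what you need), the by-group size can be recovered inside a -byable(recall)- program by counting the observations flagged by -marksample-, since under -byable(recall)- the touse variable is restricted to the current by-group.  The program name -dispbyN- is made up for illustration.

```stata
* Hypothetical illustration: display the size of the current by-group
* instead of relying on _N, which refers to the whole dataset here.
program define dispbyN, byable(recall)
    syntax [if] [in]
    * marksample restricts `touse' to the current by-group (and to any if/in)
    marksample touse
    quietly count if `touse'
    display r(N)
end

sysuse auto, clear
bysort foreign: dispbyN
```

With the auto dataset this should report the two group sizes rather than the dataset total, because the count is taken over `touse' rather than read from _N.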

Now, your original question had to do with efficiency; namely, you were concerned about calling -preserve- and -restore- for each by-group.  While it's true that there may be faster ways to accomplish what you're trying to do, it's usually better to wait until you know you have a problem (e.g., via profiling your command under a range of conditions) before doing a lot of extra work to try to improve performance.  Nonetheless, I suspect that the answer may be to use -byable(onecall)- and to call one or more of Stata's built-in commands to handle the by-group processing.  However, without knowing exactly what you are trying to do, it's not possible to give a specific recommendation.
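
To make the -byable(onecall)- idea concrete, here is a rough sketch under several assumptions: the program name -countlast- and the local macro -byprefix- are invented for illustration, and the task (counting the observations that satisfy an -if- condition within each group) is only a stand-in for whatever you are actually doing.  The point is that the -if- expression is handed to built-in commands under a -by:- prefix, so that _n and _N inside it are evaluated within each by-group.

```stata
* Hypothetical illustration of byable(onecall): the program is called once
* and delegates the by-group processing to built-in commands.
program define countlast, byable(onecall)
    syntax [if]
    * When called with a by prefix, the local macro _byvars holds the by-variables
    if _by() {
        local byprefix "by `_byvars':"
    }
    tempvar touse
    * Evaluating the if-expression under by: makes _n and _N group-relative
    quietly `byprefix' generate byte `touse' = 1 `if'
    `byprefix' count if `touse' == 1
end

sysuse auto, clear
bysort foreign: countlast if _n == _N
```

This has not been profiled, so whether it is actually faster than -preserve- and -restore- for each by-group would still need to be checked for your case.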


-- Phil



