help byprog, help byable
-------------------------------------------------------------------------------
Title
[P] byable -- Make programs byable
Syntax
program [define] program_name [, ... byable(recall[,noheader] |
onecall) sortpreserve ... ]
Description
Most Stata commands allow the use of the by prefix; see [D] by. For
example, the syntax diagram for the regress command could be presented as
[by varlist:] regress ...
This entry discusses how to write programs (ado-files) so that the
program can be used with the by prefix.
Options
byable(recall[,noheader] | onecall) specifies that the program is to
allow the by prefix to be used with it and specifies the style in
which the program is coded.
There are two supported styles, known as byable(recall) and
byable(onecall). byable(recall) programs are usually, though not
always, easier to write, and byable(onecall) programs are usually,
though not always, faster.
byable(recall) programs are executed repeatedly, once per by group.
byable(onecall) programs are executed only once and it is the
program's responsibility to handle the implications of the by prefix
if it is specified.
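A minimal sketch of the difference, assuming a do-file context and a
global macro NCALLS used as a call counter (the program names and the
macro are hypothetical): the byable(recall) program is executed once per
by-group, and the byable(onecall) program is executed once in total.

```stata
capture program drop calls_recall
program calls_recall, byable(recall)
    // Executed once per by-group: bump the hypothetical counter each time
    global NCALLS = ${NCALLS} + 1
end

capture program drop calls_onecall
program calls_onecall, byable(onecall)
    // Executed exactly once; handling the by-groups is this program's job
    global NCALLS = ${NCALLS} + 1
end
```

With the auto data loaded and NCALLS reset to 0, by foreign, sort:
calls_recall should leave NCALLS at 2 (one call per value of foreign),
whereas the onecall version should leave it at 1.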
If you wrote program myprog in the byable(recall) style and the user
typed
. by pid: myprog ...
myprog would be executed repeatedly just as if the user had typed
. myprog ... if pid==1
. myprog ... if pid==2
. etc...
except that no if condition is used to tell myprog which subsample
to restrict its calculations to. Instead, the sample is
automatically restricted to the appropriate subsample when myprog
uses the mark or marksample commands; see [P] mark.
In addition, the following local macros are defined:
`_byindex' contains the name of a temporary variable
containing 1, 2, ... denoting the by-groups
`_byvars' contains the names of the actual by-variables
`_byrc0' contains ", rc0" if the user specified by ...: with
the rc0 option and contains nothing otherwise.
and the following functions are also available for use in
expressions:
_by() returns 1 if by ...: was specified, and 0
otherwise.
_byindex() returns 1, 2, ..., reflecting the by-group
currently being executed; returns 1 if _by()==0.
_bylastcall() returns 1 if this is the last by-group and 0
otherwise; returns 1 if _by()==0.
_byn1() returns the beginning observation number of the
by-group currently being executed; returns 1 if
_by()==0. The value returned by _byn1() is valid
only if the data have not been re-sorted since the
original call to by varlist: stata_cmd.
_byn2() returns the ending observation number of the
by-group currently being executed; returns _N if
_by()==0. The value returned by _byn2() is valid
only if the data have not been re-sorted since the
original call to by varlist: stata_cmd.
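The sketch below (the program name showby and the r() names are
hypothetical) displays the by-related functions and the `_byvars'
macro on each call and also saves the function values in r(). Because
by leaves behind the r() results of the last call, the saved values
describe the last by-group.

```stata
capture program drop showby
program showby, rclass byable(recall)
    // Report the by-related functions and macros for the current call
    display "_by() = " _by() ", _byindex() = " _byindex() ///
        ", obs " _byn1() " to " _byn2() ", _bylastcall() = " _bylastcall()
    display "by-variables: `_byvars'"
    // Save this call's values so they can be inspected afterward
    return scalar isby   = _by()
    return scalar group  = _byindex()
    return scalar n1     = _byn1()
    return scalar n2     = _byn2()
    return scalar islast = _bylastcall()
end
```

In the auto data, by foreign, sort: showby runs twice; the last call
covers the 22 Foreign cars, observations 53 through 74. Typed without
by, showby reports _by() as 0 and _byindex() as 1.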
Thus, a program could restrict its calculations to the current
by-group with the condition `_byindex'==_byindex(), but that is not
how it is usually done. Programs usually use mark or marksample
instead, because other restrictions, such as the user's if and in,
may apply as well, and mark and marksample take all of them into
account.
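A sketch comparing the two approaches, using the hypothetical program
grpcount: it counts the observations selected by marksample and, when
by is in effect, the observations selected by `_byindex'==_byindex()
alone. Only the marksample count also honors the user's if condition.

```stata
capture program drop grpcount
program grpcount, rclass byable(recall)
    syntax [if] [in]
    // marksample combines the user's if/in with the current by-group
    marksample touse
    quietly count if `touse'
    local n_mark = r(N)
    local n_index = `n_mark'
    if _by() {
        // Restricting on the by-group index alone ignores the user's if/in
        quietly count if `_byindex' == _byindex()
        local n_index = r(N)
    }
    return scalar n_mark  = `n_mark'
    return scalar n_index = `n_index'
end
```

For example, by foreign: grpcount if price > 10000 leaves n_index at
the full size of the last group, while n_mark counts only the cars in
that group that also satisfy the if condition.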
byable(recall,noheader) programs are distinguished from
byable(recall) programs in that by will not display a by-group header
before each call of the program.
byable(onecall) programs are required to handle the by ...: prefix
themselves, including displaying a by-group header if they want one.
See [P] byable for details.
sortpreserve specifies that the program, during its execution, will
re-sort the data and that therefore Stata itself should take action
to preserve the order of the data so that the order can be
reestablished afterward.
sortpreserve is, in fact, independent of whether a program is
byable(), but byable() programs often specify this option.
Pretend you are writing the program myprog and that, in performing
its calculations, it needs to sort the data. It would be jarring for
a user to experience
. by pid: myprog ...
. by pid: sum newvar
not sorted
r(5);
Specifying sortpreserve prevents this and still allows myprog to sort
the data freely. byable() programs that sort the data should specify
sortpreserve. If your program does not change the sort order of the
data, sortpreserve is unnecessary, and omitting it saves a little
time.
sortpreserve takes time, although less than you might suspect. It
does not actually re-sort the data at the conclusion of your program,
which would be an O(n ln n) operation; instead, it arranges things so
that it can reassert the original order of the data in O(n) time, and
it is very quick about it. Nonetheless, there is no reason to spend
that time if the data never got out of order.
Concerning sort order: when your byable() program is invoked for the
first time, the data will be sorted on `_byvars'. In subsequent calls
(in the case of byable(recall) programs), the sort order will be just
as your program left it, even if you specify sortpreserve.
sortpreserve restores the original order only after your program has
been called for the last time.
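This behavior can be observed with the hypothetical program sortpeek
below. It displays the sort order Stata has recorded at the start of
each call (using the `: sortedby' extended macro function) and then
re-sorts the data. Because it specifies sortpreserve, the caller's
order should be intact after the last call, so a following by prefix
on the same variables does not fail with "not sorted".

```stata
capture program drop sortpeek
program sortpeek, byable(recall) sortpreserve
    syntax varname [if] [in]
    marksample touse
    // First call: sorted on the by-variables; later calls: whatever
    // order the previous call left behind
    display "call " _byindex() " starts sorted by: `: sortedby'"
    // Re-sort freely; sortpreserve restores the caller's order at the end
    sort `touse' `varlist'
end
```

For example, after sort foreign make, typing by foreign: sortpeek price
and then by foreign: summarize price should run without error.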
Example 1:
program myprog1, byable(recall)
syntax [varlist] [if] [in]
marksample touse
summarize `varlist' if `touse'
end
It would be a mistake to code the above program as
program myprog1, byable(recall)
syntax [varlist] [if] [in]
summarize `varlist' `if' `in'
end
because in that case, the sample would not be restricted to the
appropriate by-group when the user specified the by ...: prefix.
marksample, however, knows when a program is being by'd and so will set
the `touse' variable to reflect whatever restrictions the user specified
and the by-group restriction.
syntax, too, knows about by: it automatically issues an error message
when the user specifies by ...: together with an in range, even though
in range is allowed when it is not combined with by.
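A runnable sketch of the contrast, assuming the auto dataset and the
hypothetical program names goodsum (coded like myprog1) and badsum
(coded like the mistaken version); both save the number of
observations summarized in r(N).

```stata
capture program drop goodsum
program goodsum, rclass byable(recall)
    syntax [varlist] [if] [in]
    // Correct: touse reflects the user's if/in and the current by-group
    marksample touse
    quietly summarize `varlist' if `touse'
    return scalar N = r(N)
end

capture program drop badsum
program badsum, rclass byable(recall)
    syntax [varlist] [if] [in]
    // Wrong: passes only the user's if/in, so the by-group is ignored
    quietly summarize `varlist' `if' `in'
    return scalar N = r(N)
end
```

With the auto data, by foreign, sort: goodsum price leaves r(N) at 22,
the size of the last (Foreign) group, whereas by foreign: badsum price
leaves 74 because every call summarizes the whole dataset. Adding an in
range under by, as in by foreign: goodsum price in 1/10, is rejected.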
Example 2:
program myprog2, byable(recall) sortpreserve
syntax varname [if] [in]
marksample touse
sort `touse' `varlist'
...
end
This program specifies sortpreserve because it changes the sort order of
the data in order to make its calculations.
Example 3:
program myprog3, byable(onecall) sortpreserve
syntax newvarname =exp [if] [in]
marksample touse, novarlist
tempvar rhs
quietly {
gen double `rhs' `exp' if `touse'
sort `touse' `_byvars' `rhs'
by `touse' `_byvars': gen `typlist' `varlist' = /*
*/ `rhs' - `rhs'[_n-1] if `touse'
}
end
This program specifies sortpreserve because it changes the sort order of
the data.
In addition, this program is byable(onecall), and changing
byable(onecall) to byable(recall) would break it. The program creates
a new variable, and a variable can be generated only once. Under
byable(recall), the program would be called again for each subsequent
by-group, and those calls would have to use replace instead.
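To see that failure directly, here is a deliberately broken
byable(recall) sketch under the hypothetical name gap_recall; it simply
copies an expression into a new variable. Without by it works. Under
by, the first call creates the variable, and the second call fails
because syntax newvarname refuses a variable that already exists.

```stata
capture program drop gap_recall
program gap_recall, byable(recall)
    // Deliberately broken under by: newvarname fails on the second call
    syntax newvarname =exp [if] [in]
    // novarlist: the new variable does not exist yet, so it cannot
    // be checked for missing values
    marksample touse, novarlist
    quietly generate `typlist' `varlist' `exp' if `touse'
end
```

For example, gap_recall p1 = price succeeds, but
by foreign, sort: gap_recall p2 = price stops with an error after the
first by-group.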
Also see
Manual: [P] byable;
[P] sortpreserve
Help: [D] by, [P] program