Stata 11 help for byable

help byprog, help byable

Title

[P] byable -- Make programs byable

Syntax

program [define] program_name [, ... byable(recall[,noheader] | onecall) sortpreserve ... ]

Description

Most Stata commands allow the use of the by prefix; see [D] by. For example, the syntax diagram for the regress command could be presented as

[by varlist:] regress ...

This entry discusses how to write programs (ado-files) so that the program can be used with the by prefix.

Options

byable(recall[,noheader] | onecall) specifies that the program is to allow the by prefix to be used with it and specifies the style in which the program is coded.

There are two supported styles, known as byable(recall) and byable(onecall). byable(recall) programs are usually -- not always -- easier to write and byable(onecall) programs are usually -- not always -- faster.

byable(recall) programs are executed repeatedly, once per by group. byable(onecall) programs are executed only once and it is the program's responsibility to handle the implications of the by prefix if it is specified.

If you wrote program myprog in the byable(recall) style, then were the user to type

. by pid: myprog ...

myprog would be executed repeatedly just as if the user had typed

. myprog ... if pid==1
. myprog ... if pid==2
. etc...

except that an if condition is not used to communicate to which subsample myprog should restrict its calculations. Rather, the sample is automatically restricted to the appropriate subsample when myprog uses the mark or marksample commands; see [P] mark.

In addition, the following local macros are defined:

`_byindex' contains the name of a temporary variable containing 1, 2, ... denoting the by-groups

`_byvars' contains the names of the actual by-variables

`_byrc0' contains ", rc0" if the user specified by ...: with the rc0 option and contains nothing otherwise.

and the following functions are also available for use in expressions:

_by() returns 1 if by ...: was specified, and 0 otherwise.

_byindex() returns 1, 2, ..., reflecting the by-group currently being executed; returns 1 if _by()==0.

_bylastcall() returns 1 if this is the last by-group and 0 otherwise; returns 1 if _by()==0.

_byn1() returns the beginning observation number of the by-group currently being executed; returns 1 if _by()==0. The value returned by _byn1() is valid only if the data have not been re-sorted since the original call to by varlist: stata_cmd.

_byn2() returns the ending observation number of the by-group currently being executed; returns _N if _by()==0. The value returned by _byn2() is valid only if the data have not been re-sorted since the original call to by varlist: stata_cmd.

Thus, the by-group being executed can be obtained by restricting calculations to the subsample `_byindex'==_byindex(), but that is not how it is usually done. Instead, the program uses mark or marksample because there may be other restrictions that apply as well and mark and marksample will consider all of them.
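As an illustration of the macros and functions listed above, here is a minimal sketch of a byable(recall) program that reports which by-group it is currently processing. The program name showby and the use of the auto dataset are assumptions made for this example only; they are not part of the original entry.

```stata
* showby is a hypothetical program name used only for illustration
capture program drop showby
program showby, byable(recall)
        syntax varname(numeric) [if] [in]
        * marksample applies the if/in restrictions and the by-group restriction
        marksample touse
        quietly count if `touse'
        if _by() {
                * called under by ...: ; report the group number and by-variables
                display "group " _byindex() " of `_byvars': " r(N) " observations in sample"
        }
        else {
                display "no by prefix: " r(N) " observations in sample"
        }
        if _bylastcall() {
                display "(last call)"
        }
end

sysuse auto, clear
showby price
bysort foreign: showby price
```

Because showby is coded in the recall style, by displays its usual by-group header before each call; specifying byable(recall, noheader) instead would suppress it.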

byable(recall,noheader) programs differ from byable(recall) programs in that by will not display a by-group header before each call to the program.

byable(onecall) programs are required to handle the by ...: prefix themselves, including displaying the header should they wish to do so. See the manual entry [P] byable for details.

sortpreserve specifies that the program, during its execution, will re-sort the data and that therefore Stata itself should take action to preserve the order of the data so that the order can be reestablished afterward.

sortpreserve is in fact independent of whether a program is byable() but byable() programs often specify this option.

Pretend you are writing the program myprog and that, in performing its calculations, it needs to sort the data. It is very jolting for a user to experience,

. by pid: myprog ...

. by pid: sum newvar
not sorted
r(5);

Specifying sortpreserve will prevent this and still allow myprog to sort the data freely. byable() programs that sort the data should specify sortpreserve. If your program does not change the sort order of the data, sortpreserve is unnecessary, and the program will run slightly faster without it.

sortpreserve takes time, although less than you might suspect. It does not have to re-sort the data at the conclusion of your program, which would be an O(n ln n) operation. Instead, it arranges things so that it can reassert the original order of the data in O(n) time, and it is very quick about it. Nonetheless, there is no reason to waste the time if the data never got out of order.

Concerning sort order: when your byable() program is invoked for the first time, the data will be sorted on the by-variables (those named in `_byvars'). In subsequent calls (which occur only for byable(recall) programs), the sort order will be just as your program left it, even if you specify sortpreserve. sortpreserve restores the original order only after your program has been called for the last time.
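The effect of sortpreserve can be checked directly. The following is a minimal sketch, assuming a hypothetical program named sortdemo and the auto dataset; it sorts the data internally and then confirms that the caller's sort order is intact afterward.

```stata
* sortdemo is a hypothetical program name used only for illustration
capture program drop sortdemo
program sortdemo, sortpreserve
        syntax varname(numeric) [if] [in]
        marksample touse
        quietly count if `touse'
        if r(N) == 0 {
                error 2000
        }
        * this changes the sort order; in-sample observations sort to the end
        sort `touse' `varlist'
        display "largest in-sample value: " `varlist'[_N]
end

sysuse auto, clear
sort make
sortdemo price
* the sort order the caller established is still in effect
local order : sortedby
display "sorted by: `order'"
```

Removing sortpreserve from the program line would leave the data in whatever order the program's own sort command produced.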

Example 1:

program myprog1, byable(recall)
        syntax [varlist] [if] [in]
        marksample touse
        summarize `varlist' if `touse'
end

In the above program, it would be a mistake to code it

program myprog1, byable(recall)
        syntax [varlist] [if] [in]
        summarize `varlist' `if' `in'
end

because in that case, the sample would not be restricted to the appropriate by-group when the user specified the by ...: prefix. marksample, however, knows when a program is being by'd and so will set the `touse' variable to reflect whatever restrictions the user specified and the by-group restriction.

syntax, too, knows about by and it will automatically issue an error message when the user specifies by ...: and an in range together even though in range will be allowed when not combined with by.

Example 2:

program myprog2, byable(recall) sortpreserve
        syntax varname [if] [in]
        marksample touse
        sort `touse' `varlist'
        ...
end

This program specifies sortpreserve because it changes the sort order of the data in order to make its calculations.

Example 3:

program myprog3, byable(onecall) sortpreserve
        syntax newvar =exp [if] [in]
        marksample touse
        tempvar rhs
        quietly {
                gen double `rhs' `exp' if `touse'
                sort `touse' `_byvars' `rhs'
                by `touse' `_byvars': gen `typlist' `varlist' = /*
                        */ `rhs' - `rhs'[_n-1] if `touse'
        }
end

This program specifies sortpreserve because it changes the sort order of the data.

In addition, this program is byable(onecall) and, were we to change byable(onecall) to byable(recall), we would break the program. This program creates a new variable and a variable can only be generated once; after that we would have to use replace.
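The restriction on generating a variable twice can be seen with ordinary commands. This is a small sketch using the auto dataset and a hypothetical variable name x, neither of which appears in the original entry; a byable(recall) program that generates its result would hit the same error on its second call.

```stata
sysuse auto, clear
generate double x = price
* a second generate of the same name fails with "already defined"
capture generate double x = price
display _rc
* changing an existing variable requires replace
replace x = price/2
```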

Also see

Manual: [P] byable; [P] sortpreserve

Help: [D] by, [P] program


© Copyright 1996–2009 StataCorp LP