.-
help for ^byvar^                                   (STB-27: ip9; STB-55: ip9.1)
.-

Repeat command by variable(s)
-----------------------------

    ^byvar^ varlist [^,^ options ] ^:^ stata_cmd

where options are

    ^e(^elist^)^ ^r(^rlist^)^ ^b(^coeflist^)^ ^se(^selist^)^ ^ge^nerate ^re^turn
    ^t^abulate ^m^issing ^p^ause ^nol^abel ^u^nique


Description
-----------

^byvar^ repeats stata_cmd for each distinct combination of values in varlist.
varlist may contain string variables. For details of how results are stored,
see the ^generate^ and ^return^ options.


Options
-------

^e(^elist^)^ saves the e-class results ^e()^ named in elist that are left behind
    by stata_cmd. The results must evaluate to numbers; strings are not
    allowed. Names in elist must be separated by one or more spaces. You may
    append a label, preceded by an ^=^ sign, to each name; it is used to label
    the corresponding column of output (if the ^tabulate^ option is used) or
    the corresponding new variable (if the ^generate^ option is used). Labels
    longer than 14 characters are truncated to 14. A label that contains
    spaces must be enclosed in double quotes (^""^). Commas, colons and equals
    signs are not allowed anywhere in a label.
    Example: ^e(rmse="RMS error" F="F statistic" N)^.

^r(^rlist^)^ saves the r-class results ^r()^ named in rlist that are left behind
    by stata_cmd. The results must evaluate to numbers; strings are not
    allowed. Individual items may be labelled as with the ^e()^ option.
    Example: ^r(W="W statistic" p=P-value)^.

^b(^coeflist^)^ saves the estimated regression coefficients of the variables
    named in coeflist. Individual items may be labelled as with the ^e()^
    option.

^se(^selist^)^ saves the standard errors of the estimated regression
    coefficients of the variables named in selist. Individual items may be
    labelled as with the ^e()^ option.

^generate^ creates new variable(s) corresponding to the quantities named in the
    ^e()^, ^r()^, ^b()^ and ^se()^ options.
    The names of the new variables begin with the letter ^E^, ^R^, ^B^ or ^S^,
    respectively, followed by up to six characters representing the name of
    the ^e()^ or ^r()^ quantity, or of the variable named in ^b()^ or ^se()^.
    The final character is ^_^ (or, where needed to avoid duplicate names, a
    letter). For example, ^e(rmse N) generate^ would create variables called
    ^Ermse_^ and ^EN_^, containing the values of ^e(rmse)^ and ^e(N)^,
    respectively, as left behind by each execution of stata_cmd. Each
    observation receives the results for its own combination of values of the
    by-variables in varlist (but see the ^unique^ option).

^return^ returns the quantities named in the ^e()^, ^r()^, ^b()^ and ^se()^
    options as saved results of the form ^r(E^|^R^|^B^|^S^#1^gp^#2^)^. Here, #1
    is the position of the item within its option, and ^gp^#2 indexes the
    subgroups defined by the combinations of values in varlist. For example,
    ^e(rmse N) return^ would return ^r(E1gp1)^, ^r(E1gp2)^, ... containing
    ^e(rmse)^ for subgroups 1, 2, ..., and ^r(E2gp1)^, ^r(E2gp2)^, ...
    containing ^e(N)^ for subgroups 1, 2, ... .

^tabulate^ displays the results in tabular form, suppressing the output (if
    any) from stata_cmd.

^missing^ also executes stata_cmd for combinations of values in varlist that
    include a missing value; by default, such combinations are skipped. The
    idea is the same as for the ^missing^ option of Stata's @tabulate@
    command.

^pause^ pauses after each execution of stata_cmd, which is useful for viewing
    graphs one at a time.

^nolabel^ displays the numeric values of categorical by-variables instead of
    their value labels, where value labels are defined.

^unique^ is relevant only with ^generate^. It specifies that the results for
    each distinct combination of values defined by varlist are stored only in
    the first observation of that combination in the new variable(s); the
    remaining observations of that combination are set to missing.
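
The storage scheme described under ^generate^, ^return^, ^missing^ and
^unique^ can be illustrated outside Stata. What follows is a hypothetical
Python sketch, not ^byvar^'s own ado-file code. The function names, the use of
None for a missing value, ordering groups with missing values last, and
keeping the first six characters of each name are all assumptions made for
illustration; the letter that ^byvar^ may substitute for ^_^ to avoid duplicate
names is not modelled, and one call handles a single option (one prefix).

```python
"""Hypothetical sketch of byvar-style result storage (not byvar's ado code).

Assumptions: None stands in for a Stata missing value; groups are ordered
with missing values last; a new variable name keeps the first six characters
of the item name; the duplicate-avoiding letter is not modelled.
"""
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Combo = Tuple[Optional[float], ...]


def distinct_groups(by_rows: Sequence[Combo], missing: bool = False) -> List[Combo]:
    """Return the distinct combinations of by-variable values, sorted."""
    combos = set(by_rows)
    if not missing:
        # Without the missing option, skip combinations containing a missing value.
        combos = {c for c in combos if None not in c}
    # Sort each value as (is_missing, value) so that missing values come last.
    return sorted(combos, key=lambda c: tuple((v is None, 0 if v is None else v) for v in c))


def run_by_groups(
    by_rows: Sequence[Combo],
    data: Sequence[float],
    command: Callable[[List[float]], Dict[str, float]],
    prefix: str,
    names: Sequence[str],
    missing: bool = False,
    unique: bool = False,
) -> Tuple[Dict[str, float], Dict[str, List[float]]]:
    """Run command on each subgroup; return (return-style, generate-style) results."""
    returned: Dict[str, float] = {}
    generated = {f"{prefix}{name[:6]}_": [math.nan] * len(by_rows) for name in names}
    for gp, combo in enumerate(distinct_groups(by_rows, missing), start=1):
        # Observations belonging to this combination of by-variable values.
        idx = [i for i, row in enumerate(by_rows) if row == combo]
        results = command([data[i] for i in idx])
        for item, name in enumerate(names, start=1):
            # return: key <prefix><item>gp<group>, e.g. R1gp1.
            returned[f"{prefix}{item}gp{gp}"] = results[name]
            # generate: fill every observation of the group, or only the first with unique.
            targets = idx[:1] if unique else idx
            for i in targets:
                generated[f"{prefix}{name[:6]}_"][i] = results[name]
    return returned, generated
```

Under these assumptions, calling run_by_groups with prefix "R" and names
["W", "p"] would produce keys R1gp1, R2gp1, R1gp2, ... and new variables RW_
and Rp_, following the naming used in the Examples.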

Remarks
-------

In programming ^byvar^, I have attempted to solve an awkward problem: how to
incorporate an ^if^ phrase, if one is specified in stata_cmd, when restricting
stata_cmd to each combination of values in varlist. I have done so by
searching for " if " in the part of stata_cmd that precedes the first comma,
if there is one, or in the whole of stata_cmd otherwise. There may be forms of
stata_cmd for which this does not work correctly, but so far none have been
encountered.

Note that ^byvar^ acts conservatively when creating new variables with the
^generate^ option: it does not overwrite existing variables. Your dataset may
therefore become cluttered with variables whose names begin with ^E^, ^R^, ^B^
or ^S^. You can remove them in one go by typing, for example, ^drop E* R*^,
but check first that no variables you want to keep have names starting with
those letters.


Examples
--------

To produce a normal quantile (Q-Q) plot of ^weight^ for each non-missing value
of ^rep78^, pausing after each plot:

    . ^use auto^
    . ^byvar rep78, pause: qnorm weight^

To carry out Shapiro-Wilk tests on ^mpg^ for each of the 6 values of ^rep78^,
including missing; store the W statistics (^r(W)^) in ^r(R1gp1)^, ...,
^r(R1gp6)^ and their P-values (^r(p)^) in ^r(R2gp1)^, ..., ^r(R2gp6)^; and
display the results in tabular form, with columns headed ^W statistic^ and
^P-value^:

    . ^byvar rep78, r(W="W statistic" p=P-value) return tabulate missing: swilk mpg^

To regress ^mpg^ on ^weight^ separately for each of the two values of
^foreign^, creating two new variables: ^Ermse_^, containing ^e(rmse)^, the
root mean squared error of the regression, and ^Bweight_^, containing the
estimated coefficient of ^weight^:

    . ^byvar foreign, e(rmse) b(weight) generate: regress mpg weight^


Author
------

Patrick Royston, Imperial College School of Medicine, London
p.royston@@ic.ac.uk


Also see
--------

 Manual:  [R] ^by^
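
The ^if^-phrase handling described under Remarks can be sketched as follows.
This is a hypothetical Python illustration, not ^byvar^'s ado-file code: the
names add_group_condition and group_condition, and combining an existing ^if^
condition with the subgroup condition as (existing) & (subgroup), are
assumptions. Like the rule in Remarks, the sketch looks for " if " only before
the first comma. As a consequence, this sketch would mishandle a comma inside
the ^if^ expression itself (for example, within inlist()) or an ^in^ range
following the ^if^ phrase; missing by-variable values are not handled either.

```python
"""Hypothetical sketch of splicing a by-group condition into a Stata command.

Not byvar's ado code. Mirrors the rule in Remarks: look for " if " only in the
text before the first comma, or in the whole command if there is no comma.
Known gaps of this sketch: commas inside the if expression and an in range
after the if phrase are not handled.
"""
from typing import Sequence, Union


def add_group_condition(cmd: str, condition: str) -> str:
    """Return cmd restricted to observations that satisfy condition."""
    comma = cmd.find(",")
    # Split into the part before the first comma (head) and the options (tail).
    head, tail = (cmd, "") if comma == -1 else (cmd[:comma], cmd[comma:])
    pos = head.find(" if ")
    if pos == -1:
        # No if phrase: append one holding just the subgroup condition.
        new_head = f"{head.rstrip()} if {condition}"
    else:
        # Existing if phrase: require both it and the subgroup condition.
        existing = head[pos + len(" if "):].strip()
        new_head = f"{head[:pos]} if ({existing}) & ({condition})"
    return new_head + tail


def group_condition(names: Sequence[str], combo: Sequence[Union[int, float, str]]) -> str:
    """Build a condition such as 'rep78==3 & foreign==0' (no missing values)."""
    parts = []
    for name, value in zip(names, combo):
        # String by-variables need their values quoted in Stata expressions.
        literal = f'"{value}"' if isinstance(value, str) else str(value)
        parts.append(f"{name}=={literal}")
    return " & ".join(parts)
```

For the ^foreign^ regression under Examples, add_group_condition("regress mpg
weight", group_condition(["foreign"], [0])) returns "regress mpg weight if
foreign==0", while an options part such as ", robust" is kept after the
inserted condition.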