What’s new in Stata programming

• The big news in programming concerns parsing varlists containing factor variables, dealing with factor variables, and processing matrices whose row or column names contain factor variables.

• syntax will allow varlists to contain factor variables if new specifier fv is among the specifiers in the description of the varlist, for instance,
          syntax varlist(fv) [if] [in] [, Detail]

Similarly, syntax will allow a varlist option to include factor variables if fv is included among its specifiers:
          syntax varlist(fv) [if] [in] [, Detail EQ(varlist fv)]

• You can use the resulting macro `varlist' as the varlist for any Stata command that allows factor varlists.
• Factor varlists come in two flavors, general and specific. An example of a general factor varlist is mpg i.foreign. The corresponding specific factor varlist might be
            mpg i(0 1)b0.foreign

A specific factor varlist is specific with respect to a given problem, which is to say, a given dataset and subsample. The specific varlist identifies the values taken on by factor variables and the base.

Users usually specify general factor varlists, although they can specify specific ones. At some point during your program's execution, a general factor varlist will become specific. This usually happens automatically.

Existing commands _rmcoll and _rmdcoll now accept a general or specific factor varlist and return a specific varlist in r(varlist).

Existing command ml accepts a general or specific factor varlist and returns a specific varlist, in this case in the row and column names of the vectors and matrices it produces. The same applies to Mata’s new moptimize() function, which is equivalent to ml.

Similarly, all Stata estimation commands that allow factor varlists return the specific varlist in the row and column names of e(b) and e(V).

Factor varlist mpg i(0 1)b0.foreign is specific. The same varlist could be written mpg i0b.foreign i1.foreign, so that is specific, too. The first is specific and unexpanded. The second is specific and expanded. New command fvexpand takes a general or specific (expanded or unexpanded) factor varlist, if or in, and returns a fully expanded, specific varlist.

New command fvunab takes a general or specific factor varlist and returns it in the same form, but with variable names unabbreviated.
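The following is a minimal sketch of the pieces described above, assuming Stata 11 or later, the auto dataset that ships with Stata, and execution from a do-file. The program name myfvdemo and the macro name full are hypothetical and chosen only for illustration. It parses a factor varlist with syntax, expands it with fvexpand, and unabbreviates a second varlist with fvunab.

```stata
capture program drop myfvdemo
program define myfvdemo
    version 11
    // the fv specifier lets the parsed varlist contain factor variables
    syntax varlist(fv) [if] [in] [, Detail]
    marksample touse

    // fvexpand leaves the fully expanded, specific varlist in r(varlist)
    fvexpand `varlist' if `touse'
    display as text "expanded: " as result "`r(varlist)'"

    if "`detail'" != "" {
        // the parsed varlist can be passed to any command that accepts factor varlists
        regress `varlist' if `touse'
    }
end

sysuse auto, clear
myfvdemo mpg i.foreign, detail

// fvunab keeps the form of the varlist but spells out abbreviated names
fvunab full : mp i.for
display "`full'"
```

Here regress stands in for whatever estimation step a real program would perform; any command that allows factor varlists could take its place.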
• Matrix row and column names are now generalized to include factor variables. The row or column names contain the elements from a fully expanded, specific factor varlist. Because a fully expanded, specific factor varlist is a factor varlist, the contents of the row or column names can be used with other Stata commands as a varlist. Unrelatedly, the equation portion of the row or column name now has a maximum length of 127 rather than the previous 32.
• The treatment of variables that are omitted because of collinearity has changed. Previously, such variables were dropped from e(b) and e(V) except by regress, which included the variables but set the corresponding element of e(b) to zero and similarly set the corresponding row and column of e(V) to zero. Now all Stata estimators that allow factor variables work like regress.

Also, if you want to know why the variable was dropped, you can look at the corresponding element of the row or column name. The syntax of an expanded, specific varlist allows operators o and b. Operator o indicates omitted either because the user specified omitted or because of collinearity; b indicates omitted due to being a base category. For instance, o.mpg would indicate that mpg was omitted, whereas i0b.foreign would indicate that foreign==0 was omitted because it was the base category. Either way, the corresponding element of e(b) will be zero, as will the corresponding rows and columns of e(V).
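As an illustration of the o and b operators, the sketch below deliberately creates a collinear regressor and then inspects the column names of e(b). It assumes Stata 11 or later, the auto dataset, and execution from a do-file; the variable name mpg2 and the macro name names are made up for the example.

```stata
sysuse auto, clear

// mpg2 is an exact multiple of mpg, so one of the two must be omitted
generate mpg2 = 2*mpg
regress price mpg mpg2 i.foreign

// the column names of e(b) carry the o. and b. operators
local names : colnames e(b)
display "`names'"

// omitted and base terms keep their slots in e(b), with coefficients of zero
matrix list e(b)
```

The displayed names should include one term marked with o. for the collinear variable and one marked with b for the base level of foreign.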

This new treatment of omitted variables—previously called dropped variables—can cause old user-written programs to break. This is especially true of old postestimation commands not designed to work with regress. If you set version to 10 or earlier before estimation, however, then estimation results will be stored in the old way and the old postestimation commands will work. The solution is
          . version 10
          . estimation_command ...
          . old_postestimation_command ...
          . version 11

When running under version 10 or earlier, you may not use factor variables with the estimation command.
• Because omitted variables are now part of estimation results, constraints play a larger role in the implementation of estimators. Omitted variables have coefficients constrained to be zero. ml now handles such constraints automatically and posts in e(k_autoCns) the number of such constraints, which can be due to the variable being used as the base, being empty, or being omitted. makecns similarly saves in r(k_autoCns) the number of such constraints, and in r(clist), the constraints used. The matrix of constraints is now posted with ereturn post and saved, as usual, in e(Cns). ereturn matrix no longer posts constraints. Old behavior is preserved under version control.
• There are additional commands to assist in using and manipulating factor varlists that are documented only online; type help undocumented in Stata.
• Factor variables also allow interactions. Up to eight-way interactions are allowed.

• Consider the interaction a#b. If each took on two levels, the unexpanded, specific varlist would be i(1 2)b1.a#i(1 2)b1.b. The expanded, specific varlist would be 1b.a#1b.b 1b.a#2.b 2.a#1b.b 2.a#2.b.
• Consider the interaction c.x#c.x, where x is continuous. The unexpanded and expanded, specific varlists are the same as the general varlist: c.x#c.x.
• Consider the interaction a#c.x. The unexpanded, specific varlist is i(1 2).a#c.x, and the expanded, specific varlist is 1.a#c.x 2.a#c.x.
• All these varlists are handled in the same way that factor variables are handled, as outlined in the preceding items.
• New command fvrevar creates equivalent, temporary variables for any factor variables, interactions, or time-series–operated variables so that older commands can be easily converted to working with factor variables. We hasten to add that, in general, Stata does not follow the fvrevar approach. Think of fvrevar as a generalization of tsrevar.
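A short sketch of the fvrevar approach, assuming Stata 11 or later, the auto dataset, and execution from a do-file. The macro name tmpvars is hypothetical, and summarize merely stands in for an older command that does not understand factor-variable notation.

```stata
sysuse auto, clear

// create temporary variables standing in for each expanded term
fvrevar i.rep78 c.mpg#i.foreign

// r(varlist) holds the names of the resulting variables; copy it before
// another r-class command overwrites r()
local tmpvars "`r(varlist)'"
display "`tmpvars'"

// the plain variable list can now be passed on
summarize `tmpvars'
```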
• Factor variables lead to a number of additions to what is saved in e() and sometimes r():

• Estimation commands that post e(V) now post the corresponding rank of the matrix in scalar e(rank).
• Estimation commands that allow constraints now post the constraints matrix in matrix e(Cns).
• In many estimation commands allowing constraints, and in the programming command makecns, scalar e(k_autoCns) is now posted containing the sum of the number of base, empty, and omitted constraints.
• Programming command makecns now saves the constraints used in macro r(clist).
• Estimation commands that allow factor variables now post in macro e(asbalanced) the name of each factor variable participating in e(b) that was fvset design asbalanced and post in macro e(asobserved) the name of each factor variable participating in e(b) that was fvset design asobserved.
• Estimation commands now post in macros how new command margins is to treat their prediction statistics when the statistics require special treatment. These macros are e(marginsok), e(marginsnotok), and e(marginsprop).

e(marginsok) specifies the names of predictors that are to be allowed even though they appear to violate margins’ usual rules, such as dependent variables being involved in the calculation.

e(marginsnotok) lists statistics that margins fails to identify as violating its assumptions but that do violate them and so should not be allowed.

e(marginsprop) provides special signals as to how statistics for the estimator must be handled. Currently allowed are combinations of addcons, noeb, and nochainrule. addcons means that the estimated equations have a constant even if the user specified noconstant at estimation time. noeb means that the estimator does not store the covariate names on the name stripe of e(b). nochainrule means that the chain rule may not be used to calculate derivatives.
• Matrix e(V_modelbased), the model-based VCE, is now posted by most estimation commands that allow robust variance estimation by bootstrap and jackknife.
• Existing command sktest now returns in matrix r(N) the matrix of observation counts and in matrix r(Utest) the matrix of test results.
• Existing command estimates describe using now saves in scalar r(nestresults) the number of sets of estimation results saved in the .ster file.
• Existing command correlate saves in matrix r(C) the correlation or covariance matrix.
• Existing command ml has been rewritten. It is now implemented in terms of new Mata function and optimization engine moptimize(). The new ml handles automatic or implied constraints, posts some additional information to e(), and allows evaluators written in Mata as well as ado.
• Existing command estimates save now has option append, which allows storing more than one set of estimation results in the same file.
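The append option and the r(nestresults) scalar mentioned earlier can be tried together. This is a sketch assuming Stata 11 or later and the auto dataset; the filename myresults is a placeholder, and the file is written to the current working directory.

```stata
sysuse auto, clear

regress price mpg
estimates save myresults, replace     // first set of results starts the file

regress price mpg weight
estimates save myresults, append      // second set goes into the same file

// describe the file, then show how many sets of results it holds
estimates describe using myresults
display r(nestresults)
```

With one replace followed by one append, the count displayed is expected to be 2.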
• Existing commands ereturn post and ereturn repost now work with more commands, including logit, mlogit, ologit, oprobit, probit, qreg, _qreg, regress, stcox, and tobit. Also, ereturn post and ereturn repost now allow weights to be specified and save them in e(wtype) and e(wexp).
• Existing command markout has new option sysmissok, which keeps observations whose variables equal system missing (.) and marks out only those with extended missing values (.a, .b, ..., .z). This has to do with new emphasis on imputation of missing values.
• New commands varabbrev and novarabbrev make it easy to temporarily reset whether Stata allows variable-name abbreviations.
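A small sketch of the prefix form, using varabbrev and its counterpart novarabbrev, and assuming Stata 11 or later, the auto dataset, and execution from a do-file. Each prefix changes the abbreviation setting only for the command that follows it.

```stata
sysuse auto, clear

// abbreviations temporarily off: mp is not accepted as shorthand for mpg,
// so summarize fails and capture records a nonzero return code
capture noisily novarabbrev summarize mp
display _rc

// abbreviations temporarily on for this one command
varabbrev summarize mp
```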
• New programming function smallestdouble() returns the smallest double-precision number greater than zero.
• creturn has new returned values:

• c(noisily) returns 0 when output is being suppressed and 1 otherwise. Thus programmers can avoid executing code whose only purpose is to display output.
• c(smallestdouble) returns the smallest double-precision value that is greater than 0.
• c(tmpdir) returns the temporary directory being used by Stata.
• c(eqlen) returns the maximum length that Stata allows for equation names.
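The new c() values can be inspected directly. This is a sketch assuming Stata 11 or later; the displayed values depend on the installation, so none are shown here.

```stata
display c(smallestdouble)
display c(tmpdir)
display c(eqlen)

// c(noisily) is 1 here, so the message is shown
if c(noisily) {
    display "output is not suppressed"
}

// inside quietly, c(noisily) is 0 and the display-only code is skipped
quietly {
    if c(noisily) {
        display "this line is never executed"
    }
}
```

Testing c(noisily) before building a table or a long message is the use case the item describes: the work is skipped entirely instead of being done and then discarded.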
• Existing extended macro function :dir has new option respectcase, which causes :dir to respect uppercase and lowercase when performing filename matches. This option is relevant only for Windows.
• Stata has new string functions strtoname(), soundex(), and soundex_nara().
• Stata has 17 new numerical functions: sinh(), cosh(), asinh(), and acosh(); hypergeometric() and hypergeometricp(); nbinomial(), nbinomialp(), and nbinomialtail(); invnbinomial() and invnbinomialtail(); poisson(), poissonp(), and poissontail(); invpoisson() and invpoissontail(); and binomialp().
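A few of the new string and numerical functions can be tried from the command line. This is a sketch assuming Stata 11 or later and execution from a do-file; the argument values are arbitrary and the results are not reproduced here.

```stata
// string functions
display strtoname("1st var")       // turns the string into a legal Stata name
display soundex("Robert")
display soundex_nara("Robert")

// hyperbolic functions
display sinh(1)
display acosh(2)

// Poisson with mean 3, evaluated at k = 2
display poisson(3, 2)              // probability of 2 or fewer
display poissonp(3, 2)             // probability of exactly 2
display poissontail(3, 2)          // probability of 2 or more

// probability of exactly 3 successes in 10 trials with p = 0.5
display binomialp(10, 3, 0.5)
```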
• Stata has nine new random-variate functions for beta, binomial, chi-squared, gamma, hypergeometric, negative binomial, normal, Poisson, and Student’s t: rbeta(), rbinomial(), rchi2(), rgamma(), rhypergeometric(), rnbinomial(), rnormal(), rpoisson(), and rt(), respectively. Also, old function uniform() is renamed runiform(). All random-variate functions start with r.
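As a sketch of the random-variate functions, the following generates a small simulated dataset. It assumes Stata 11 or later; the seed, sample size, distribution parameters, and variable names are arbitrary choices for the example.

```stata
clear
set seed 12345
set obs 1000

generate double u = runiform()        // uniform on the unit interval
generate double z = rnormal()         // standard normal
generate double g = rgamma(2, 1)      // gamma with shape 2 and scale 1
generate int    k = rpoisson(3)       // Poisson with mean 3
generate int    b = rbinomial(10, 0.3) // binomial with 10 trials, p = 0.3

summarize u z g k b
```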
• Existing command clear has new syntax clear matrix, which clears (drops) all Stata matrices, as distinguished from clear mata, which drops all Mata matrices and functions.
• These days, commands intended for use by end-users are often being used as subroutines by other end-user commands. Some of these commands preserve the data simply so that, should something go wrong or the user press Break, the original data can be restored. Sometimes, when such commands are used as subroutines, the caller has already preserved the data. Therefore, all programmers are requested to include option nopreserve on commands that preserve the data for no other reason than error recovery, and thus speed execution when commands are used as subroutines.
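One way to honor this request is sketched below, assuming Stata 11 or later, the auto dataset, and execution from a do-file. The command mydemean is hypothetical and exists only to show the pattern: preserve for error recovery unless the caller specifies nopreserve.

```stata
capture program drop mydemean
program define mydemean
    version 11
    syntax varname(numeric) [, noPREServe]

    // preserve only for error recovery, and only if the caller did not opt out
    if "`preserve'" == "" {
        preserve
    }

    quietly summarize `varlist', meanonly
    quietly replace `varlist' = `varlist' - r(mean)

    // success: keep the modified data and cancel our own preserve
    if "`preserve'" == "" {
        restore, not
    }
end

sysuse auto, clear

// used directly by an end user: the command protects the data itself
mydemean mpg

// used as a subroutine: the caller has already preserved the data
preserve
mydemean price, nopreserve
restore
```

If an error or Break occurs between preserve and restore, not, Stata restores the original data automatically when the program exits. With nopreserve, that protection is left to the caller's own preserve.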