.-
help for ^desmat^, ^desrep^, ^destest^                        (STB-54: dm73.1)
.-

Generate design matrix for categorical and/or continuous variables
-------------------------------------------------------------------

        ^desmat^ model [, default_parameterization]

        ^desrep^ [exp]

        ^destest^ [termlist] [^,^ ^j^oint ^e^qual]


Description
------------

^desmat^ is used to generate a design matrix, i.e. a set of dummy variables
based on categorical and/or continuous variables. These dummy variables
_x_* can then be used in any appropriate Stata procedure. ^desmat^ therefore
serves the same purpose as ^xi^, but allows types of parameterization other
than the indicator contrast (i.e. dummy variables with a fixed reference
category). In addition, ^desmat^ allows the specification of higher-order
interaction effects and an easier specification of the reference category.

After estimating a model, ^desrep^ can be used to produce a compact overview
of the estimates with informative labels. In addition, the program ^destest^
can be used to perform a Wald test on all terms in the model.

^desrep^ is a companion program for ^desmat^. It is used after estimating a
model to produce a compact summary of the results with longer labels for the
effects. Only the estimates and their standard errors are reported, together
with one asterisk "*" to indicate significance at the .05 level and two
asterisks "**" to indicate significance at the .01 level.

^destest^ is a companion program for ^desmat^. It is used after estimating a
model to perform a Wald test on model terms.


Remarks
-------

The model consists of one or more terms separated by spaces. A term can be
a single variable, two or more variables joined by period(s) ".", or two or
more variables joined by asterisk(s) "*". A period is used to specify an
interaction effect as such, whereas an asterisk indicates hierarchical
notation, in which both the interaction effect itself plus all possible
nested interactions and main effects are included.
For example, the term "vote*educ*race" is expanded to "vote educ vote.educ
race vote.race educ.race vote.educ.race".

The model specification may optionally be followed by a comma and a default
type of parameterization. A restriction of some type is required for the
effects of categorical variables to be identifiable. The restriction used
does not affect the fit of the model but does determine the meaning of the
parameters. A common restriction, and the one used by ^xi^, is to drop the
dummy variable for a reference category. The parameters for that variable
are then relative to the reference category. Another common constraint is
the deviation contrast, in which parameters have a sum of zero. One
parameter can therefore be dropped as redundant during estimation and found
afterwards as minus the sum of the estimated parameters, or by re-estimating
the model using a different omitted category.

A parameterization can be specified as a name, of which the first three
characters are significant, optionally followed by a specification of the
reference category in parentheses (no spaces). The reference category should
refer to the category number, not the category value. So for a variable with
values 0 to 3, the parameterization "dev(1)" indicates that the deviation
contrast is to be used with the first category (i.e. 0) as the reference.
If no reference category is specified, or the category specified is less
than 1, then the first category is used as reference category. If the
reference category specified is larger than the number of categories, then
the highest category is used. Note that for certain types of
parameterization, the "reference" specification has a different meaning.

The available parameterization types are:

^ind^(^ref^)  Indicator contrast, i.e. dummy variables with ^ref^ as reference
          category. This is the parameterization used by ^xi^ and the
          default parameterization for ^desmat^.

^dev^(^ref^)  Deviation contrast.
          Parameters sum to zero over the categories of the variable. The
          parameter for ^ref^ is omitted as redundant, but can be found
          from minus the sum of the estimated parameters.

^sim^(^ref^)  Simple contrast with ^ref^ as reference category. The highest
          order effects are the same as indicator contrast effects, but
          lower order effects and the constant will be different.

^dif^(^ref^)  Difference contrast, for ordered categories. Parameters are
          relative to the next category. If the first letter of ^ref^ is
          "b" then the backward difference contrast is used instead, and
          parameters are relative to the previous category.

^hel^(^ref^)  Helmert contrast, for ordered categories. Each estimate
          represents the contrast between that category and the mean of
          the subsequent categories. If the first letter of ^ref^ is "b"
          then the reverse Helmert contrast is used instead, and
          parameters are relative to the mean of the preceding categories.

^orp^(^ref^)  Orthogonal polynomials of degree ^ref^. The first category is
          a linear effect, the second quadratic, etc. This option calls
          ^orthpoly^ to generate the design (sub)matrix.

^use^(^ref^)  A user-defined contrast. ^ref^ refers to a contrast matrix
          with the same number of columns as the variable has categories,
          and at least one fewer row. If row names are specified for this
          matrix, these names will be used as variable labels for the
          resulting dummy variables. [Single lowercase letters as names
          for the contrast matrix cause problems at the moment, e.g.
          "use(c)". Use uppercase names or more than one letter, e.g.
          "use(cc)" or "use(C)".]

^dir^       A direct effect, i.e. used to include continuous variables in
          the model.


^Parameterizations per variable^

Besides specifying a default parameterization after the model, it is also
possible to specify a specific parameterization for certain variables.
This is done by appending "=par[(ref)]" to a single variable,
"=par[(ref)].par[(ref)]" to an interaction effect, or
"=par[(ref)]*par[(ref)]" to an interaction using hierarchical notation.

Alternatively, a ^pzat^ characteristic can be assigned to a variable to
specify the parameterization that should be used for that variable. A
variable's ^pzat^ characteristic overrides the default parameterization
specified. However, a specification on a term-by-term basis takes
precedence.

Useful applications of the parameterization per variable feature include
specifying that some of the variables are continuous while the rest are
categorical, specifying a different reference category for certain
variables, specifying a variable for the effects of time as a low-order
polynomial, etc.

After generating a design matrix with ^desmat^, the dummy variables can be
included in any Stata procedure as "_x_*". After estimating the model, the
companion program ^desrep^ can be used to produce a compact summary of the
results with descriptive labels.

^desmat^ creates global macro variables "$term1", "$term2", etc. for each
term in the model. These global variables can be used to perform
significance tests using ^testparm^ or related programs. Optionally, a
second companion program, ^destest^, can be used to perform a Wald test on
all model terms.


Remarks for ^desrep^
---------------------

^desmat^ adds the characteristics [varn] and [valn] to each variable name,
corresponding with the name of the term and the value of the category
respectively. If "valn" is defined for a variable, this value will be
printed with 2 spaces of indentation. If not, the variable label, or the
variable name if no label is present, is printed with no indentation.
^desrep^ can be used after any procedure that produces "e(b)" and "e(V)"
and does not depend on the prior use of ^desmat^.

^desrep^ is usually used without any arguments. If "e(b)" and "e(V)" are
present it will produce a summary of the results.
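
As an illustration of the steps described so far, the following sketch
generates a design matrix, estimates a model on the resulting dummy
variables, and summarizes it. The variable names pop, vote, race, educ,
and age are hypothetical, and ^poisson^ is only one of many estimation
commands that could be used at this point. The per-variable
parameterizations are written as described above and have not been checked
against a particular data set.

```stata
* Hypothetical data: pop holds cell frequencies, vote/race/educ are
* categorical, age is continuous.
* Default: deviation contrast with the first category as reference.
* educ uses the backward difference contrast, age enters as a direct effect.
desmat vote*race educ=dif(b) age=dir, dev(1)

* Estimate a model on the generated dummy variables
poisson pop _x_*

* Compact summary of e(b) and e(V) with descriptive labels
desrep
```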

If the argument for ^desrep^ is "exp" it will produce multiplicative
parameters, e.g. incidence rate ratios in Poisson regression or odds ratios
in logistic regression. The parameters are transformed into exp(b) and
their standard errors into exp(b)*se, where "b" is the linear estimate and
"se" its standard error. Note that if "exp" is not specified, ^desrep^ will
produce the linear estimates even if the procedure gives multiplicative
versions, since the procedure stores the linear estimates and covariance
matrix in "e(b)" and "e(V)".


Remarks for ^destest^
--------------------

The ^termlist^ consists of one or more terms as specified in ^desmat^. A
^term^ can consist of a single variable, or two or more variables separated
by either asterisks or periods. If asterisks are used, they will be changed
into periods by ^destest^, i.e. only the highest order interaction will be
tested. If no arguments are specified, all terms from the last ^desmat^
model will be tested.

The default is to test whether the effects of each separate term are equal
to zero. If the option "^joint^" is specified, ^destest^ will test instead
whether all the effects in all the terms are jointly equal to zero. If the
option "^equal^" is specified, ^destest^ will test whether the effects of
each separate term are equal. The "^joint^" and "^equal^" options may be
combined to test whether all effects are jointly equal, although this would
be a somewhat peculiar hypothesis. Only the first letter of the "^joint^"
and "^equal^" options is significant.

^desmat^ creates global macro variables "$term1", "$term2", etc. containing
a varlist for each term in the model. ^destest^ runs through these terms,
finds the terms corresponding with the termlist, and runs ^testparm^ with
the varlist. If these global variables have not been defined, ^destest^
will do nothing. These global variables can of course also be used
separately in ^testparm^, ^sw^, or related programs.
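
The requests described above might look as follows once a model has been
estimated. This is a sketch only: it assumes a design matrix from the
hypothetical call "desmat vote*educ race" (the variable names are
illustrative), and it assumes that "$term1" holds the varlist of the first
term of the expanded model, which the text above does not state explicitly.

```stata
* Assumes a model was estimated on _x_* after: desmat vote*educ race

* Default: test each term of the last desmat model separately against zero
destest

* Test whether all effects in the two listed terms are jointly zero
destest vote.educ race, joint

* Test whether the effects within the term educ are equal to one another
destest educ, equal

* Report multiplicative parameters exp(b) instead of linear estimates
desrep exp

* Use a term's varlist directly (assumed here to be the first term)
testparm $term1
```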


Author
-------

        John Hendrickx
        Nijmegen Business School
        University of Nijmegen
        The Netherlands
        j.hendrickx@@mailbox.kun.nl


Also see
---------

    STB:  STB-54: dm73.1, STB-52: dm73