Loglinear analysis of cross-classifications                      (STB-6: smv5)
-------------------------------------------

    ^loglin^ count varlist [^in^ range] [^if^ exp] [^=^exp]^,^ ^fit(^margins to be fit^)^
            [ ^ltol(^#^) iter(^#^) offset(^variable^)^ ^anova keep resid collapse^ ]


Description
-----------

^loglin^ estimates a Poisson maximum-likelihood loglinear model.  There are two
cases:  1) you have only a summary table, and count indicates the number of
cases that fall in each combination of levels of varlist, or 2) you have full
information on all cases, so that each case should count once.  If case 2
applies, you are better served by the ^poisson^ command; see ^help poisson^.


Description, continued
----------------------

For ^loglin^, the ^count^ variable should be a positive integer, a count of the
number of cases that fall in the cross-classification of varlist.  The counts
must be non-negative for all combinations of the independent variables
specified in varlist.  If a count exactly equals zero, you have three choices:
1) you may assume that it is a structural zero and replace it with a missing
value or a zero cell weight; 2) you may add a small positive constant, for
example, .5, to zero cells; or, best of all, 3) you may get more data.


Cell weights
------------

If you specify an expression ^= exp^, ^loglin^ will assume that the numbers
represent cell weights.  By default, cell weights are not rescaled.

If you wish to specify that a particular cell is a structural zero, an
appropriate method is to specify a cell weight of zero for that cell.


Functional form
---------------

This model falls in the class of generalized linear models with a categorical
design matrix, a log link, and a Poisson-distributed disturbance.  Thus, the
program generates a design matrix similar to that of the ^anova^ command, which
is then passed to ^poilog^ (a command similar to the ^poisson^ command).
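As a rough illustration of what such a design matrix looks like, the Python
sketch below builds one for a main-effects (independence) model on a two-way
table.  It is not ^loglin^'s own code: the function name design_matrix, the
0-based level numbering, and the choice to omit the first level of each factor
are assumptions made for this example.

```python
from itertools import product

def design_matrix(levels_a, levels_b):
    """Return one row per cell of an A x B table.

    Columns: constant, indicators for A levels 1.., indicators for B levels 1..
    Level 0 of each factor is the omitted baseline (an assumption made here
    so that the columns are not redundant).
    """
    rows = []
    for a, b in product(range(levels_a), range(levels_b)):
        row = [1]                                                # constant
        row += [1 if a == k else 0 for k in range(1, levels_a)]  # A main effect
        row += [1 if b == k else 0 for k in range(1, levels_b)]  # B main effect
        rows.append(row)
    return rows

# A 2 x 3 table has six cells, so the matrix has six rows and 1 + 1 + 2 columns.
X = design_matrix(2, 3)
for cell, row in zip(product(range(2), range(3)), X):
    print(cell, row)
```

Each row describes one cell of the table, and the count for that cell is the
response that goes with it.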

The functional form of the model is log-linear:

    E(count) = exp[ (predicted value) + (offset, if present) ]

or

    ln E(count) = (predicted value) + (offset, if present)

where the predicted value is a linear combination of the design matrix for the
categorical independent variables in varlist.  ^predict^ may not be used after
^loglin^; instead specify the ^resid^ option.  If the offset is present, it is
added onto the predicted value for the purposes of estimation, so that the
prediction is actually a predicted rate.


^anova^ option and constraints
----------------------------

As with ^anova^, the design matrix for ^loglin^ is not identified, hence
constraints must be imposed on the estimated parameters in order to generate a
unique solution.  This command uses two kinds of constraints:  anova-like and
regression-like.

In regression-like constraints, redundant levels of independent variables are
summarily dropped (the first level is dropped, then any interaction with it).
In anova-like constraints, the first level is dropped, but the missing level
is set equal to -1 times the sum of all the other levels.  Interpret
regression-like parameter estimates as deviations from the baseline level, and
interpret anova-like parameter estimates as deviations from the grand mean.

To activate anova-like constraints, specify the ^anova^ option.  Otherwise,
regression-like constraints will be used.


^keep^ option
-----------

Normally, the ^loglin^ program ^drop^s all the variables it generates for
estimation.  If you specify the ^keep^ option, these variables will remain in
the data set for future use.  Only the 1st-order variables (i.e., A1...m,
B1...n, C1...o, etc.) will be labeled.  Keeping the variables allows the user
to create a new design matrix from the already existing variables.  It does add
substantially to the size of the data set, however.  ^keep^ does not work when
^collapse^ is specified.
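

Illustration: the two constraint codings
----------------------------------------

The regression-like and anova-like constraints described under the ^anova^
option can be sketched for a single factor as follows.  This is a Python
illustration only, not ^loglin^'s implementation; the function names
code_level and contribution, the 0-based level numbering, and the effect
values are hypothetical.

```python
def code_level(level, n_levels, anova=False):
    """Design columns (n_levels - 1 of them) for one value of a factor.

    level is 0-based; level 0 is the dropped first level.
    """
    cols = [1 if level == k else 0 for k in range(1, n_levels)]
    if anova and level == 0:
        # anova-like: the dropped level is minus the sum of the other levels
        cols = [-1] * (n_levels - 1)
    return cols

def contribution(level, effects, anova=False):
    """Amount the factor adds to the linear predictor at the given level."""
    cols = code_level(level, len(effects) + 1, anova)
    return sum(c * e for c, e in zip(cols, effects))

# Hypothetical parameter estimates for the second and third levels.
effects = [0.5, -0.25]

regression_like = [contribution(lev, effects) for lev in range(3)]
anova_like = [contribution(lev, effects, anova=True) for lev in range(3)]

print("regression-like:", regression_like)  # first level contributes 0
print("anova-like:     ", anova_like)       # contributions sum to 0
```

Under the regression-like coding the first level contributes nothing, so the
other parameters read as deviations from that baseline.  Under the anova-like
coding the three contributions sum to zero, so each reads as a deviation from
the grand mean.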


^resid^ option
------------

If you specify the ^resid^ option, estimated expected cell frequencies,
residuals, and standardized residuals will be calculated, displayed, and stored
in the variables ^_cellhat^, ^_resid^, and ^_stdres^.  If ^collapse^ is also
specified, these will be displayed but not kept in the data set.


^collapse^ option
---------------

Specify the ^collapse^ option ONLY if:  1) your data set contains more
variables than you wish to work with in the specific model fit, AND 2) you
wish to analyze the subset specified in ^varlist^ AS IF it were the complete
table.

The ^collapse^ option calculates cell counts for the variables in ^varlist^,
adding together the counts from all other variables not in ^varlist^ and
placing them in the appropriate cells (i.e., it collapses the table).  It then
generates a temporary data set on which it performs the analysis.  After the
calculations are completed, it restores the original data set.

Note that if you specify both the ^resid^ and ^collapse^ options, your
estimated expected cell frequencies, residuals, and standardized residuals
will be displayed, but not saved with your original data set.


^fit(^margins to be fit^)^
----------------------

To specify a loglinear model, the ^fit^ option must be specified.  This program
generates hierarchical models, so only the highest-order interactions need be
specified.  All lower-level interactions will be automatically included.
Separate the margins by commas, and specify interactions with a ^blank^.  The
fit notation follows that developed by S. Fienberg, 1981, ^The Analysis of^
^Cross-classified Categorical Data^, Cambridge, MA: MIT Press.

For example, suppose we have summary data with three independent variables,
^iv1^, ^iv2^, and ^iv3^, with counts coded in a variable called ^dv^.

If we wish to fit an independence model, we type:

    ^loglin iv1 iv2 iv3 =dv, fit(iv1,iv2,iv3)^

If we wish to fit a saturated model, we type:

    ^loglin iv1 iv2 iv3 =dv, fit(iv1 iv2 iv3)^

An alternative model might be:

    ^loglin iv1 iv2 iv3 =dv, fit(iv1 iv2,iv2 iv3)^


Estimation
----------

^loglin^ generates the appropriate design matrix and passes that matrix to the
^poilog^ command for estimation.  ^poilog^, which is a minor modification of
^poisson^, uses iteratively reweighted least squares, whose estimates are
equivalent to maximum-likelihood estimates.


Convergence
-----------

The options ^ltol()^ and ^iter()^ may be used to control the maximization
process.  ^ltol()^ specifies the maximum change in the log likelihood that will
be accepted as indicating convergence (default 1e-7), and ^iter()^ specifies
the maximum number of iterations (default 100).


Author
------

D. J. Judson, Dept. of Sociology, Washington State University.
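

Illustration: IRLS for a Poisson loglinear model
------------------------------------------------

The estimation and convergence rules described above can be sketched in a few
lines of Python.  This is a generic textbook version of iteratively reweighted
least squares, not the ^poilog^ source: the function names solve and
poisson_irls, the starting values, and the example counts are assumptions.
The ltol and max_iter arguments mirror the documented defaults of ^ltol()^ and
^iter()^.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def poisson_irls(X, y, ltol=1e-7, max_iter=100):
    """Fit ln E(y) = X b; stop when the log likelihood changes by less than ltol."""
    p = len(X[0])
    mu = [max(v, 0.5) for v in y]          # starting values; keeps log() finite
    eta = [math.log(m) for m in mu]
    old_ll = None
    for _ in range(max_iter):
        # working response for the log link
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        # weighted normal equations (X'WX) b = X'Wz with W = diag(mu)
        A = [[sum(m * r[j] * r[k] for r, m in zip(X, mu)) for k in range(p)]
             for j in range(p)]
        c = [sum(m * r[j] * zi for r, m, zi in zip(X, mu, z)) for j in range(p)]
        b = solve(A, c)
        eta = [sum(bj * xj for bj, xj in zip(b, r)) for r in X]
        mu = [math.exp(e) for e in eta]
        # Poisson log likelihood, omitting the constant term -sum(ln y!)
        ll = sum(yi * e - m for yi, e, m in zip(y, eta, mu))
        if old_ll is not None and abs(ll - old_ll) < ltol:
            break
        old_ll = ll
    return b, mu

# Independence model for a hypothetical 2 x 2 table: constant, A, and B columns.
X = [[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]]
counts = [10, 20, 30, 40]
b, fitted = poisson_irls(X, counts)
print([round(m, 3) for m in fitted])
```

For an independence model the fitted counts should agree with the familiar
(row total x column total)/N values, which for these counts are 12, 18, 28,
and 42.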