Stata 15 help for sem_and_gsem path notation

[SEM] sem and gsem path notation -- Command syntax for path diagrams

Syntax

sem paths ... [, covariance() variance() means()]

gsem paths ... [, covariance() variance() means()]

paths specifies the direct paths between the variables of your model.

The model to be fit is fully described by paths, covariance(), variance(), and means().

Description

The command syntax for describing your SEM is fully specified by paths, covariance(), variance(), and means(). How this works is described below.

If you are using sem, also see [SEM] sem path notation extensions for documentation of the group() option for comparing different groups in the data. The syntax of the elements described below is modified when group() is specified.

If you are using gsem, also see [SEM] gsem path notation extensions for documentation on specification of family-and-link for generalized (nonlinear) response variables, specification of multilevel latent variables, specification of categorical latent variables, and specification of multiple-group models. The syntax of the elements described below is modified when the group() option for comparing different groups or the lclass() option for categorical latent variables is specified.

Either way, read this section first.

Options

covariance() is used to

1. specify that a particular covariance path of your model that usually is assumed to be 0 be estimated,

2. specify that a particular covariance path that usually is assumed to be nonzero is not to be estimated (to be constrained to be 0),

3. constrain a covariance path to a fixed value, such as 0, 0.5, 1, etc., and

4. constrain two or more covariance paths to be equal.

variance() does the same as covariance() but applies to variances.

means() does the same as covariance() but applies to means.

Remarks

Path notation is used by the sem and gsem commands to specify the model to be fit, for example,

. sem (x1 x2 x3 x4 <- X)

. gsem (L1 -> x1 x2 x3 x4 x5) (L2 -> x6 x7 x8 x9 x10)

In the path notation,

1. Latent variables are indicated by a name in which at least the first letter is capitalized.

2. Observed variables are indicated by a name in which at least the first letter is lowercased. Observed variables correspond to variable names in the dataset.

3. Error variables, while mathematically a special case of latent variables, are considered in a class by themselves. For sem, every endogenous variable (whether observed or latent) automatically has an error variable associated with it. For gsem, the same is true of Gaussian endogenous variables (and latent variables, which are Gaussian). The error variable associated with endogenous variable name is e.name.

4. Paths between variables are written as

(name1 <- name2)

or

(name2 -> name1)

There is no significance to which coding is used.

5. Paths between the same variables can be combined: The paths

(name1 <- name2) (name1 <- name3)

can be combined as

(name1 <- name2 name3)

or as

(name2 name3 -> name1)

The paths

(name1 <- name3) (name2 <- name3)

can be combined as

(name1 name2 <- name3)

Specifying variances and covariances

6. Variances and covariances (curved paths) between variables are indicated by options. Variances are indicated by

..., ... var(name1)

Covariances are indicated by

..., ... cov(name1*name2)

..., ... cov(name2*name1)

There is no significance to the order of the names.

The actual names of the options are variance() and covariance(), but they are invariably abbreviated as var() and cov(), respectively.

The var() and cov() options are the same option, so a variance can be typed as

..., ... cov(name1)

and a covariance can be typed as

..., ... var(name1*name2)

7. Variances may be combined, covariances may be combined, and variances and covariances may be combined.

If you have

..., ... var(name1) var(name2)

you may code this as

..., ... var(name1 name2)

If you have

..., ... cov(name1*name2) cov(name2*name3)

you may code this as

..., ... cov(name1*name2 name2*name3)

All the above combined can be coded as

..., ... var(name1 name2 name1*name2 name2*name3)

or as

..., ... cov(name1 name2 name1*name2 name2*name3)

8. All variables except endogenous variables are assumed to have a variance; it is only necessary to code the var() option if you wish to place a constraint on the variance or specify an initial value. See items 11, 12, 13, and 16 below. (In gsem, the variance and covariances of observed exogenous variables are not estimated and thus var() cannot be used with them.)

Endogenous variables have a variance, of course, but that is the variance implied by the model. If name is an endogenous variable, then var(name) is invalid. The error variance of the endogenous variable is var(e.name).

9. Variables mostly default to being correlated:

a. All exogenous variables are assumed to be correlated with each other, whether observed or latent.

b. Endogenous variables are never directly correlated, although their associated error variables can be.

c. All error variables are assumed to be uncorrelated with each other.

You can override these defaults on a variable-by-variable basis with the cov() option.

To assert that two variables are uncorrelated that otherwise would be assumed to be correlated, constrain the covariance to be 0:

..., ... cov(name1*name2@0)

To allow two variables to be correlated that otherwise would be assumed to be uncorrelated, simply specify the existence of the covariance:

..., ... cov(name1*name2)

This latter is especially commonly done with errors:

..., ... cov(e.name1*e.name2)

(In gsem, you may not use the cov() option with observed exogenous variables. You also may not use cov() with error terms associated with family Gaussian, link log.)
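As a minimal sketch of overriding the default for error variables, the commands below assume the sem_1fmm example dataset used in the Examples section (webuse needs an Internet connection). The first fit leaves the errors uncorrelated; the second asks for the covariance between the errors of x1 and x2 to be estimated. Whether the extended model converges on a given dataset is not guaranteed.

```stata
* Assumption: the sem_1fmm example dataset with indicators x1-x4.
webuse sem_1fmm, clear

* Default: the error variables e.x1, ..., e.x4 are uncorrelated.
sem (X -> x1 x2 x3 x4)

* Override the default: estimate the covariance between e.x1 and e.x2.
sem (X -> x1 x2 x3 x4), cov(e.x1*e.x2)
```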

10. Means of variables are indicated by the following option:

..., ... means(name)

Variables mostly default to having nonzero means:

a. All observed exogenous variables are assumed to have nonzero means. In sem, the means can be constrained using the means() option, but only if you are performing noxconditional estimation; see [SEM] sem option noxconditional.

b. Latent exogenous variables are assumed to have mean 0. Means of latent variables are not estimated by default. If you specify enough normalization constraints to identify the mean of a latent exogenous variable, you can specify means(Name) to indicate that the mean should be estimated in either sem or gsem.

c. Endogenous variables have no separate mean. Their means are those implied by the model. The means() option may not be used with endogenous variables.

d. Error variables have mean 0 and this cannot be modified. The means() option may not be used with error variables.

To constrain the mean to a fixed value, such as 57, code

..., ... means(name@57)

Separate means() options may be combined:

..., ... means(name1@57 name2@100)

11. Fixed-value constraints may be specified for a path, variance, covariance, or mean by using @ (the "at" symbol). For example,

(name1 <- name2@1)

(name1 <- name2@1 name3@1)

..., ... var(name@100)

..., ... cov(name1*name2@223)

..., ... cov(name1@1 name2@1 name1*name2@.8)

..., ... means(name@57)
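The following sketch applies fixed-value constraints to the datasets used in the Examples section. The particular values (0 and 1) are chosen only for illustration and have no substantive meaning.

```stata
* Fix a path coefficient: the coefficient on trunk is held at 0.
sysuse auto, clear
sem (mpg <- turn trunk@0 length)

* Fix a variance: the variance of latent variable X is held at 1.
* Assumption: the sem_1fmm example dataset (requires Internet access).
webuse sem_1fmm, clear
sem (X -> x1 x2 x3 x4), var(X@1)
```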

12. Symbolic constraints may be specified for a path, variance, covariance, or mean by using @ (the "at" symbol). For example,

(name1 <- name2@c1) (name3 <- name4@c1)

..., ... var(name1@c1 name2@c1)

..., ... cov(name1@1 name2@1 name3@1 name1*name2@c2 name1*name3@c2)

..., ... means(name1@c1 name2@c1)

(name1 <- name2@c1) ..., var(name3@c1) means(name4@c1)

Symbolic names are just names from 1 to 32 characters in length. Symbolic constraints constrain equality. For simplicity, all constraints below will have names c1, c2, ...
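For instance, equality constraints might be imposed as sketched below. The symbolic names b and v are arbitrary names chosen for this illustration, and the sem_1fmm example dataset is assumed.

```stata
* Assumption: the sem_1fmm example dataset (requires Internet access).
webuse sem_1fmm, clear

* Constrain the path coefficients of x2 and x3 to be equal
* by giving them the same symbolic name, b.
sem (X -> x1 x2@b x3@b x4)

* Constrain the error variances of x1 and x2 to be equal
* by giving them the same symbolic name, v.
sem (X -> x1 x2 x3 x4), var(e.x1@v e.x2@v)
```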

13. Linear combinations of symbolic constraints may be specified for a path, variance, covariance, or mean by using @ (the "at" symbol). For example,

(name1 <- name2@c1) (name3 <- name4@(2*c1))

..., ... var(name1@c1 name2@(c1/2))

..., ... cov(name1@1 name2@1 name3@1 name1*name2@c1 name1*name3@(c1/2))

..., ... means(name1@c1 name2@(3*c1+10))

(name1 <- name2@(c1/2)) ..., var(name3@c1) means(name4@(2*c1))
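A short sketch of a linear-combination constraint on the auto dataset follows. The symbolic name b is arbitrary, and the factor of 2 is purely illustrative rather than a substantively motivated restriction.

```stata
sysuse auto, clear

* The coefficient on turn is named b; the coefficient on trunk
* is constrained to be twice that value.
sem (mpg <- turn@b trunk@(2*b) length)
```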

14. All equations in the model are assumed to have an intercept (to include observed exogenous variable _cons) unless the noconstant option (abbreviation nocons) is specified, and then all equations are assumed not to have an intercept (not to include _cons). (There are some exceptions to this in gsem because some generalized linear models have no intercept or even the concept of an intercept.)

Regardless of whether noconstant is specified, you may explicitly refer to observed exogenous variable _cons.

The following path specifications are ways of writing the same model:

(name1 <- name2) (name1 <- name3)

(name1 <- name2) (name1 <- name3) (name1 <- _cons)

(name1 <- name2 name3)

(name1 <- name2 name3 _cons)

There are only two reasons to specify _cons explicitly. The first is that you have specified the noconstant option and want to include _cons in some equations but not others. The second is that you want to place a constraint on its path coefficient, which you may do whether or not noconstant is specified. For example,

(name1 <- name2 name3 _cons@c1) (name4 <- name5 _cons@c1)

15. The noconstant option may be specified globally or within a path specification. That is,

(name1 <- name2 name3) (name4 <- name5), nocons

suppresses the intercepts in both equations. Alternatively,

(name1 <- name2 name3, nocons) (name4 <- name5)

suppresses the intercept in the first equation but not the second, whereas

(name1 <- name2 name3) (name4 <- name5, nocons)

suppresses the intercept in the second equation but not the first.

In addition, consider the equation

(name1 <- name2 name3, nocons)

This can be written equivalently as

(name1 <- name2, nocons) (name1 <- name3, nocons)
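As an illustration of the global and path-level forms, the sketch below reuses the recursive model from the Examples section on the auto dataset.

```stata
sysuse auto, clear

* Global option: suppress the intercept in both equations.
sem (mpg <- turn trunk price) (trunk <- length), nocons

* Path-level option: suppress the intercept in the mpg equation only.
sem (mpg <- turn trunk price, nocons) (trunk <- length)
```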

16. Initial values (starting values) may be specified for a path, variance, covariance, or mean by using the init(#) suboption:

(name1 <- (name2, init(0)))

(name1 <- (name2, init(0)) name3)

(name1 <- (name2, init(0)) (name3, init(5)))

..., ... var((name3, init(1)))

..., ... cov((name4*name5, init(.5)))

..., ... means((name5, init(0)))

The initial values may be combined with symbolic constraints:

(name1 <- (name2@c1, init(0)))

(name1 <- (name2@c1, init(0)) name3)

(name1 <- (name2@c1, init(0)) (name3@c2, init(5)))

..., ... var((name3@c1, init(1)))

..., ... cov((name4*name5@c1, init(.5)))

..., ... means((name5@c1, init(0)))
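The sketch below supplies starting values on the auto dataset. The values -0.5 and 10 are arbitrary guesses chosen for illustration; they affect where the optimizer starts, not which model is fit.

```stata
sysuse auto, clear

* Starting value of -0.5 for the path coefficient on turn and
* of 10 for the error variance of mpg.
sem (mpg <- (turn, init(-0.5)) trunk length), var((e.mpg, init(10)))
```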

Examples

These examples demonstrate path notation using the sem command, but sem could be replaced with gsem in each case. See [SEM] sem path notation extensions and [SEM] gsem path notation extensions for examples demonstrating unique features of path notation for each command.

Examples: Basic path notation

Setup

. sysuse auto

A simple regression model

. sem (mpg <- turn trunk length)

Same model as above

. sem (mpg <- turn) (mpg <- trunk) (mpg <- length)

Constrain constant to be zero

. sem (mpg <- turn trunk length _cons@0)

Same as above, but with the noconstant option

. sem (mpg <- turn trunk length), noconstant

Examples: Specifying the covariance() and variance() options

Fit a recursive structural model

. sem (mpg <- turn trunk price) (trunk <- length)

Estimate the covariance between the errors of mpg and trunk

. sem (mpg <- turn trunk price) (trunk <- length), covariance(e.mpg*e.trunk)

Constrain the error variance of mpg to be 10

. sem (mpg <- turn trunk length) (trunk <- price), variance(e.mpg@10)

Examples: Specifying the means() option

Setup

. webuse sem_1fmm

A one-factor measurement model

. sem (X -> x1 x2 x3 x4)

Constrain the mean of X to be 5

. sem (X -> x1 x2 x3 x4), means(X@5)


© Copyright 1996–2018 StataCorp LLC