
## Structural equation modeling (SEM)

What is SEM?


SEM stands for structural equation modeling. SEM is a notation for specifying structural equations, a way of thinking about them, and methods for estimating their parameters.

SEM encompasses a broad array of models from linear regression to measurement models to simultaneous equations, including along the way confirmatory factor analysis (CFA), correlated uniqueness models, latent growth models, and multiple indicators and multiple causes (MIMIC).

Stata’s sem fits linear SEMs, and its features are described below. gsem provides extensions to linear SEMs that allow for generalized-linear models and multilevel models.

### Features

• Use GUI or command language to specify model.
• Standardized and unstandardized results.
• Direct and indirect effects.
• Goodness-of-fit statistics.
• Tests for omitted paths and tests of model simplification including modification indices, score tests, and Wald tests.
• Predicted values and factor scores.
• Linear and nonlinear (1) tests of estimated parameters and (2) combinations of estimated parameters with CIs.
• Estimation across groups is as easy as adding group(sex) to the command. Test for group invariance. Easily add or relax constraints across groups.
• SEMs may be fitted using raw or summary statistics data.
• Maximum likelihood (ML) and asymptotic distribution free (ADF) estimation. ADF is also known as generalized method of moments (GMM). Missing at random (MAR) data supported via FIML.
• Robust estimate of standard errors and standard errors for clustered samples available.
• Support for survey data including sampling weights, stratification and poststratification, and clustered sampling at one or more levels.
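
Several of the features listed above correspond to options on sem or to postestimation commands. The following is a minimal sketch rather than output from a real analysis. It assumes a hypothetical raw dataset with observed indicators x1 through x4, a grouping variable sex, and a cluster identifier clinic; none of these names come from the examples on this page.

```stata
* Hypothetical one-factor measurement model; F is latent, x1-x4 are observed
sem (F -> x1 x2 x3 x4)

* Redisplay the results in standardized form
sem, standardized

* Factor scores for the latent variable (requires raw data)
predict Fhat, latent(F)

* Fit the same model across groups, then test for group invariance
sem (F -> x1 x2 x3 x4), group(sex)
estat ginvariant

* Alternative estimators and standard errors
sem (F -> x1 x2 x3 x4), method(adf)          // asymptotic distribution free
sem (F -> x1 x2 x3 x4), method(mlmv)         // ML with missing values (FIML)
sem (F -> x1 x2 x3 x4), vce(robust)          // robust standard errors
sem (F -> x1 x2 x3 x4), vce(cluster clinic)  // clustered standard errors
```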

### SEM Builder or commands, it’s your choice

Draw the path diagram in the SEM Builder, or use the command syntax:

. sem (L1 -> m1 m2)
      (L2 -> m3 m4)
      (L3 <- L1 L2)
      (L3 -> m5 m6 m7)


It’s the same model either way.

Stata’s SEM Builder uses standard path notation.

In command syntax, you type the path diagram. Capitalized names are latent variables. Lowercased names are observed variables. You can type arrows in either direction. The above model could be equally well typed as

. sem (m1 m2 <- L1)
      (L2 -> m3 m4)
      (L3 <- L1 L2)
      (L3 -> m5 m6 m7)


and order does not matter, and neither does spacing:

. sem (m1 m2 <- L1) (L2 -> m3 m4) (L3 -> m5 m6 m7) (L3 <- L1 L2)


You can specify paths individually,

. sem (m1 <- L1) (m2 <- L1) (L2 -> m3) (L2 -> m4)  (L3 -> m5) (L3 -> m6) (L3 -> m7) (L3 <- L1) (L3 <- L2)


or combined,

. sem (m1 m2 <- L1) (L2 -> m3 m4) (L3 -> m5 m6 m7) (L3 <- L1 L2)


### Show me

Let’s fit a structural model with a measurement component using data from Wheaton, Muthén, Alwin, and Summers (1977):

. webuse sem_sm2, clear
(Structural model with measurement component)

. ssd describe

Summary statistics data from https://www.stata-press.com/data/r17/sem_sm2.dta
Observations:             932          Structural model with measurem..
Variables:              13          25 May 2020 11:45
(_dta has notes)

Variable name                  Variable label

educ66                         Education, 1966
occstat66                      Occupational status, 1966
anomia66                       Anomia, 1966
pwless66                       Powerlessness, 1966
socdist66                      Latin American social distance, 1966
occstat67                      Occupational status, 1967
anomia67                       Anomia, 1967
pwless67                       Powerlessness, 1967
socdist67                      Latin American social distance, 1967
occstat71                      Occupational status, 1971
anomia71                       Anomia, 1971
pwless71                       Powerlessness, 1971
socdist71                      Latin American social distance, 1971

. notes

_dta:
1.  Summary statistics data from Wheaton, B., Muthen B., Alwin, D., & Summers,
G., 1977, "Assessing reliability and stability in panel models", in D. R.
Heise (Ed.), _Sociological Methodology 1977_ (pp. 84-136), San Francisco:
Jossey-Bass, Inc.
2.  Four indicators each measured in 1966, 1967, and 1981, plus another indicator
(educ66) measured only in 1966.
3.  Intended use: Create structural model relating Alienation in 1971, Alienation
in 1967, and SES in 1966.
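
A summary statistics dataset such as this one stores the number of observations, means, and standard deviations with correlations (or covariances) in place of raw observations. As a rough sketch of how such a dataset can be entered by hand, the commands below declare three variables and set their statistics. The variable names y1 through y3 and every number are made-up placeholders for illustration only; they are not values from Wheaton et al.

```stata
clear all

* Declare the variables of the summary statistics dataset (hypothetical names)
ssd init y1 y2 y3

* Placeholder values, chosen only to illustrate the syntax
ssd set observations 200
ssd set means 10.0 12.5 9.8
ssd set sd 2.0 3.1 1.7

* Lower triangle of the correlation matrix, rows separated by backslashes
ssd set correlations 1.00 \ 0.45 1.00 \ 0.30 0.52 1.00

* Check what has been set, then describe the result
ssd status
ssd describe
```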


Below we will demonstrate fitting the model, examining modification indices, and refitting the model.

### Show me, fitting the model

Simplified versions of the model fit by the authors of the referenced paper appear in many SEM software manuals. One simplified model relates alienation in 1967 and 1971 to socioeconomic status (SES) in 1966.

You can draw this model in the SEM Builder or fit it with the following command:

. sem                     ///
>         (anomia67 pwless67 <- Alien67)          /// measurement piece
>         (anomia71 pwless71 <- Alien71)          /// measurement piece
>         (Alien67 <- SES)                        /// structural piece
>         (Alien71 <- Alien67 SES)                /// structural piece
>         (    SES -> educ66 occstat66), nolog    //  measurement piece


And the results are

Endogenous variables
Measurement: anomia67 pwless67 anomia71 pwless71 educ66 occstat66
Latent:      Alien67 Alien71

Exogenous variables
Latent: SES

Structural equation model                                  Number of obs = 932
Estimation method: ml

Log likelihood = -15246.469

( 1)  [anomia67]Alien67 = 1
( 2)  [anomia71]Alien71 = 1
( 3)  [educ66]SES = 1

OIM
Coefficient  std. err.      z    P>|z|     [95% conf. interval]

Structural
Alien67
SES    -.6140404   .0562407   -10.92   0.000    -.7242701   -.5038107

Alien71
Alien67     .7046342   .0533512    13.21   0.000     .6000678    .8092007
SES    -.1744153   .0542489    -3.22   0.001    -.2807413   -.0680894

Measurement
anomia67
Alien67            1  (constrained)
_cons        13.61   .1126205   120.85   0.000     13.38927    13.83073

pwless67
Alien67     .8884887   .0431565    20.59   0.000     .8039034    .9730739
_cons        14.67   .1001798   146.44   0.000     14.47365    14.86635

anomia71
Alien71            1  (constrained)
_cons        14.13   .1158943   121.92   0.000     13.90285    14.35715

pwless71
Alien71     .8486022   .0415205    20.44   0.000     .7672235    .9299808
_cons         14.9   .1034537   144.03   0.000     14.69723    15.10277

educ66
SES            1  (constrained)
_cons         10.9   .1014894   107.40   0.000     10.70108    11.09892

occstat66
SES     5.331259   .4307503    12.38   0.000     4.487004    6.175514
_cons        37.49   .6947112    53.96   0.000     36.12839    38.85161

var(e.anomia67)    4.009921   .3582978                      3.365724    4.777416
var(e.pwless67)    3.187468    .283374                      2.677762    3.794197
var(e.anomia71)    3.695593   .3911512                      3.003245     4.54755
var(e.pwless71)    3.621531   .3037908                      3.072483    4.268693
var(e.educ66)    2.943819   .5002527                      2.109908    4.107319
var(e.occstat66)      260.63   18.24572                      227.2139    298.9605
var(e.Alien67)    5.301416    .483144                      4.434225    6.338201
var(e.Alien71)    3.737286   .3881546                      3.048951    4.581019
var(SES)     6.65587   .6409484                      5.511067    8.038482
LR test of model vs. saturated: chi2(6) = 71.62              Prob > chi2 = 0.0000

Notes:

1. Measurement component: In each of 1967 and 1971, anomia and powerlessness measure an endogenous latent variable representing alienation in that year. Education and occupational status measure the exogenous latent variable SES.
2. Structural component: SES->Alien67 and SES->Alien71, and Alien67->Alien71.
3. The model vs. saturated chi-squared test rejects the model (chi2(6) = 71.62, p < 0.0001), indicating that the model fits poorly.
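
The structural paths in note 2 imply that SES affects Alien71 both directly and indirectly through Alien67. From the coefficients reported above, the indirect effect is (-.6140404)(.7046342), about -0.433, and the total effect is -.1744153 plus that product, about -0.607. The following sketch shows postestimation commands that report effects and fit statistics; it assumes they are run immediately after the sem command above, and which statistics are available with summary statistics data is not verified here.

```stata
* Run after the sem command above, while its results are still in memory
estat gof, stats(all)     // overall goodness-of-fit statistics
estat eqgof               // equation-level goodness of fit
estat teffects            // direct, indirect, and total effects
sem, standardized         // redisplay with standardized coefficients

* Indirect effect of SES on Alien71 through Alien67, by hand:
* (SES -> Alien67) * (Alien67 -> Alien71)
display -.6140404 * .7046342

* Total effect = direct effect + indirect effect
display -.1744153 + (-.6140404 * .7046342)
```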

### Show me, modification indices

Because the model fits poorly, we look at the modification indices:

. estat mindices

Modification indices

                                    MI     df   P>MI        EPC   Standard EPC

Measurement
anomia67 <-
anomia71       51.977      1   0.00   .3906425   .4019984
pwless71       32.517      1   0.00  -.2969297  -.2727609
educ66        5.627      1   0.02   .0935048   .0842631

pwless67 <-
anomia71       41.618      1   0.00  -.3106995  -.3594367
pwless71       23.622      1   0.00   .2249714   .2323233
educ66        6.441      1   0.01  -.0889042  -.0900664

anomia71 <-
anomia67       58.768      1   0.00    .429437   .4173061
pwless67       38.142      1   0.00  -.3873066  -.3347904

pwless71 <-
anomia67       46.188      1   0.00  -.3308484  -.3601641
pwless67       27.760      1   0.00   .2871709   .2780833

educ66 <-
anomia67        4.415      1   0.04   .1055965   .1171781
pwless67        6.816      1   0.01  -.1469371  -.1450411

cov(e.anomia67,e.anomia71)      63.786      1   0.00   1.951578   .5069627
cov(e.anomia67,e.pwless71)      49.892      1   0.00  -1.506704  -.3953794
cov(e.anomia67,e.educ66)       6.063      1   0.01   .5527612   .1608845
cov(e.pwless67,e.anomia71)      49.876      1   0.00  -1.534199  -.4470094
cov(e.pwless67,e.pwless71)      37.357      1   0.00   1.159123    .341162
cov(e.pwless67,e.educ66)       7.752      1   0.01  -.5557802  -.1814365

EPC = expected parameter change


Notes:

1. There are lots of statistically significant paths we could add to the model.
2. Some of those statistically significant paths also make theoretical sense.
3. Two in particular that make sense are the covariances between e.anomia67 and e.anomia71, and between e.pwless67 and e.pwless71.
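
By default, estat mindices lists only indices that are significant at the 5% level, which is why every row above has a modification index greater than 3.84. When the list is long, a higher cutoff can help focus on the largest candidates. A small sketch, where the threshold of 30 is an arbitrary choice made for illustration:

```stata
* Run after fitting the model; list only modification indices of at least 30
estat mindices, minchi2(30)
```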

### Show me, refitting the model

Let’s refit the model and include those two previously excluded covariances:

. sem                     ///
>         (anomia67 pwless67 <- Alien67)          /// measurement piece
>         (anomia71 pwless71 <- Alien71)          /// measurement piece
>         (Alien67 <- SES)                        /// structural piece
>         (Alien71 <- Alien67 SES)                /// structural piece
>         (    SES -> educ66 occstat66)           /// measurement piece
>                 , cov(e.anomia67*e.anomia71)    ///
>                   cov(e.pwless67*e.pwless71) nolog


And the results are

Endogenous variables
Measurement:  anomia67 pwless67 anomia71 pwless71 educ66 occstat66
Latent:       Alien67 Alien71

Exogenous variables
Latent:       SES

Structural equation model                       Number of obs      =       932
Estimation method  = ml
Log likelihood     = -15213.046

( 1)  [anomia67]Alien67 = 1
( 2)  [anomia71]Alien71 = 1
( 3)  [educ66]SES = 1

OIM
Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]

Structural
Alien67 <-
SES    -.5752228    .057961    -9.92   0.000    -.6888244   -.4616213

Alien71 <-
Alien67      .606954   .0512305    11.85   0.000     .5065439     .707364
SES    -.2270301   .0530773    -4.28   0.000    -.3310596   -.1230006

Measurement
anomia67 <-
Alien67            1  (constrained)
_cons        13.61   .1126143   120.85   0.000     13.38928    13.83072

pwless67 <-
Alien67     .9785952   .0619825    15.79   0.000     .8571117    1.100079
_cons        14.67   .1001814   146.43   0.000     14.47365    14.86635

anomia71 <-
Alien71            1  (constrained)
_cons        14.13   .1159036   121.91   0.000     13.90283    14.35717

pwless71 <-
Alien71     .9217508   .0597225    15.43   0.000     .8046968    1.038805
_cons         14.9   .1034517   144.03   0.000     14.69724    15.10276

educ66 <-
SES            1  (constrained)
_cons         10.9   .1014894   107.40   0.000     10.70108    11.09892

occst~66 <-
SES      5.22132    .425595    12.27   0.000     4.387169    6.055471
_cons        37.49   .6947112    53.96   0.000     36.12839    38.85161

var(e.anom~67)    4.728874    .456299                      3.914024    5.713365
var(e.pwle~67)    2.563413   .4060733                      1.879225      3.4967
var(e.anom~71)    4.396081   .5171156                      3.490904    5.535966
var(e.pwle~71)    3.072085   .4360333                      2.326049    4.057398
var(e.educ66)    2.803674   .5115854                      1.960691    4.009091
var(e.occs~66)    264.5311   18.22483                      231.1177    302.7751
var(e.Alien67)    4.842059   .4622537                      4.015771    5.838364
var(e.Alien71)    4.084249   .4038995                      3.364613    4.957802
var(SES)    6.796014   .6524866                      5.630283    8.203105

cov(e.anom~67,e.anomia71)    1.622024   .3154267     5.14   0.000     1.003799    2.240249
cov(e.pwle~67,e.pwless71)    .3399961   .2627541     1.29   0.196    -.1749925    .8549847
LR test of model vs. saturated: chi2(4)   =      4.78, Prob > chi2 = 0.3111


Notes:

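1. We find the covariance between e.anomia67 and e.anomia71 to be significant (Z = 5.14).
2. We find the covariance between e.pwless67 and e.pwless71 to be not significant at the 5% level (Z = 1.29).
3. The model vs. saturated chi-squared test no longer rejects the model (chi2(4) = 4.78, p = 0.3111).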
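
Because the first model is nested in the second (it constrains the two error covariances to zero), the improvement in fit can also be assessed with a likelihood-ratio test. From the reported log likelihoods, the statistic is 2 × (-15213.046 - (-15246.469)) = 66.846 on 6 - 4 = 2 degrees of freedom, which agrees with the difference between the two model vs. saturated statistics, 71.62 - 4.78 = 66.84, up to rounding. A minimal sketch follows; the stored-estimate names base and withcov are arbitrary labels introduced here.

```stata
webuse sem_sm2, clear

* Model without the error covariances
quietly sem (anomia67 pwless67 <- Alien67)  ///
            (anomia71 pwless71 <- Alien71)  ///
            (Alien67 <- SES)                ///
            (Alien71 <- Alien67 SES)        ///
            (SES -> educ66 occstat66)
estimates store base

* Model with the two error covariances added
quietly sem (anomia67 pwless67 <- Alien67)  ///
            (anomia71 pwless71 <- Alien71)  ///
            (Alien67 <- SES)                ///
            (Alien71 <- Alien67 SES)        ///
            (SES -> educ66 occstat66),      ///
            cov(e.anomia67*e.anomia71)      ///
            cov(e.pwless67*e.pwless71)
estimates store withcov

* Likelihood-ratio test of the nested models
lrtest base withcov

* The same statistic and its p-value by hand from the reported log likelihoods
display 2*(-15213.046 - (-15246.469))
display chi2tail(2, 2*(-15213.046 - (-15246.469)))
```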
