


Re: st: Implementation of Latent Variable Model with SEM Builder

From   Stas Kolenikov
Subject   Re: st: Implementation of Latent Variable Model with SEM Builder
Date   Sat, 14 Apr 2012 11:54:15 -0500

On Sat, Apr 14, 2012 at 12:55 AM, Samantha Molbach wrote:
> I need some help in implementing a Structural Equation Model in Stata
> 12. I want to create a health index according to Bound (1999): "The
> dynamic effects of health on the labor force transitions of older
> workers".
> I have the following variables available: Self-assessed health on a
> five-point scale (SAH), Age, Education, different objective measures
> of health such as blood pressure (Blood), chronic diseases (chronic)
> and physical limitations (limit).
> The theoretical model is the following:
> H = X*ß1 + Z*ß2 + u
> with  H=true health; X= socioeconomic variables; Z=objective health
> measures; u=error term

This is a regression, you don't need SEM for this.

> I do not observe the true health, but only the self-assessed health
> which includes a reporting error e, thus:
> SAH = H + e
> SAH = X*ß1 + Z*ß2 + v (with v=u+e)

This is still a regression with a single response variable. You don't
need SEM for this.

> I estimate the last equation via SEM the following way:
> sem (age -> sah) (education -> sah) (blood -> sah) (chronic -> sah)
> (limit -> sah)

This is still a regression... am I repeating myself???
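
To make the point concrete, here is a minimal sketch comparing -regress- with the single-equation -sem- call. The data are simulated stand-ins (the variable names follow the question; the values and coefficients are made up), so only the comparison matters, not the numbers.

```stata
* Simulated stand-in data: names follow the question, values are invented
clear
set seed 12345
set obs 500
generate age       = 50 + floor(15*runiform())
generate education = 8 + floor(10*runiform())
generate blood     = rnormal(130, 15)
generate chronic   = rbinomial(1, 0.3)
generate limit     = rbinomial(1, 0.2)
generate sah = 3 - 0.02*(age - 57) + 0.05*(education - 12) ///
    - 0.01*(blood - 130) - 0.5*chronic - 0.7*limit + rnormal()

* Ordinary least squares
regress sah age education blood chronic limit
matrix b_ols = e(b)

* The same single-response model written as a path model
sem (age education blood chronic limit -> sah)

* The slope estimates should agree up to convergence tolerance
display "OLS: " b_ols[1,1] "   SEM: " _b[sah:age]
```

The standard errors can differ slightly because -sem- uses maximum likelihood, but with one observed response there is nothing here that -regress- does not already give you.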

> Then, I'm stuck - how do I get back to the first equation and model
> the health indicator H? Also, can I estimate an ordered Probit model
> in SEM?

OK, this is an ordered probit regression, then. Note that I am not
repeating myself here! Run -oprobit- and -predict, xb- if you really
want to get some sort of continuous scores for the health variable.
However, just using this information like that will not lead you
terribly far; you will probably have a somewhat finer gradation of
your health status variable, but the amount of measurement error in it
is not quantifiable. There is no way to break down the total error v =
u + e into individual components.
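
A minimal sketch of the -oprobit- route, again on simulated stand-in data (the cutpoints and coefficients below are assumptions chosen only to produce a five-point sah). The name health_xb is hypothetical.

```stata
* Simulated stand-in data: a latent index cut into five ordered categories
clear
set seed 12345
set obs 1000
generate age       = 50 + floor(15*runiform())
generate education = 8 + floor(10*runiform())
generate blood     = rnormal(130, 15)
generate chronic   = rbinomial(1, 0.3)
generate limit     = rbinomial(1, 0.2)
generate hstar = -0.03*(age - 57) + 0.08*(education - 12) ///
    - 0.02*(blood - 130) - 0.6*chronic - 0.8*limit + rnormal()
generate sah = 1 + (hstar > -1.5) + (hstar > -0.5) + (hstar > 0.5) + (hstar > 1.5)

* Ordered probit of self-assessed health on all covariates
oprobit sah age education blood chronic limit

* Linear prediction: a continuous score on the latent-index scale
predict health_xb, xb
summarize health_xb
```

The score health_xb is just a weighted sum of the covariates, so it contains no information about how v splits into u and e.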

What you may want to consider instead is a MIMIC (multiple indicators,
multiple causes) model, in which true health is determined by
demographics (and by health behaviors such as exercise and smoking,
which would have been nice to have) and has the objective measures as
indicators. Ignoring the ordinal nature of SAH, your model will then be

sem (age educ gender smoke exercise -> Health) (Health -> sah blood chronic limit)
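
Written out as equations, the MIMIC specification looks roughly like this. It is a sketch in the notation of the question; the intercepts \alpha_j, the loadings \lambda_j, and the normalization of the SAH loading to 1 are assumptions added here for illustration.

```latex
\begin{aligned}
H              &= X\beta_1 + u                        && \text{(causes: demographics and behaviors)} \\
\text{SAH}     &= \alpha_1 + H + e_1                  && \text{(loading fixed at 1 to set the scale of } H\text{)} \\
\text{blood}   &= \alpha_2 + \lambda_2 H + e_2 \\
\text{chronic} &= \alpha_3 + \lambda_3 H + e_3 \\
\text{limit}   &= \alpha_4 + \lambda_4 H + e_4
\end{aligned}
```

Compared with the single equation SAH = X*ß1 + Z*ß2 + v, the objective measures Z move from the right-hand side to the indicator side. With several indicators of the same H, the variance of u and the variance of the reporting error e_1 are estimated as separate parameters, which is what the single regression cannot do.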

I suspect that -chronic- and -limit- are also categorical though. A
more appropriate tool to account for the categorical nature of the
data is -gllamm-. I think I mentioned this in my talk on (pre-sem)
ways of analyzing structural equation models -- see

Stas Kolenikov, also found at
Small print: I use this email account for mailing lists only.

