- Use **gsem**'s **lclass()** option to fit
  - Latent class models
  - Latent profile models
  - Path models with categorical latent variables
  - Multiple-group models with known groups

- Categorical latent variables measured by
  - Binary items
  - Ordinal items
  - Continuous items
  - Count items
  - Categorical items
  - Fractional items
  - Survival items

- Model-based method of classification
- Goodness of fit: G\(^2\), AIC, BIC
- Estimate probabilities, means, counts for items in each class
- Estimate proportion of population in each class
- Predict class membership
- Multiple options for obtaining starting values
- Robust and cluster–robust standard errors
- Support for complex survey data

We believe that there are groups in our population and that individuals in these groups behave differently. But we don't have a variable that identifies the groups. The groups may be consumers with different buying preferences, adolescents with different patterns of behavior, or health status classifications. Latent class analysis (LCA) lets us identify and understand these unobserved groups. It lets us know who is likely to be in a group and how that group's characteristics differ from those of other groups.

In latent class models, we use a latent variable that is categorical to represent the groups, and we refer to the groups as classes.

Latent class models contain two parts. One fits the probabilities of who belongs to which class. The other describes the relationship between the classes and the observed variables.
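For the classic model with binary items, the two parts can be written as one formula. This is a standard textbook form rather than something specific to Stata, and the notation is introduced here only for illustration: \(K\) classes, \(J\) binary items \(y_1, \dots, y_J\), class probabilities \(\pi_c\), and item probabilities \(p_{jc} = \Pr(y_j = 1 \mid C = c)\). It assumes that items are independent within a class.

\[
\Pr(y_1, \dots, y_J) = \sum_{c=1}^{K} \pi_c \prod_{j=1}^{J} p_{jc}^{\,y_j} (1 - p_{jc})^{1 - y_j}
\]

The \(\pi_c\) are the first part (who belongs to which class), and the \(p_{jc}\) are the second part (how each class relates to the observed items).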

The LCA models that Stata can fit include the classic models:

- probability of class membership
- binary items

And extensions:

- covariates determining the probability of class membership
- items that are binary, ordinal, continuous, or even any of the other types that Stata's **gsem** can fit
- SEM path models that vary across latent classes

Let's work with a classic model using an example of teen behavior (but on fictional data).

We have a set of observed variables that indicate whether
adolescents have consumed alcohol (**alcohol**), have more than 10
unexcused absences from school (**truant**), have used a weapon in a
fight (**weapon**), have engaged in vandalism (**vandalism**), and have
stolen objects worth more than $25 (**theft**). We will use these
items to fit a latent class model with three unobserved behavior
classes. We type

. gsem (alcohol truant weapon theft vandalism <-), logit lclass(C 3)

We will not show the output of this command. If we had included predictors of the class probabilities or fit a latent profile model with continuous outcomes or fit a path model, the results would be more interesting. In this classic model, however, the reported coefficients are not very informative.

Instead, we will use the **estat lcprob** and **estat lcmean**
commands to estimate statistics that we can interpret easily.

**estat lcprob** reports the probabilities of class membership.

. estat lcprob

Latent class marginal probabilities (Number of obs = 10,000)

| C | Margin | Delta-method Std. Err. | [95% Conf. | Interval] |
|---|---|---|---|---|
| 1 | .1631459 | .0390465 | .1001516 | .2545543 |
| 2 | .7979467 | .0389126 | .7110459 | .8637217 |
| 3 | .0389074 | .016552 | .0167174 | .087918 |

These are the expected proportions of the population in each class.

**estat lcmean** reports the estimated mean for each item in each class.

. estat lcmean

Latent class marginal means (Number of obs = 10,000)

| Class | Item | Margin | Delta-method Std. Err. | [95% Conf. | Interval] |
|---|---|---|---|---|---|
| 1 | alcohol | .7453054 | .055844 | .6217856 | .8389348 |
| 1 | truant | .3461541 | .0511504 | .2537076 | .4518892 |
| 1 | weapon | .0928717 | .0273732 | .0513735 | .162161 |
| 1 | theft | .0207514 | .0341546 | .0007855 | .3635664 |
| 1 | vandalism | .2407638 | .0519997 | .1536777 | .3564169 |
| 2 | alcohol | .3120356 | .0150695 | .2832886 | .3423065 |
| 2 | truant | .0626883 | .0076641 | .0492432 | .0794975 |
| 2 | weapon | .0089407 | .0023358 | .0053525 | .0148983 |
| 2 | theft | .0123995 | .002113 | .0088731 | .0173028 |
| 2 | vandalism | .0471581 | .005303 | .0377877 | .0587103 |
| 3 | alcohol | .7227077 | .0346378 | .6500293 | .7852786 |
| 3 | truant | .4910226 | .0426645 | .4084191 | .5741192 |
| 3 | weapon | .2985073 | .0498659 | .2106263 | .4042766 |
| 3 | theft | .6199426 | .18702 | .2560826 | .8854454 |
| 3 | vandalism | .5883386 | .0735655 | .4407238 | .7216031 |

The results are the probabilities of **alcohol**, **truant**, etc.,
for each class. Our items are binary events. Had **alcohol** been the
amount of alcohol consumed per day, **estat lcmean** would have
reported average alcohol consumption for each class.

Let's summarize the results from **estat lcprob** and
**estat lcmean**.

| | Class 1 | Class 2 | Class 3 |
|---|---|---|---|
| Pr(Class) | 0.16 | 0.80 | 0.04 |
| Probability of | | | |
| alcohol | 0.75 | 0.31 | 0.72 |
| truant | 0.35 | 0.06 | 0.49 |
| weapon | 0.09 | 0.01 | 0.30 |
| theft | 0.02 | 0.01 | 0.62 |
| vandalism | 0.24 | 0.05 | 0.59 |

The table reveals that

1) 16%, 80%, and 4% of our students are predicted to be in
class 1, class 2, and class 3, respectively.

2) Class 2 is the best behaved, judging by the probabilities of **alcohol**, **truant**, ..., and **vandalism**.

3) Class 1 is the next best behaved.

4) Class 3 is the worst behaved.
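The same table also shows, in principle, how an individual student could be assigned to a class. The sketch below is written in Python rather than Stata and is only an illustration: it applies Bayes' rule to the rounded values from the summary table, assuming the items are independent within a class. The function `posterior_class_probs` and the two response patterns are hypothetical, introduced here for the example; Stata's own predictions after **gsem** use the unrounded estimates, so its numbers would differ slightly.

```python
# Rounded estimates copied from the summary table above.
CLASS_PROBS = [0.16, 0.80, 0.04]   # Pr(Class 1), Pr(Class 2), Pr(Class 3)
ITEM_PROBS = {                     # Pr(item = 1 | class), one entry per class
    "alcohol":   [0.75, 0.31, 0.72],
    "truant":    [0.35, 0.06, 0.49],
    "weapon":    [0.09, 0.01, 0.30],
    "theft":     [0.02, 0.01, 0.62],
    "vandalism": [0.24, 0.05, 0.59],
}


def posterior_class_probs(responses):
    """Return Pr(class | responses) for a dict mapping item name to 0 or 1.

    Assumes items are independent within a class (local independence).
    """
    joint = []
    for c, prior in enumerate(CLASS_PROBS):
        likelihood = 1.0
        for item, y in responses.items():
            p = ITEM_PROBS[item][c]
            likelihood *= p if y == 1 else 1.0 - p
        joint.append(prior * likelihood)
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical students: one reporting none of the behaviors, one reporting all.
none_reported = {item: 0 for item in ITEM_PROBS}
all_reported = {item: 1 for item in ITEM_PROBS}

for label, pattern in [("none", none_reported), ("all", all_reported)]:
    probs = posterior_class_probs(pattern)
    print(label, [round(p, 3) for p in probs])
```

With these rounded values, a student reporting none of the behaviors is most likely in class 2, and a student reporting all five is most likely in class 3.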

We can use **margins** and **marginsplot** to visually compare
the probabilities of participating in these activities across classes.

We fit our classic LCA model by typing

. gsem (alcohol truant weapon theft vandalism <-), logit lclass(C 3)

If we believe class membership depends on parents' income, we can include
it in the model for **C** by typing

. gsem (alcohol truant weapon theft vandalism <-, logit) (C <- income), lclass(C 3)

We moved **logit** inside the parentheses for the five behavior items.
This means it applies only to those equations. We don't need to say
that the model for **C** is multinomial logit; that is automatic.
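To make the multinomial logit for **C** concrete, here is a minimal Python sketch (outside Stata) of how such a model maps income to class probabilities. It assumes class 1 is the base class, with its linear predictor fixed at 0. The intercepts and income coefficients are made-up numbers for illustration, not estimates from this model, and `class_probabilities` is a hypothetical helper.

```python
import math

# Hypothetical coefficients (NOT estimates from the model above).
# Class 1 is treated as the base class, so its linear predictor is 0.
INTERCEPTS = [0.0, 1.5, -1.0]
INCOME_COEFS = [0.0, 0.02, -0.03]


def class_probabilities(income):
    """Multinomial logit: Pr(C = c) = exp(z_c) / sum_k exp(z_k)."""
    z = [a + b * income for a, b in zip(INTERCEPTS, INCOME_COEFS)]
    z_max = max(z)  # subtract the max for numerical stability
    expz = [math.exp(v - z_max) for v in z]
    total = sum(expz)
    return [e / total for e in expz]


# Income values are in arbitrary, assumed units.
for income in (20, 60, 100):
    print(income, [round(p, 3) for p in class_probabilities(income)])
```

Each class gets its own intercept and income coefficient relative to the base class, and the three probabilities always sum to 1 for any income.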

We are not limited to **logit** models for our items. If the behavior
items are instead continuous, we can type

. gsem (alcohol truant weapon theft vandalism <-, gaussian), lclass(C 3)

If they are ordinal, we can type

. gsem (alcohol truant weapon theft vandalism <-, ologit), lclass(C 3)

And if the behavior items are of differing types, we can even type

. gsem (alcohol <-, gaussian) (truant <-, poisson) (weapon <-, logit) (theft <-, ologit) (vandalism <-, logit), lclass(C 3)

Still, this just scratches the surface of what we can do with
**gsem**'s latent class features. For instance, **gsem** fits
path models such as

. gsem (y1 <- y2 x1 x2) (y2 <- y3 x1 x3) (y3 <- x2 x3 x4)

and we can allow them to vary across classes,

. gsem (y1 <- y2 x1 x2) (y2 <- y3 x1 x3) (y3 <- x2 x3 x4), lclass(C 2)

Learn more about Stata's latent class analysis features.

Read more about latent class models in the *Stata Structural Equation Modeling Reference Manual*. For more examples, see

- Latent class model
- Latent class goodness-of-fit statistics
- Latent profile model