Latent class analysis (LCA)

Highlights

Use gsem's lclass() option to fit

    Latent class models
    Latent profile models
    Path models with categorical latent variables
    Multiple-group models with known groups

Categorical latent variables measured by

    Binary items
    Ordinal items
    Continuous items
    Count items
    Categorical items
    Fractional items
    Survival items

Model-based method of classification
Goodness of fit: G², AIC, BIC
Estimate probabilities, means, counts for items in each class
Estimate proportion of population in each class
Predict class membership
Multiple options for obtaining starting values
Robust and cluster–robust standard errors
Support for complex survey data

We believe that there are groups in our population and that individuals in these groups behave differently. But we don't have a variable that identifies the groups. The groups may be consumers with different buying preferences, adolescents with different patterns of behavior, or health status classifications. LCA lets us identify and understand these unobserved groups. It lets us know who is likely to be in a group and how that group's characteristics differ from other groups.

In latent class models, we use a latent variable that is categorical to represent the groups, and we refer to the groups as classes.

Latent class models contain two parts. One fits the probabilities of who belongs to which class. The other describes the relationship between the classes and the observed variables.

The LCA models that Stata can fit include the classic models:

probability of class membership
binary items

And extensions:

covariates determining the probability of class membership
items that are binary, ordinal, continuous, or any of the other types that Stata's gsem can fit
SEM path models that vary across latent classes

Let's see it work

Let's work with a classic model using an example of teen behavior (but on fictional data).

We have a set of observed variables that indicate whether adolescents have consumed alcohol (alcohol), have more than 10 unexcused absences from school (truant), have used a weapon in a fight (weapon), have engaged in vandalism (vandalism), and have stolen objects worth more than $25 (theft). We will use these items to fit a latent class model with three unobserved behavior classes. We type

. gsem (alcohol truant weapon theft vandalism <-), logit lclass(C 3)

We will not show the output of this command. If we had included predictors of the class probabilities or fit a latent profile model with continuous outcomes or fit a path model, the results would be more interesting. In this classic model, however, the reported coefficients are not very informative.

Instead, we will use the estat lcprob and estat lcmean commands to estimate statistics that we can interpret easily.

estat lcprob reports the probabilities of class membership.

. estat lcprob

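Latent class marginal probabilities     Number of obs = 10,000

--------------------------------------------------------------
             |            Delta-method
             |     Margin  std. err.     [95% conf. interval]
-------------+------------------------------------------------
           C |
           1 |   .1631459   .0390465      .1001516    .2545543
           2 |   .7979467   .0389126      .7110459    .8637217
           3 |   .0389074    .016552      .0167174     .087918
--------------------------------------------------------------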

These are the expected proportions of the population in each class.
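These proportions describe the population as a whole. To classify individual students, one common approach is to predict each student's posterior probability of belonging to each class and assign the class with the largest probability. A minimal do-file sketch, assuming the three-class model above is still the active estimation result; the variable names cpost1-cpost3, maxpost, and predclass are our own illustrative choices:

```stata
* Posterior probability of membership in each class (creates cpost1-cpost3).
predict cpost*, classposteriorpr

* Largest posterior probability for each student.
egen maxpost = rowmax(cpost*)

* Assign each student to the class with the largest posterior probability.
generate predclass = 1 if cpost1 == maxpost
replace  predclass = 2 if cpost2 == maxpost
replace  predclass = 3 if cpost3 == maxpost

* Count of students assigned to each class.
tabulate predclass
```

The tabulated shares will generally be close to, but not identical to, the proportions reported by estat lcprob, because modal assignment ignores each student's classification uncertainty.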

estat lcmean reports the estimated mean for each item in each class.

. estat lcmean

Latent class marginal means             Number of obs = 10,000

--------------------------------------------------------------
             |            Delta-method
             |     Margin  std. err.     [95% conf. interval]
-------------+------------------------------------------------
1            |
     alcohol |   .7453054    .055844      .6217856    .8389348
      truant |   .3461541   .0511504      .2537076    .4518892
      weapon |   .0928717   .0273732      .0513735     .162161
       theft |   .0207514   .0341546      .0007855    .3635664
   vandalism |   .2407638   .0519997      .1536777    .3564169
-------------+------------------------------------------------
2            |
     alcohol |   .3120356   .0150695      .2832886    .3423065
      truant |   .0626883   .0076641      .0492432    .0794975
      weapon |   .0089407   .0023358      .0053525    .0148983
       theft |   .0123995    .002113      .0088731    .0173028
   vandalism |   .0471581    .005303      .0377877    .0587103
-------------+------------------------------------------------
3            |
     alcohol |   .7227077   .0346378      .6500293    .7852786
      truant |   .4910226   .0426645      .4084191    .5741192
      weapon |   .2985073   .0498659      .2106263    .4042766
       theft |   .6199426     .18702      .2560826    .8854454
   vandalism |   .5883386   .0735655      .4407238    .7216031
--------------------------------------------------------------

Because our items are binary, these estimated means are the probabilities of alcohol, truant, etc., within each class. Had alcohol been the amount of alcohol consumed per day, estat lcmean would have reported average alcohol consumption for each class.

Let's summarize the results from estat lcprob and estat lcmean.



                 Class 1    Class 2    Class 3

Pr(Class)           0.16       0.80       0.04
Probability of
  alcohol           0.75       0.31       0.72
  truant            0.35       0.06       0.49
  weapon            0.09       0.01       0.30
  theft             0.02       0.01       0.62
  vandalism         0.24       0.05       0.59

The table reveals that

16%, 80%, and 4% of our students are predicted to be in class 1, class 2, and class 3, respectively.
Class 2 is best behaved judging by the probabilities of alcohol, truant, ..., and vandalism.
Class 1 is the next best behaved.
Class 3 is the worst behaved.
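We chose three classes in advance. One way to check that choice is to fit the same model with a different number of classes and compare information criteria. A minimal do-file sketch, assuming the same dataset is in memory; the stored-result names twoclass and threeclass are our own illustrative choices:

```stata
* Fit two- and three-class versions of the same model and store each fit.
quietly gsem (alcohol truant weapon theft vandalism <-), logit lclass(C 2)
estimates store twoclass

quietly gsem (alcohol truant weapon theft vandalism <-), logit lclass(C 3)
estimates store threeclass

* Log likelihood, AIC, and BIC side by side; smaller AIC and BIC are preferred.
estimates stats twoclass threeclass

* Goodness-of-fit statistics for the most recently fit (three-class) model.
estat lcgof
```

AIC and BIC do not always agree, so the comparison is best read alongside how interpretable the resulting classes are.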

We can use margins and marginsplot to visually compare the probabilities of participating in these activities across classes.
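As a rough do-file sketch of that comparison for class 3, assuming the three-class model is still the active estimation result; the title, axis labels, and tick spacing are our own illustrative choices:

```stata
* Predicted probability of each behavior for students in class 3.
margins, predict(outcome(alcohol)   class(3)) ///
         predict(outcome(truant)    class(3)) ///
         predict(outcome(weapon)    class(3)) ///
         predict(outcome(theft)     class(3)) ///
         predict(outcome(vandalism) class(3))

* Bar chart of the five predicted probabilities, in the order requested above.
marginsplot, recast(bar) title("Class 3") xtitle("")                              ///
    xlabel(1 "Alcohol" 2 "Truant" 3 "Weapon" 4 "Theft" 5 "Vandalism", angle(45)) ///
    ytitle("Predicted probability") ylabel(0(.2)1)
```

Repeating the two commands with class(1) and class(2) gives comparable graphs for the other classes.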

Extensions

We fit our classic LCA model by typing

. gsem (alcohol truant weapon theft vandalism <-), logit lclass(C 3)

If we believe class membership depends on parents' income, we can include it in the model for C by typing

. gsem (alcohol truant weapon theft vandalism <-, logit) (C <- income), lclass(C 3)

We moved logit inside the parentheses for the five behavior items. This means it applies only to those equations. We don't need to say that the model for C is multinomial logit; that is automatic.

We are not limited to logit models for our items. If the behavior items are instead continuous, we can type

. gsem (alcohol truant weapon theft vandalism <-, gaussian), lclass(C 3)

If they are ordinal, we can type

. gsem (alcohol truant weapon theft vandalism <-, ologit), lclass(C 3)

And if the behavior items are of differing types, we can even type

. gsem (alcohol   <-, gaussian)
       (truant    <-, poisson)
       (weapon    <-, logit)
       (theft     <-, ologit)
       (vandalism <-, logit),
       lclass(C 3)

Still, this just scratches the surface of what we can do with gsem's latent class features. For instance, gsem fits path models such as

. gsem (y1 <- y2 x1 x2)
       (y2 <- y3 x1 x3)
       (y3 <- x2 x3 x4)

and we can allow them to vary across classes,

. gsem (y1 <- y2 x1 x2)
       (y2 <- y3 x1 x3)
       (y3 <- x2 x3 x4),
       lclass(C 2)

Tell me more

Learn more about Stata's latent class analysis features.

Read more about latent class models in the Structural Equation Modeling Reference Manual. For more examples, see

Latent class model
Latent class goodness-of-fit statistics
Latent profile model


