Stata 15 help for fmm

[FMM] fmm -- Finite mixture models using the fmm prefix


Standard syntax

fmm # [if] [in] [weight] [, fmmopts] : component

Hybrid syntax

fmm [if] [in] [weight] [, fmmopts] : (component_1) (component_2) ...

where the standard syntax for component is

model depvar indepvars [, options]

the hybrid syntax for component is

model depvar indepvars [, lcprob(varlist) options]

model is an estimation command, and options are model-specific estimation options.

fmmopts                      Description
-------------------------------------------------------------------------
Model
  lcinvariant(pclassname)    specify parameters that are equal across
                               classes; default is lcinvariant(none)
  lcprob(varlist)            specify independent variables for class
                               probabilities
  lclabel(name)              name of the categorical latent variable;
                               default is lclabel(Class)
  lcbase(#)                  base latent class
  constraints(constraints)   apply specified linear constraints
  collinear                  keep collinear variables

SE/Robust
  vce(vcetype)               vcetype may be oim, robust, or cluster
                               clustvar

Reporting
  level(#)                   set confidence level; default is level(95)
  nocnsreport                do not display constraints
  noheader                   do not display header above parameter table
  nodvheader                 do not display dependent variables
                               information in the header
  notable                    do not display parameter table
  display_options            control columns and column formats, row
                               spacing, line width, display of omitted
                               variables and base and empty cells, and
                               factor-variable labeling

Maximization
  maximize_options           control the maximization process
  startvalues(svmethod)      method for obtaining starting values;
                               default is startvalues(factor)
  emopts(maxopts)            control EM algorithm for improved starting
                               values
  noestimate                 do not fit the model; show starting values
                               instead

  coeflegend                 display legend instead of statistics
-------------------------------------------------------------------------
varlist may contain factor variables; see fvvarlist.
by, statsby, and svy are allowed; see prefix.
vce() and weights are not allowed with the svy prefix.
fweights, iweights, and pweights are allowed; see weight.
coeflegend does not appear in the dialog box.
See [FMM] fmm postestimation for features available after estimation.

pclassname                   Description
-------------------------------------------------------------------------
cons                         intercepts and cutpoints
coef                         fixed coefficients
errvar                       covariances of errors
scale                        scaling parameters
-------------------------------------------------------------------------
all                          all the above
none                         none of the above; the default
-------------------------------------------------------------------------


Menu

Statistics > FMM (finite mixture models) > General estimation and regression


Description

The fmm prefix fits finite mixture models; see [FMM] fmm estimation for the list of supported commands.
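As a rough orientation, a finite mixture model with g latent classes is usually written in the following general form. The symbols here (g, pi_i, beta_i, gamma_i, and the base class b) are notation assumed for illustration and are not defined in this entry; see [FMM] fmm intro for the formulation Stata actually uses.

```latex
f(y \mid x) = \sum_{i=1}^{g} \pi_i \, f_i(y \mid x'\beta_i),
\qquad
\pi_i = \frac{\exp(\gamma_i)}{\sum_{j=1}^{g} \exp(\gamma_j)},
\qquad
\gamma_b = 0 \ \text{for the base class } b
```

In this sketch, each f_i is the density of the ith component model, and the class probabilities pi_i follow a multinomial logit form, so they are positive and sum to 1.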


Options

    +-------+
----+ Model +------------------------------------------------------------

lcinvariant(pclassname) specifies which parameters of the model are constrained to be equal across the latent classes; the default is lcinvariant(none).
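A minimal sketch of lcinvariant(), assuming the stamp dataset used in the examples of this entry is available through webuse. It fits a three-class normal mixture in which the error variance is held equal across classes while the class means remain free.

```stata
* Assumption: the stamp dataset from this entry's examples is reachable via webuse
webuse stamp, clear

* Three-component normal mixture; errvar constrains the error
* variance to be the same in every latent class
fmm 3, lcinvariant(errvar): regress thickness
```

Whether an equal-variance restriction is appropriate depends on the data; comparing this fit with the unrestricted model is a natural check.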

lcprob(varlist) specifies that the linear prediction for a given latent class probability include the variables in varlist. lcinvariant() has no effect on these parameters.

In the standard syntax, varlist is used in the linear prediction for each latent class probability.

In the hybrid syntax, specify lcprob(varlist_i) in component_i to include varlist_i in the linear prediction for the ith latent class probability. lcprob() may not be specified in fmmopts if it is used in one or more component specifications.

In the hybrid syntax, if you specify lcprob() in the component that corresponds with the base latent class, the option is ignored.
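The hybrid-syntax rules above might be exercised as follows. This is a sketch that assumes the mus03sub dataset from the examples of this entry; the choice of income and totchr as covariates is illustrative only.

```stata
* Assumption: the mus03sub dataset from this entry's examples is reachable via webuse
webuse mus03sub, clear

* Component 1 has no lcprob(), so by default it is the base latent class.
* The probability of class 2 is modeled as a function of totchr.
fmm: (regress lmedexp income) (regress lmedexp income, lcprob(totchr))

* Marginal latent class probabilities implied by the fitted model
estat lcprob
```

Placing lcprob() on the first component instead would make the second component the default base class.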

lclabel(name) specifies a name for the categorical latent variable; the default is lclabel(Class).

lcbase(#) specifies that # is to be treated as the base latent class.

In the standard syntax, the default is lcbase(1).

In the hybrid syntax, the default base is the latent class corresponding to the first component that does not have lcprob() specified. If all components have lcprob(), the first component is the base and the lcprob() option specified for the first component is ignored.
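A short sketch of lcbase() in the standard syntax, again assuming the stamp dataset from the examples of this entry. Changing the base class changes which class probability equation is normalized, not the number of classes being fit.

```stata
* Assumption: the stamp dataset from this entry's examples is reachable via webuse
webuse stamp, clear

* Treat class 2, rather than the default class 1, as the base latent class
fmm 3, lcbase(2): regress thickness

* Report the estimated class probabilities
estat lcprob
```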

constraints(), collinear; see [R] estimation options.

    +-----------+
----+ SE/Robust +--------------------------------------------------------

vce(vcetype) specifies the type of standard error reported, which includes types that are derived from asymptotic theory (oim), that are robust to some kinds of misspecification (robust), and that allow for intragroup correlation (cluster clustvar); see [R] vce_option.
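For instance, robust standard errors could be requested as in the sketch below, which assumes the gsem_mixture dataset from the examples of this entry and reuses the Poisson model shown there.

```stata
* Assumption: the gsem_mixture dataset from this entry's examples is reachable via webuse
webuse gsem_mixture, clear

* Two-class Poisson mixture with standard errors that are robust
* to some kinds of misspecification
fmm 2, vce(robust): poisson drvisits private medicaid c.age##c.age actlim chronic
```

Replacing vce(robust) with vce(cluster clustvar) would require a cluster identifier in the data; none is assumed here.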

    +-----------+
----+ Reporting +--------------------------------------------------------

level(#); see [R] estimation options.

nocnsreport suppresses the display of the constraints. Fixed-to-zero constraints that are automatically set by fmm are not shown in the report to keep the output manageable.

noheader suppresses the header above the parameter table, the display that reports the final log-likelihood value, number of observations, etc.

nodvheader suppresses the dependent variables information from the header above each parameter table.

notable suppresses the parameter tables.

display_options: noci, nopvalues, noomitted, vsquish, noemptycells, baselevels, allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), pformat(%fmt), sformat(%fmt), and nolstretch; see [R] estimation options.

    +--------------+
----+ Maximization +-----------------------------------------------------

maximize_options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace, gradient, showstep, hessian, tolerance(#), ltolerance(#), nrtolerance(#), nonrtolerance, and from(init_specs); see [R] maximize. These options are seldom used.

startvalues() specifies how starting values are to be computed. Starting values specified in from() override the computed starting values.

startvalues(factor [, maxopts]) specifies that starting values are computed by assigning each observation to an initial latent class that is determined by running a factor analysis on all the observed variables in the specified model. This is the default.

startvalues(classid varname [, maxopts]) specifies that starting values are computed by assigning each observation to an initial latent class specified in varname. varname is required to have each class represented in the estimation sample.

startvalues(classpr varlist [, maxopts]) specifies that starting values are computed using the initial class probabilities specified in varlist. varlist is required to contain g variables for a model with g latent classes. The values in varlist are normalized to sum to 1 within each observation.

startvalues(randomid [, draws(#) seed(#) maxopts]) specifies that starting values are computed by randomly assigning observations to initial classes.

startvalues(randompr [, draws(#) seed(#) maxopts]) specifies that starting values are computed by randomly assigning initial class probabilities.

startvalues(jitter [#_c [#_v], draws(#) seed(#) maxopts]) specifies that starting values are constructed by randomly perturbing the values from a Gaussian approximation to each outcome.

#_c is the magnitude for randomly perturbing coefficients, intercepts, cutpoints, and scale parameters; the default value is 1.

#_v is the magnitude for randomly perturbing variances for Gaussian outcomes; the default value is 1.

startvalues(zero) specifies that starting values are to be set to 0. This option is only useful if you use from() to specify starting values for some parameters and want the remaining starting values to be 0.

Most starting values options have suboptions that allow for tuning the starting values calculations:

maxopts is a subset of the standard maximize_options: difficult, technique(algorithm_spec), iterate(#), [no]log, trace, gradient, showstep, hessian, showtolerance, tolerance(#), ltolerance(#), and nrtolerance(#); see [R] maximize.

draws(#) specifies the number of random draws. For startvalues(randomid), startvalues(randompr), and startvalues(jitter), fmm will generate # random draws and select the starting values from the draw with the best log-likelihood value from the EM iterations. The default is draws(1).

seed(#) sets the random-number seed.
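The starting-value methods above might be used as in the following sketch. It assumes the stamp dataset from the examples of this entry; the variable name initclass, the tercile-based initial assignment, and the specific draws() and seed() values are hypothetical choices made for illustration.

```stata
* Assumption: the stamp dataset from this entry's examples is reachable via webuse
webuse stamp, clear

* Hypothetical initial class assignment: terciles of thickness,
* so that each of the three classes is represented in the sample
xtile initclass = thickness, nquantiles(3)

* Start from the user-supplied class assignment
fmm 3, startvalues(classid initclass): regress thickness

* Start from 5 random class assignments and keep the draw with the best
* log likelihood after the EM iterations; seed() makes the run reproducible
fmm 3, startvalues(randomid, draws(5) seed(15)): regress thickness
```

Mixture likelihoods can have several local maxima, so trying more than one starting-value method and comparing the final log likelihoods is a reasonable precaution.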

emopts(maxopts) controls maximization of the log likelihood for the EM algorithm. maxopts is the same subset of maximize_options that are allowed in the startvalues() option, but some of the defaults are different for the EM algorithm. The default maximum number of iterations is iterate(20). The default coefficient vector tolerance is tolerance(1e-4). The default log-likelihood tolerance is ltolerance(1e-6).
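As an illustration of emopts(), the sketch below raises the EM iteration limit above the default of 20. It assumes the stamp dataset from the examples of this entry, and the value 40 is arbitrary.

```stata
* Assumption: the stamp dataset from this entry's examples is reachable via webuse
webuse stamp, clear

* Allow up to 40 EM iterations when refining the starting values
fmm 3, emopts(iterate(40)): regress thickness
```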

noestimate specifies that the model is not to be fit. Instead, starting values are to be shown (as modified by the above options if modifications were made), and they are to be shown using the coeflegend style of output. An important use of this option is before you have modified starting values at all; you can type the following:

    . fmm ..., ... noestimate : ...
    . matrix b = e(b)
    . ... (modify elements of b) ...
    . fmm ..., ... from(b) : ...

The following option is available with fmm but is not shown in the dialog box:

coeflegend displays the legend that reveals how to specify estimated coefficients in _b[] notation, which you are sometimes required to type when specifying postestimation commands.


Remarks

For a general introduction to finite mixture models, see [FMM] fmm intro. For the list of estimation commands supported by the fmm prefix, see [FMM] fmm estimation.


Examples

---------------------------------------------------------------------------
Setup
    . webuse stamp

Mixture of three normal distributions of thickness
    . fmm 3: regress thickness

Estimated probabilities of membership in the three classes
    . estat lcprob

---------------------------------------------------------------------------
Setup
    . webuse mus03sub

Mixture of three linear regression models
    . fmm 3: regress lmedexp income c.age##c.age totchr

Include totchr as a predictor of class membership
    . fmm 3, lcprob(totchr): regress lmedexp income c.age##c.age totchr

---------------------------------------------------------------------------
Setup
    . webuse gsem_mixture

Mixture of two Poisson regression models
    . fmm 2: poisson drvisits private medicaid c.age##c.age actlim chronic

Marginal predicted counts for each class
    . estat lcmean

---------------------------------------------------------------------------
Setup
    . webuse fish2

Zero-inflated Poisson model as a mixture of a point mass distribution at zero and a Poisson regression model
    . fmm: (pointmass count) (poisson count persons boat)

---------------------------------------------------------------------------


Stored results

fmm stores the following in e():

Scalars
  e(N)                  number of observations
  e(k)                  number of parameters
  e(k_eq)               number of equations in e(b)
  e(k_dv)               number of dependent variables
  e(k_cat#)             number of categories for the #th depvar, ordinal
  e(k_out#)             number of categories for the #th depvar, mlogit
  e(ll)                 log likelihood
  e(N_clust)            number of clusters
  e(rank)               rank of e(V)
  e(ic)                 number of iterations
  e(rc)                 return code
  e(converged)          1 if target model converged, 0 otherwise

Macros
  e(cmd)                gsem
  e(cmd2)               fmm
  e(cmdline)            command as typed
  e(prefix)             fmm
  e(depvar)             names of dependent variables
  e(eqnames)            names of equations
  e(wtype)              weight type
  e(wexp)               weight expression
  e(title)              title in estimation output
  e(clustvar)           name of cluster variable
  e(model#)             model for the #th component
  e(offset#)            offset for the #th depvar
  e(vce)                vcetype specified in vce()
  e(vcetype)            title used to label Std. Err.
  e(opt)                type of optimization
  e(which)              max or min; whether optimizer is to perform
                          maximization or minimization
  e(method)             estimation method: ml
  e(ml_method)          type of ml method
  e(user)               name of likelihood-evaluator program
  e(technique)          maximization technique
  e(properties)         b V
  e(estat_cmd)          program used to implement estat
  e(predict)            program used to implement predict
  e(covariates)         list of covariates
  e(lclass)             name of latent class variable
  e(marginsnotok)       predictions not allowed by margins
  e(marginsdefault)     default predict() specification for margins
  e(footnote)           program used to implement the footnote display
  e(asbalanced)         factor variables fvset as asbalanced
  e(asobserved)         factor variables fvset as asobserved

Matrices
  e(b)                  parameter vector
  e(b_pclass)           parameter class
  e(cat#)               categories for the #th depvar, ordinal
  e(out#)               outcomes for the #th depvar, mlogit
  e(Cns)                constraints matrix
  e(ilog)               iteration log (up to 20 iterations)
  e(gradient)           gradient vector
  e(V)                  covariance matrix of the estimators
  e(V_modelbased)       model-based variance
  e(lclass_k_levels)    number of levels for latent class variables
  e(lclass_bases)       base levels for latent class variables
  e(_N)                 sample size for each component

Functions
  e(sample)             marks estimation sample
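The stored results can be inspected after estimation roughly as follows. This sketch assumes the stamp dataset from the examples of this entry and uses only result names listed above.

```stata
* Assumption: the stamp dataset from this entry's examples is reachable via webuse
webuse stamp, clear
fmm 3: regress thickness

* Scalars
display "observations:   " e(N)
display "log likelihood: " e(ll)
display "converged:      " e(converged)

* Macros (expanded as local-macro-style references)
display "prefix:                `e(prefix)'"
display "latent class variable: `e(lclass)'"

* Matrices
matrix list e(b)
```

Checking e(converged) before using the estimates is a sensible habit, because mixture models do not always converge from a given set of starting values.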
