Dear statalisters
We have been fitting latent class models, the output of which is a set
of posterior probabilities that each subject falls into one of six
latent classes. We now want to use multinomial logistic regression
(mlogit) to examine predictors of class membership.
One option is to assign each subject to her/his modal class (the class
for which there is the highest probability of membership). However,
this loses information: some subjects will have a high probability of
belonging to a particular class, while others will have relatively
similar probabilities of membership of two or more classes.
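For concreteness, modal assignment could be sketched as follows. This assumes one row per subject with the six posterior probabilities held in variables prob1-prob6; those names, and the new variables pmax and modal, are hypothetical and used only for illustration.

```stata
* Assumed layout: one row per subject, posterior probabilities in
* prob1-prob6 (hypothetical variable names).
egen double pmax = rowmax(prob1-prob6)

* Assign the class whose probability equals the row maximum.
* If two classes tie, the lowest-numbered class is kept.
gen byte modal = .
forvalues k = 1/6 {
    replace modal = `k' if prob`k' == pmax & missing(modal)
}
```

A subject with probabilities 0.4 and 0.3 for two classes is then treated exactly like one with 0.95 and 0.01, which is the loss of information described above.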
As an alternative, we wish to fit multinomial logistic regression
models using the class variable as the multinomial outcome and
weighting the analysis using class membership probabilities.
We have stacked the data so we have multiple rows for each subject, in
the following form:
ID  Exposure  Class  Prob
1   1         1      0.1
1   1         2      0.1
1   1         3      0.4
1   1         4      0.3
1   1         5      0.05
1   1         6      0.05
'Prob' sums to one within subject and class repeats 1,2,3,4,5,6
through the whole dataset.
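A minimal sketch of how this stacked layout could be produced and checked, assuming the data start in wide form with one row per subject, an identifier id, and posterior probabilities in prob1-prob6. The wide variable names and the temporary variable probsum are assumptions for illustration, not necessarily those in the actual dataset.

```stata
* Assumed starting point: one row per subject with id, exposure and
* prob1-prob6 (hypothetical names for the posterior probabilities).
reshape long prob, i(id) j(class)

* Each subject now has six rows, class = 1,...,6.
* Check that the probabilities sum to one within subject.
bysort id: egen double probsum = total(prob)
assert abs(probsum - 1) < 1e-6
drop probsum
```

After the reshape, covariates such as exposure are constant within id, so each subject contributes six rows that differ only in class and prob.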
We weight using pweights [pw = prob]
Consequently, our model of choice has been:
xi: mlogit class xvars [pw = prob], rrr
(identical to xi: mlogit class xvars [iw = prob], rrr robust)
and we have also experimented with
xi: mlogit class xvars [pw = prob], rrr robust cluster(id)
which gives lower SEs, and
xi: mlogit class exposure [iweight = prob], rrr
which gives *higher* SEs than the pweight model without 'robust'.
We would be grateful for advice on the following questions:
1. Is it appropriate to weight according to class membership
probability (we are pretty convinced that it is)?
2. Does anyone have a recommendation as to which of the above model
formulations gives theoretically appropriate standard errors?
Many thanks
Jonathan Sterne