
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


st: From: Emma Gorman <[email protected]>

From   [email protected]
To   [email protected]
Subject   st: From: Emma Gorman <[email protected]>
Date   Wed, 24 Aug 2011 14:02:42 +1200

Hi all,

I am estimating a random effects logit model using xtlogit, augmented with
cluster means to account for correlation between the individual effects and
the time-varying covariates (à la Mundlak 1978):

xtlogit WKRT $tmeans if allyears==1, or intpoints(20) re ;

where $tmeans is a global macro holding the regular covariates plus their cluster means.
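For readers unfamiliar with the Mundlak set-up, here is a minimal sketch of how a macro like $tmeans might be built. The panel identifier pid, the time variable wave and the covariates x1 and x2 are hypothetical placeholders, not names from the original data. The original command was written under #delimit ; (hence its trailing semicolon), whereas this sketch uses Stata's default end-of-line delimiter.

```stata
* Sketch only: pid, wave, x1 and x2 are hypothetical names
xtset pid wave

* Individual (cluster) means of each time-varying covariate.
* egen's mean() ignores missing values, so each mean is taken
* over the waves in which that covariate is observed.
foreach v of varlist x1 x2 {
    bysort pid: egen m_`v' = mean(`v')
}

* Regular covariates plus their cluster means
global tmeans "x1 x2 m_x1 m_x2"

xtlogit WKRT $tmeans if allyears==1, or intpoints(20) re
```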

I have three waves of data (individuals over time) and would ideally like to
use all longitudinal respondents (those who are in all three waves in this
analysis). However, a fair few individuals have missing data for some
covariates, so these observations are dropped from the regression model.

I initially ended up with (note the minimum of 1):

    Obs per group: min = 1
                   avg = 2.5
                   max = 3

I found that many individuals were observed in all waves but had non-missing
information for all covariates (and the dependent variable) in only one wave.
I had assumed that such people, who have only one usable wave of observation,
would be dropped from estimation automatically by Stata, as they provide no
longitudinal information for the model.

So I identified these people and excluded them from the estimation manually,
ending up with:

    Random effects u_i ~ Gaussian
    Obs per group: min = 2
                   avg = 2.7
                   max = 3
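One way the manual exclusion above could be automated is sketched below. It assumes a panel identifier called pid (hypothetical; substitute the real one). It uses mark and markout to flag observations with non-missing WKRT and covariates, counts the usable waves per person, and re-estimates on people with at least two.

```stata
* Sketch only: pid is a hypothetical panel identifier
* Flag observations usable in estimation (outcome and all covariates non-missing)
mark touse if allyears==1
markout touse WKRT $tmeans

* Number of usable waves per individual
bysort pid: egen n_usable = total(touse)

* Distribution of individuals by number of usable waves (one row per person)
egen byte first = tag(pid)
tabulate n_usable if first & allyears==1

* Re-estimate using only individuals with at least two usable waves
xtlogit WKRT $tmeans if touse & n_usable >= 2, or intpoints(20) re
```

The tabulation shows how many people have one, two or three usable waves before anything is discarded.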

My question is essentially: why are such cases, which provide only
cross-sectional information, not dropped automatically (and should they be
dropped)? Or is there an option to use only longitudinal information? How
is this consistent with theory? It seems strange that the default is to
include everyone.

My understanding of random effects models is that they use the most
efficient combination of between and within variation; that the
time-invariant individual effects are integrated out of the likelihood
function; and that these effects are assumed to be independent of the
covariates (in the non-linear case).
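For reference, a sketch of the standard random-effects logit likelihood contribution of individual i, with the effect u_i assumed normal with variance σ_u² and integrated out. In the Mundlak version, x_it would also contain the cluster means. xtlogit, re evaluates this integral numerically by quadrature, with intpoints() setting the number of points.

$$
L_i(\beta,\sigma_u)=\int_{-\infty}^{\infty}\prod_{t=1}^{T_i}\Lambda(x_{it}\beta+u)^{y_{it}}\bigl[1-\Lambda(x_{it}\beta+u)\bigr]^{1-y_{it}}\,\frac{1}{\sigma_u}\,\phi\!\left(\frac{u}{\sigma_u}\right)du
$$

Here Λ(z) = 1/(1 + e^{-z}) and φ is the standard normal density. For an individual with a single usable wave (T_i = 1), the product has just one factor, but the integral is still a function of β and σ_u, so that individual still enters the log likelihood.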

So presumably we don't want to use those without longitudinal information
in estimation (??)

NB: a complication with including cluster means is that individuals who have
only one usable wave of information, because the dependent variable is
missing in the other waves, still have valid cluster means for the
explanatory variables. So in some sense there is still within and between
information even with just one 'wave' of usable information, and perhaps
these individuals should not be dropped after all...
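To check how much this matters in practice, each cluster mean can be computed two ways and compared: over every wave in which the covariate is observed, and over the usable waves only. In this sketch pid and x1 are hypothetical placeholders. The usable flag is built from WKRT and x1 alone purely for illustration.

```stata
* Sketch only: pid (panel id) and x1 (a time-varying covariate) are hypothetical
* Usable observations: outcome and covariate both non-missing
mark use_x1 if allyears==1
markout use_x1 WKRT x1

* Mean of x1 over all waves where x1 is observed (mean() skips missings)
bysort pid: egen m_x1_all = mean(x1)

* Mean of x1 over usable waves only (non-usable waves set to missing)
bysort pid: egen m_x1_use = mean(cond(use_x1, x1, .))

* Non-zero differences mark people whose means draw on waves
* where the dependent variable is missing
generate double d_x1 = m_x1_all - m_x1_use
summarize d_x1 if use_x1
```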

Any thoughts / advice / clarification much appreciated.


© Copyright 1996–2018 StataCorp LLC