# Re: st: Longitudinal data using Stata

From: "Clive Nicholas" | To: statalist@hsphsun2.harvard.edu | Subject: Re: st: Longitudinal data using Stata | Date: Sat, 6 Aug 2005 04:03:54 +0100 (BST)

```
George Konstantinou wrote:

> I have 150 (daily) repeated measurements (scores) of 68 individuals where
> each score corresponds to the severity of the symptoms (0=no symptoms to
> 5=severe symptoms, continuous variable). For all these days I have
> recorded the pollutants' load in the atmosphere (continuous variables).

For the suggestions that follow, please bear in mind that these are _only_
suggestions and nothing else. Nobody knows your data better than you do.
:)

> I would like to figure out if there is an association between the severity
> of the symptoms with all the pollutants. But there is a logical thought
> that perhaps the symptoms are getting more severe after some days of e.g.
> heavy loads of pollutants and not necessarily the same day. So how can I
> check if there is such a pattern?
>
> Which you think is the appropriate statistical approach to deal with these
> data and how can I apply this to Stata?

Keeping it simple and tractable, I would start from how the dependent
variable is structured: only six values running from 0 to 5. Given this,
one approach would be to try -ologit- (or -oprobit- if you prefer) and
then model the time variables as, say, fortnights (of which there
would be ten, plus ten days, in a 150-day period). I would hesitate to
model more specific temporal variables than this, since N=68 and you might
consume too many degrees of freedom, especially if you have other
variables that you want to put into the model.

An alternative approach would be to 'pool' your N units across T
time-points. This would give you (N * T) = NT observations. In your
case: 68 * 150 = 10200 observations. This easily sorts out any df
problems. At a stroke, your options multiply, and you can use any number
of pooled regression methods. Perhaps the two most common and one newcomer
are relevant here:

(a) fixed-effect OLS models (-xtreg, fe-), or better still -areg- if you
wish to control for 'clustering' effects. Such models are best where
the 'fixed-error' component u[i] (i.e., variables that do not vary over
time, such as race and gender) is known to be correlated with the
regressors Xb;

(b) random-effect GLS models (-xtreg- or -xtreg, re-), which allow you to
model any time-invariant variables, but can be used _only_ if
corr(u[i], Xb) = 0;

and

(c) a mixture model of (a) and (b) in the form of -xtmixed-, which is new to
Stata 9.

But, unfortunately, there's a catch. None of these pooled models may
really be appropriate, since the dependent variable has a heavily bounded
range: these models are technically only 'legal' to use if the dependent
variable is unbounded, lest they predict values outside the range. Also, if
you wanted to model the effect of each day specifically, you could with
these models quite easily given your large df: but I wouldn't like to be
the one sifting through regression output containing 150+ variables!

Thus, there's a trade-off: use -ologit- which is more appropriate for your
DV but run the risk of consuming lots of df if your model is large, or use
an -xt- model which sorts out any df problems but which may not be
appropriate for your DV.

I hope all the above helps, and good luck.

CLIVE NICHOLAS        |t: 0(044)7903 397793
Politics              |e: clive.nicholas@ncl.ac.uk
Newcastle University  |http://www.ncl.ac.uk/geps

Wherever you go and whatever you do, just remember this. No matter how
many like you, admire you, love you or adore you, the number of people
turning up to your funeral will be largely determined by local weather
conditions.
```
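
The options discussed in the reply can be sketched roughly as follows. This is only an illustration under stated assumptions, not the author's own code: the variable names `id`, `day`, `score` and `pm10` are hypothetical, the data are assumed to be in long form (one row per person per day), and three lags is an arbitrary choice for the "some days after" question. Note that -ologit- needs `score` to be recorded in discrete categories.

```stata
* Sketch only: id, day, score and pm10 are hypothetical variable names.
* Declare the panel structure (person identifier, then daily time index).
tsset id day

* Pollutant load on each of the three preceding days.
* The L. operator respects the panel and leaves gaps as missing.
forvalues k = 1/3 {
    generate pm10_lag`k' = L`k'.pm10
}

* Fortnight indicator, assuming day runs from 1 to 150 (gives 1 to 11).
generate fortnight = ceil(day/14)

* Ordered logit with fortnight dummies; standard errors clustered on person.
xi: ologit score pm10 pm10_lag1 pm10_lag2 pm10_lag3 i.fortnight, cluster(id)

* Pooled panel alternatives that treat score as continuous.
xtreg score pm10 pm10_lag1 pm10_lag2 pm10_lag3, fe
estimates store fe
xtreg score pm10 pm10_lag1 pm10_lag2 pm10_lag3, re
estimates store re

* Hausman test: is the random-effects assumption corr(u[i], Xb) = 0 tenable?
hausman fe re

* Random-intercept mixed model (Stata 9 or later).
xtmixed score pm10 pm10_lag1 pm10_lag2 pm10_lag3 || id:
```

Comparing the coefficients on the lagged terms with the same-day coefficient is one simple way to look for a delayed effect. Further pollutants would enter in the same way, each with its own set of lags.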