3rd UK User Group meeting: Panel Data estimators

James W. Hardin
London Users Group meeting
6 June 1997


  1. Panel data
  2. Estimators
  3. Identifiability issues
  4. Monte Carlo Simulation
  5. Summary


J.M. Neuhaus, J.D. Kalbfleisch, and W.W. Hauck
A Comparison of Cluster-Specific and Population Averaged Approaches for Analyzing Correlated Binary Data
International Statistical Review 59, 25-35.

Comparison of population-averaged (PA) and subject-specific (SS) models. The authors present two approaches to comparing these models. Good in combination with the Zeger/Liang/Albert paper.


J.F. Pendergast, S.J. Gange, M.A. Newton, M.J. Lindstrom, M. Palta, and M.R. Fisher
A Survey of Methods for Analyzing Clustered Binary Response Data
International Statistical Review 64, 89-118.

Survey paper with a canonical list of proposed methods. Includes a nice exposition comparing the methods and a very good, long reference list.


J.M. Neuhaus
Statistical methods for longitudinal and clustered designs with binary responses
Statistical Methods in Medical Research 1, 249-273.

Survey paper that covers not only the PA and SS models but also transitional models, response conditional models, and some hybrid models. It also presents a data analysis example from a longitudinal study of AIDS-related behaviors among men in San Francisco, which I will use to present the types of hypotheses addressed by the various panel estimators.


G. Chamberlain
Analysis of Covariance with Qualitative Data
Review of Economic Studies 225-238.

Comparison of fixed-effects (including conditional) and random-effects models (focusing on PA models).


S.L. Zeger, K.-Y. Liang, and P.S. Albert
Models for Longitudinal Data: A Generalized Estimating Equation Approach
Biometrics 44, 1049-1060.

Comparison of SS and PA models for longitudinal data. Offers an alternative to the comparison presented in the Neuhaus paper.


K.-Y. Liang and S.L. Zeger
Longitudinal data analysis using generalized linear models
Biometrika 73, 13-22.

This paper introduced the GEE PA model that is implemented in Stata as xtgee.

Panel Data

In a panel dataset, we have observations for our dependent variable $y_{it}$ such that the observations with a common value of i are believed to be correlated. The i subscript is sometimes referred to as the individual, panel, subject, cluster, or group. The t subscript indexes the observations within a particular panel; it is called the replication, time, or repeated measure. In the general unbalanced case, panel i contains $n_i$ observations, $t = 1, \ldots, n_i$.

Various authors refer to such data as longitudinal data, cross-sectional data, panel data, or cross-sectional time-series data.


There are two sources of variability from which we might build an estimator: the variability within a cluster (used by fixed-effects estimators) and the variability between clusters.
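As a small illustration of the two sources, here is a sketch that splits the total sum of squares of $y_{it}$ into a within-panel part (deviations from each panel mean) and a between-panel part (panel means around the grand mean). The data layout (a dict mapping a panel id to its list of responses) and the function name are hypothetical, chosen only for the example.

```python
from statistics import fmean


def variance_decomposition(panels):
    """Split the total sum of squares of y into within- and between-panel parts.

    panels: dict mapping a panel id to the list of y_it values for that panel.
    """
    all_y = [y for ys in panels.values() for y in ys]
    grand_mean = fmean(all_y)
    total = sum((y - grand_mean) ** 2 for y in all_y)
    within = 0.0
    between = 0.0
    for ys in panels.values():
        panel_mean = fmean(ys)
        # Within: deviations of each observation from its own panel mean.
        within += sum((y - panel_mean) ** 2 for y in ys)
        # Between: panel mean around the grand mean, weighted by the panel size n_i.
        between += len(ys) * (panel_mean - grand_mean) ** 2
    return total, within, between


panels = {1: [2.0, 4.0], 2: [6.0, 8.0, 10.0]}
total, within, between = variance_decomposition(panels)
print(total, within, between)  # prints 40.0 10.0 30.0
```

The within part is what a fixed-effects estimator works with; the between part reflects only differences among panel means.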

Fixed Effects Estimators

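To fit a fixed-effects model, one transforms the estimating equation to eliminate the fixed effects: in the linear model, for example, subtracting each panel's means removes them, and in the logit model one conditions on the number of ones in each panel.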
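For the linear model, a minimal sketch of this idea with a single regressor, assuming $y_{it} = \alpha_i + x_{it}\beta + e_{it}$ with panel-specific intercepts $\alpha_i$ (the notation, data layout, and function name are assumptions for illustration): subtracting panel means removes $\alpha_i$, so only within-panel variation identifies $\beta$.

```python
from statistics import fmean


def within_estimator(panels):
    """Fixed-effects (within) slope for y_it = alpha_i + x_it * beta + e_it.

    panels: dict mapping a panel id to a list of (x_it, y_it) pairs.
    Deviations from panel means no longer contain alpha_i, so beta is
    estimated from within-panel variation only.
    """
    sxy = 0.0
    sxx = 0.0
    for pairs in panels.values():
        x_bar = fmean([x for x, _ in pairs])
        y_bar = fmean([y for _, y in pairs])
        for x, y in pairs:
            sxy += (x - x_bar) * (y - y_bar)
            sxx += (x - x_bar) ** 2
    if sxx == 0.0:
        # x is constant within every panel (a cluster-level variable): it is
        # wiped out together with alpha_i, so beta cannot be estimated.
        raise ValueError("no within-panel variation in x")
    return sxy / sxx


# Panel intercepts 10 and -5 differ, but the slope is 2 in both panels.
panels = {1: [(1.0, 12.0), (2.0, 14.0), (3.0, 16.0)], 2: [(0.0, -5.0), (4.0, 3.0)]}
print(within_estimator(panels))  # prints 2.0
```

The error branch mirrors a limitation noted later for conditional models: a variable that is constant within every panel is removed along with the fixed effects.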

Random Effects Estimators

There are two obvious ways to approach building a random-effects estimator. One may first assume that

$$y_{it} = x_{it}\beta + \nu_i + \epsilon_{it}$$

where $\nu_i$ is a random value from some distribution F. Alternatively, one may assume that

$$y_{it} = x_{it}\beta + \epsilon_{it}$$

and impose some restrictions on the covariance of the $\epsilon_{it}$ within each panel.

Random Effects Estimators

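In more general terms, we can write the model in terms of a link function h and a variance function V (with scale parameter $\phi$) as

$$h\{E(y_{it} \mid \nu_i)\} = x_{it}\beta + \nu_i, \qquad \operatorname{Var}(y_{it} \mid \nu_i) = \phi\, V\{E(y_{it} \mid \nu_i)\}$$

where $\nu_i \sim F$, or we may assume that, with $\mu_{it} = E(y_{it})$, we have

$$h(\mu_{it}) = x_{it}\beta, \qquad \operatorname{Var}(y_{it}) = \phi\, V(\mu_{it})$$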
Random Effects Estimators

When are the two approaches the same?

They are the same if all of the $\nu_i = 0$ or when the link function h is the identity (with $E(\nu_i) = 0$). This is because, while $E(y_{it}) = E_F\{E(y_{it} \mid \nu_i)\} = E_F\{h^{-1}(x_{it}\beta + \nu_i)\}$, it is not in general true that the same link function has the property $h\{E(y_{it})\} = x_{it}\beta$ with the same $\beta$.

Note that the two approaches are the same for linear regression, which uses the identity link. They are not the same for the logistic or probit models that we examine later.
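A small numerical check of this argument, assuming a normal random effect $\nu_i \sim N(0, \sigma^2)$ (an assumption; F is left general above). Averaging the inverse link over $\nu_i$ leaves the identity-link mean unchanged, but for the probit link the averaged probabilities follow $\Phi(x\beta/\sqrt{1+\sigma^2})$, so the coefficient describing them is attenuated. The function names are hypothetical, and the integral is done with a plain midpoint rule.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def identity(eta):
    """Inverse of the identity link."""
    return eta


def probit(eta):
    """Inverse of the probit link, Phi(eta)."""
    return STD_NORMAL.cdf(eta)


def marginal_mean(inv_link, eta, sigma, grid=4000, width=10.0):
    """Approximate E_F[inv_link(eta + nu)] for nu ~ N(0, sigma^2) by the midpoint rule."""
    lo = -width * sigma
    h = 2.0 * width * sigma / grid
    total = 0.0
    for k in range(grid):
        nu = lo + (k + 0.5) * h
        total += inv_link(eta + nu) * STD_NORMAL.pdf(nu / sigma) / sigma * h
    return total


beta_ss, sigma = 1.0, 1.5
for x in (0.0, 1.0, 2.0):
    # Identity link: the averaged mean is still x * beta_ss.
    # Probit link: the averaged probability is Phi(x * beta_ss / sqrt(1 + sigma^2)).
    print(x, marginal_mean(identity, x * beta_ss, sigma),
          marginal_mean(probit, x * beta_ss, sigma))

# Slope of the averaged probabilities on the probit scale (the PA coefficient).
p0 = marginal_mean(probit, 0.0, sigma)
p1 = marginal_mean(probit, beta_ss, sigma)
beta_pa = STD_NORMAL.inv_cdf(p1) - STD_NORMAL.inv_cdf(p0)
print(beta_pa, beta_ss / sqrt(1.0 + sigma ** 2))
```

The two numbers on the last line agree, which is the probit version of the statement that the PA coefficient is smaller than the SS coefficient.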

Random Effects Estimators (logit)

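The two approaches for logit are

$$\operatorname{logit}\{\Pr(y_{it} = 1 \mid \nu_i)\} = x_{it}\beta^{SS} + \nu_i$$

or, alternatively, we may look at

$$\operatorname{logit}\{\Pr(y_{it} = 1)\} = x_{it}\beta^{PA}$$

along with appropriate assumptions on the covariance of the $y_{it}$ terms (nuisance parameters), and where, in the first form, we assume that $\nu_i \sim F$.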

Multilevel models

There are also hybrid models that estimate the probability that Y=1 averaged over the observations with the same covariate patterns. One such approach is Goldstein's multilevel models. At their simplest these are random-effects models, but they give the researcher more flexibility in modeling the outcome.

Other Models

There are also other types of models one can use to analyze panel data. The first, called the transitional model, models the probability distribution of the outcome at time t, $y_{it}$, as a function of the covariates at time t, $x_{it}$, and the individual's outcome history $y_{i1}, \ldots, y_{i,t-1}$.

Another, called the response conditional model, accommodates correlation by modeling the response probability for each individual in the panel as a function of the covariates for that individual and the responses of the other individuals in that cluster.
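A common simplification of the transitional model (an assumption here, since the text allows the full history) is a first-order version in which the history enters only through the previous outcome $y_{i,t-1}$. A minimal sketch of building the rows for such a model, with a hypothetical dict layout of time-ordered (x, y) pairs per panel:

```python
def transitional_design(panels):
    """Rows (panel id, t, x_it, y_i,t-1, y_it) for a first-order transitional model.

    panels: dict mapping a panel id to a time-ordered list of (x_it, y_it) pairs.
    The first observation of each panel has no history, so it is dropped.
    """
    rows = []
    for pid, obs in panels.items():
        for t in range(1, len(obs)):
            x_t, y_t = obs[t]
            y_prev = obs[t - 1][1]
            rows.append((pid, t + 1, x_t, y_prev, y_t))  # t + 1: 1-based time index
    return rows


panels = {"a": [(0.5, 0), (1.0, 1), (1.5, 1)], "b": [(2.0, 1)]}
for row in transitional_design(panels):
    print(row)
```

Fitting an ordinary logistic regression of $y_{it}$ on $x_{it}$ and $y_{i,t-1}$ with these rows gives one simple version of the model; panel "b" contributes nothing because it has no history.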

Problems with SS Models

  1. Problems fitting cluster level variables
  2. Must have more than one observation per panel

Problems with PA Models

  1. Ignores within-panel information, which leads to coefficient attenuation

Comparison of SS and PA coefficients

Imagine a study where the dependent variable is whether a student performs acceptably on a standardized test. Several students are under the direction of each teacher in the study. One of the covariates is whether the student's instructor assigns Stata to that individual student for teaching purposes in the classroom.

Usually, one would expect an instructor either to use Stata in teaching all of his or her students or not to use it at all. However, imagine that an instructor is free to assign Stata to some of the students in the classroom but not to others. Then the use of Stata is not a cluster-level variable.

Interpretation of the coefficient for the SS model

The SS model now allows direct observation and estimation of the average log odds ratio effect on exam performance of a change in using Stata to teach. Mathematically, we take the difference in log odds at the time points where the instructor did and did not use Stata in the classroom, and then collapse across students. The coefficient then represents the common log odds ratio, across students, of the Stata effect on passing the exam.

Interpretation of the coefficient for the PA model

Mathematically, the PA coefficient is obtained by first averaging to find the mean risk and then computing the log odds. The PA model ignores the fact that the effect of a change in Stata use for an instructor has been measured, and persists in estimating only the odds ratio between Stata and non-Stata instructors. Instructors who changed would appear in both groups.
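A toy calculation of the two orders of operations, using hypothetical baseline log odds of passing for three students and a common SS log odds ratio of 1 for the Stata effect (illustrative numbers, not from any study): taking log odds differences first and then collapsing recovers the SS value, while averaging risks first and then taking log odds gives a smaller PA value.

```python
from math import exp, log


def expit(z):
    return 1.0 / (1.0 + exp(-z))


def logit(p):
    return log(p / (1.0 - p))


# Hypothetical baseline log odds for three students, and a common SS log odds ratio.
baselines = [-2.0, 0.0, 2.0]
beta_ss = 1.0

# SS order: difference in log odds within each student, then collapse across students.
ss = [logit(expit(a + beta_ss)) - logit(expit(a)) for a in baselines]

# PA order: average the risks in each condition first, then compute the log odds ratio.
p_stata = sum(expit(a + beta_ss) for a in baselines) / len(baselines)
p_none = sum(expit(a) for a in baselines) / len(baselines)
beta_pa = logit(p_stata) - logit(p_none)

print(ss)       # each entry equals beta_ss (up to rounding)
print(beta_pa)  # smaller than beta_ss (about 0.62 here)
```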

Now imagine that no instructor actually assigns Stata to only a subset of the class, so that Stata use really is a cluster-level variable.

One then cannot directly observe a change in Stata use. The PA model measures the log odds ratio between the two groups of instructors, whereas the SS model is supposed to report the effect of a change in the instructor's use of Stata. However, no such change was measured, so the interpretation is entirely model-based: it is a form of extrapolation with no data to check its validity. Note that the conditional likelihood approach for this same model does not allow estimation of the Stata effect at all.

Problems with Conditional models

  1. Cannot fit cluster-level variables
  2. Need to check for cluster level collinearity in each cluster

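Note that for the logit model, the unconditional fixed-effects estimator is inconsistent, but the conditional estimator is consistent. Let $L_c$ denote the conditional likelihood below.

$$L_c = \prod_i \Pr\Big(y_{i1}, \ldots, y_{in_i} \,\Big|\, \sum_{t=1}^{n_i} y_{it}\Big) = \prod_i \frac{\exp\big(\sum_t y_{it} x_{it}\beta\big)}{\sum_{d \in B_i} \exp\big(\sum_t d_t x_{it}\beta\big)}$$

where $B_i$ is the set of all 0/1 sequences $(d_1, \ldots, d_{n_i})$ with $\sum_t d_t = \sum_t y_{it}$.

So the conditional likelihood is conditioned on the number of ones in the set (panel). Consider an example where there are a large number of panels, each with two time-period observations, and where panel i has its own intercept $\alpha_i$. The unconditional log-likelihood is given by

$$\ln L = \sum_i \sum_{t=1}^{2} \Big[ y_{it}(\alpha_i + x_{it}\beta) - \ln\{1 + \exp(\alpha_i + x_{it}\beta)\} \Big]$$

The observations are independent, so the likelihood function is the product of the probabilities (above we show its logarithm). Note that for each pair of observations, the concordant possibilities satisfy

$$\Pr(y_{i1} = 0, y_{i2} = 0 \mid y_{i1} + y_{i2} = 0) = 1, \qquad \Pr(y_{i1} = 1, y_{i2} = 1 \mid y_{i1} + y_{i2} = 2) = 1$$

The ith term of $L_c$ for either of these outcomes is just 1. The log of that is zero, so either of these outcomes contributes nothing to the log-likelihood.

Now, suppose that $y_{i1} = 0$ and $y_{i2} = 1$, so that we have

$$\Pr(y_{i1} = 0, y_{i2} = 1 \mid y_{i1} + y_{i2} = 1) = \frac{\Pr(y_{i1} = 0, y_{i2} = 1)}{\Pr(y_{i1} = 0, y_{i2} = 1) + \Pr(y_{i1} = 1, y_{i2} = 0)}$$

which gives that

$$\Pr(y_{i1} = 0, y_{i2} = 1 \mid y_{i1} + y_{i2} = 1) = \frac{\exp(\alpha_i + x_{i2}\beta)}{\exp(\alpha_i + x_{i1}\beta) + \exp(\alpha_i + x_{i2}\beta)} = \frac{\exp\{(x_{i2} - x_{i1})\beta\}}{1 + \exp\{(x_{i2} - x_{i1})\beta\}}$$

which is free of $\alpha_i$.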
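A numerical check of the two-period argument, assuming independent observations with $\operatorname{logit}\Pr(y_{it} = 1) = \alpha_i + x_{it}\beta$ and illustrative values for $x_{i1}$, $x_{i2}$, $\beta$, and the panel intercept $\alpha_i$ (all hypothetical). The conditional probability of (0, 1) given one success does not move as $\alpha_i$ varies and matches the closed form, while the concordant outcomes have conditional probability 1.

```python
from math import exp


def pair_prob(alpha, x1, x2, beta, y1, y2):
    """P(y_i1 = y1, y_i2 = y2) when logit P(y_it = 1) = alpha_i + x_it * beta, independently over t."""
    def p(x, y):
        e = exp(alpha + x * beta)
        return e / (1.0 + e) if y == 1 else 1.0 / (1.0 + e)
    return p(x1, y1) * p(x2, y2)


def conditional_prob(alpha, x1, x2, beta, y1, y2):
    """P(y_i1 = y1, y_i2 = y2 | y_i1 + y_i2): the i-th term of the conditional likelihood."""
    s = y1 + y2
    denom = sum(pair_prob(alpha, x1, x2, beta, d1, d2)
                for d1 in (0, 1) for d2 in (0, 1) if d1 + d2 == s)
    return pair_prob(alpha, x1, x2, beta, y1, y2) / denom


def closed_form(x1, x2, beta):
    """exp((x_i2 - x_i1) beta) / (1 + exp((x_i2 - x_i1) beta))."""
    d = (x2 - x1) * beta
    return exp(d) / (1.0 + exp(d))


x1, x2, beta = 0.2, 1.1, 0.8
for alpha in (-3.0, 0.0, 4.0):
    print(alpha,
          conditional_prob(alpha, x1, x2, beta, 0, 1),   # same for every alpha
          conditional_prob(alpha, x1, x2, beta, 1, 1))   # always 1.0
print(closed_form(x1, x2, beta))
```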

Monte Carlo Simulations

We ran two simulations, both generating SS random-effects data:

$$y^*_{it} = x_{it}\beta + \nu_i + \epsilon_{it}$$

$y^*_{it}$ is an unobserved latent variable.
$\nu_i$ is the random effect.
$\epsilon_{it}$ is the error term.
$y_{it} = 1$ if $y^*_{it} > c$, where c is some cutoff value.
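A minimal sketch of generating data of this kind, assuming a latent $y^*_{it} = \beta_0 + \beta_1 x_{it} + \nu_i + \epsilon_{it}$ with standard normal errors (which makes the probit model correct), a normal random effect with standard deviation `sigma_nu`, and a single covariate that varies within and across panels. The study's own covariates and parameter values are not reproduced here, and the function name is hypothetical.

```python
import random


def simulate_panels(n_panels, n_i, beta, sigma_nu, cutoff=0.0, seed=1):
    """Binary panel data from an SS random-effects probit data-generating process.

    y*_it = beta0 + beta1 * x_it + nu_i + eps_it with nu_i ~ N(0, sigma_nu^2)
    and eps_it ~ N(0, 1); y_it = 1 if y*_it > cutoff, else 0.
    Returns a list of (i, t, x_it, y_it) tuples.
    """
    rng = random.Random(seed)
    beta0, beta1 = beta
    rows = []
    for i in range(1, n_panels + 1):
        nu = rng.gauss(0.0, sigma_nu)            # random effect shared by panel i
        for t in range(1, n_i + 1):
            x = rng.gauss(0.0, 1.0)              # varies within and across panels
            eps = rng.gauss(0.0, 1.0)            # idiosyncratic error
            y_star = beta0 + beta1 * x + nu + eps
            rows.append((i, t, x, 1 if y_star > cutoff else 0))
    return rows


sigma_nu = 1.0
data = simulate_panels(n_panels=200, n_i=4, beta=(0.0, 1.0), sigma_nu=sigma_nu)
rho = sigma_nu ** 2 / (sigma_nu ** 2 + 1.0)   # latent within-panel correlation
print(len(data), sum(row[3] for row in data) / len(data), rho)
```

A cluster-level covariate, as in Simulation 1, could be added by drawing it once per panel next to `nu`.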


  1. Probit
  2. Probit with robust standard errors
  3. Maximum likelihood SS random effects probit
  4. GEE PA probit (exchangeable correlation)
  5. GEE PA probit (exchangeable correlation) with robust standard errors
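The exchangeable working correlation used by the GEE PA estimators is a single correlation shared by every pair of observations within a panel. A minimal sketch of a simple moment estimator of it from Pearson residuals, in the spirit of the Liang and Zeger paper (the exact formula and any degrees-of-freedom adjustment used by xtgee may differ; the function name is hypothetical):

```python
def exchangeable_rho(residuals_by_panel):
    """Moment estimate of a single within-panel correlation from Pearson residuals.

    residuals_by_panel: dict mapping a panel id to its list of residuals r_it.
    phi is the average squared residual; rho is the average cross product
    r_it * r_is over all pairs t < s within panels, divided by phi.
    No degrees-of-freedom corrections are applied.
    """
    all_r = [r for rs in residuals_by_panel.values() for r in rs]
    phi = sum(r * r for r in all_r) / len(all_r)
    cross, n_pairs = 0.0, 0
    for rs in residuals_by_panel.values():
        for t in range(len(rs)):
            for s in range(t + 1, len(rs)):
                cross += rs[t] * rs[s]
                n_pairs += 1
    if n_pairs == 0:
        raise ValueError("need at least one panel with two or more observations")
    return (cross / n_pairs) / phi


# Residuals that move together within each panel give a high correlation.
print(exchangeable_rho({1: [1.0, 1.0], 2: [-1.0, -1.0]}))   # 1.0
print(exchangeable_rho({1: [1.0, -1.0], 2: [1.0, -1.0]}))   # -1.0
```

In a full GEE fit this update alternates with re-solving the estimating equations for $\beta$ until both settle.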

Other Estimators


Simulation 1

$x_{1i}$ constant within panel (cluster-level variable).
$x_{2t}$ constant across panels (within time).
$x_{3it}$ random within and across panels.
r = 1000 is the number of simulations for a given model.

Simulation 2

$x_{2t}$ constant across panels (within time).
$x_{3it}$ random within and across panels.
r = 500 is the number of simulations for a given model.

The main differences for the second simulation were the removal of the cluster level variable and the focus on smaller datasets.

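Random Effects Likelihood

For the SS random-effects probit, with $\nu_i \sim N(0, \sigma_\nu^2)$ and $\rho = \sigma_\nu^2/(\sigma_\nu^2 + 1)$, the contribution of panel i is

$$L_i = \int_{-\infty}^{\infty} \frac{e^{-\nu^2/(2\sigma_\nu^2)}}{\sqrt{2\pi}\,\sigma_\nu} \prod_{t=1}^{n_i} \Phi\{(2y_{it} - 1)(x_{it}\beta + \nu)\}\, d\nu$$

and the log-likelihood is $\sum_i \ln L_i$.
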
Problems with SS Random Effects Probit

  1. Problems fitting cluster level variables
  2. Numeric problems with quadrature

Simulation Results


  1. Coverage probability below nominal level.
  2. Derived test statistics not normally distributed even at very large sample sizes. As cluster size grows or as the correlation $\rho$ becomes larger, the estimated standard errors become too small.

Simulation Results


The probit estimator differed little from the SS-RE model in terms of RMSE:


However, using the reported standard errors in hypothesis tests will give misleading results.
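The quantities compared in these results (RMSE and coverage) are computed over the r simulated replicates. A minimal sketch, assuming Wald intervals $\hat\beta \pm z\,\widehat{se}$ at the 95% level (the interval type used in the study is an assumption here, and the function name is hypothetical):

```python
from statistics import NormalDist


def summarize_replicates(estimates, std_errors, true_value, level=0.95):
    """Bias, RMSE, and Wald-interval coverage over r Monte Carlo replicates."""
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need matching, non-empty lists of estimates and standard errors")
    r = len(estimates)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)   # about 1.96 for level = 0.95
    bias = sum(b - true_value for b in estimates) / r
    rmse = (sum((b - true_value) ** 2 for b in estimates) / r) ** 0.5
    # A replicate "covers" if true_value lies inside estimate +/- z * se.
    covered = sum(1 for b, se in zip(estimates, std_errors) if abs(b - true_value) <= z * se)
    return bias, rmse, covered / r


# Four hypothetical replicates of a coefficient whose true value is 1.0.
print(summarize_replicates([0.9, 1.1, 1.3, 0.7], [0.1, 0.1, 0.1, 0.5], 1.0))
```

Standard errors that are too small shrink the intervals, which shows up directly as coverage below the nominal level even when the RMSE is acceptable.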

Simulation Results

Probit with robust standard errors

  1. Coverage probability near nominal level.
  2. Derived test statistics normally distributed when overall sample sizes are larger than 1000 and the correlation $\rho$ is small.
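Assuming the robust standard errors here are clustered on panel, they are sandwich estimates that treat each panel, rather than each observation, as the independent unit. A minimal sketch for the simplest case, a linear regression through the origin with one regressor (a swapped-in model chosen for brevity; the probit version has the same sandwich structure with probit scores), without any finite-sample adjustment:

```python
def naive_and_robust_var(panels):
    """Slope, naive variance, and cluster-robust variance for y_it = b * x_it + e_it.

    panels: dict mapping a panel id to a list of (x_it, y_it) pairs.
    The naive variance assumes independent errors; the cluster-robust
    (sandwich) variance adds up the score terms x_it * e_it within each
    panel before squaring, so within-panel correlation is allowed for.
    No finite-sample adjustment is applied.
    """
    pairs = [p for obs in panels.values() for p in obs]
    sxx = sum(x * x for x, _ in pairs)
    b = sum(x * y for x, y in pairs) / sxx
    # Naive: residual variance (one parameter estimated) divided by the sum of x^2.
    resid_ss = sum((y - b * x) ** 2 for x, y in pairs)
    naive = resid_ss / (len(pairs) - 1) / sxx
    # Robust: "bread" 1/sxx on each side of the "meat" built from panel score sums.
    meat = sum(sum(x * (y - b * x) for x, y in obs) ** 2 for obs in panels.values())
    robust = meat / sxx ** 2
    return b, naive, robust


# Two panels whose responses move together within panel.
panels = {1: [(1.0, 1.0), (1.0, 1.0)], 2: [(1.0, -1.0), (1.0, -1.0)]}
b, naive, robust = naive_and_robust_var(panels)
print(b, naive, robust)  # the robust variance (0.5) exceeds the naive one (1/3)
```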

Simulation Results

SS Random Effects Probit

  1. Very good for small samples with low correlation
    tex2html_wrap_inline626 .
  2. Numeric problems with the quadrature calculations
    when either tex2html_wrap_inline628 or the cluster size $n_i$ becomes large (more than 10).

Simulation Results

SS Random Effects Probit

The major computational problem with the SS random-effects probit model is the need to evaluate the integral using quadrature. It is for these numeric reasons that this estimator did not perform better. However, it dominated the other estimators for small cluster sizes $n_i$ and tex2html_wrap_inline626. One gains a substantial improvement by increasing the number of Hermite points to about 8 to 10, but not much improvement beyond that. Guilkey and Murphy found it necessary to increase this to 16 for tex2html_wrap_inline636 and tex2html_wrap_inline638 to obtain good performance.
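A sketch of the quadrature step, assuming a normal random effect with standard deviation $\sigma_\nu$ and standard normal errors. The panel likelihood is an integral over $\nu_i$; substituting $\nu = \sqrt{2}\,\sigma_\nu z$ turns it into a Gauss-Hermite sum. The nodes and weights below come from the usual Newton-iteration recipe (as in the Numerical Recipes routine gauher); this is an illustration, not Stata's implementation. For a single observation the exact value $\Phi(x\beta/\sqrt{1+\sigma_\nu^2})$ is known, which gives a check on the approximation; the second loop, with illustrative values, shows how the answer can move with the number of points for a larger panel and a large $\sigma_\nu$.

```python
from math import pi, sqrt
from statistics import NormalDist

PHI = NormalDist().cdf


def gauss_hermite(n):
    """Nodes z_k and weights w_k with sum_k w_k f(z_k) ~ integral of f(z) exp(-z^2) dz."""
    x = [0.0] * n
    w = [0.0] * n
    z = 0.0
    pp = 1.0
    for i in range((n + 1) // 2):
        # Starting guesses for the roots, largest first.
        if i == 0:
            z = sqrt(2 * n + 1) - 1.85575 * (2 * n + 1) ** (-1.0 / 6.0)
        elif i == 1:
            z -= 1.14 * n ** 0.426 / z
        elif i == 2:
            z = 1.86 * z - 0.86 * x[0]
        elif i == 3:
            z = 1.91 * z - 0.91 * x[1]
        else:
            z = 2.0 * z - x[i - 2]
        for _ in range(100):
            # Orthonormal Hermite recurrence: p1 ends as the degree-n value at z and
            # p2 as the degree-(n - 1) value, which gives the derivative pp.
            p1, p2 = pi ** -0.25, 0.0
            for j in range(1, n + 1):
                p3, p2 = p2, p1
                p1 = z * sqrt(2.0 / j) * p2 - sqrt((j - 1.0) / j) * p3
            pp = sqrt(2.0 * n) * p2
            z_old = z
            z = z_old - p1 / pp  # Newton step
            if abs(z - z_old) < 3e-14:
                break
        x[i], x[n - 1 - i] = z, -z
        w[i] = w[n - 1 - i] = 2.0 / (pp * pp)
    return x, w


def re_probit_panel_likelihood(xb, y, sigma_nu, n_points):
    """Quadrature approximation to one panel's SS random-effects probit likelihood.

    L_i = integral of N(nu; 0, sigma_nu^2) * prod_t Phi((2 y_it - 1)(xb_t + nu)) d nu,
    rewritten with nu = sqrt(2) * sigma_nu * z as (1 / sqrt(pi)) * sum_k w_k * prod_t Phi(...).
    """
    nodes, weights = gauss_hermite(n_points)
    total = 0.0
    for z, wk in zip(nodes, weights):
        nu = sqrt(2.0) * sigma_nu * z
        prod = 1.0
        for xb_t, y_t in zip(xb, y):
            prod *= PHI((2 * y_t - 1) * (xb_t + nu))
        total += wk * prod
    return total / sqrt(pi)


# One observation: the exact value is Phi(xb / sqrt(1 + sigma_nu^2)).
exact = PHI(0.5 / sqrt(2.0))
for m in (2, 4, 8, 12, 20):
    approx = re_probit_panel_likelihood([0.5], [1], sigma_nu=1.0, n_points=m)
    print(m, approx, abs(approx - exact))

# A larger panel with a large random-effect variance (illustrative values).
xb, y = [0.2] * 12, [1] * 10 + [0, 0]
for m in (4, 8, 12, 20, 40):
    print(m, re_probit_panel_likelihood(xb, y, sigma_nu=3.0, n_points=m))
```

With many observations per panel and a large $\sigma_\nu$, the product of $\Phi$ terms becomes sharply peaked in $\nu$, which is why a fixed, small number of points can misplace the integral.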

Simulation Results

SS Random Effects Probit

In Simulation 1, for the cluster-level variable, the SS RE probit estimator had lower-than-nominal coverage and a much larger standard error than the PA estimator. When the cluster size $n_i$ was small (4), the coverage was close to nominal, though the RMSE was larger than for the population-averaged approach.

Estimated standard errors are too small when the cluster size $n_i$ or the correlation $\rho$ gets large, owing to numerical problems in evaluating the integral (not because the model is faulty).

Simulation Results

PA Random Effects Probit

  1. Coverage probability near nominal level only for
    tex2html_wrap_inline626 .
  2. Derived test statistics normally distributed when overall sample sizes are larger than 1000 and the correlation $\rho$ is small.
  3. Coefficient estimates smaller than SS model.

Simulation Results

PA Random Effects Probit

Coefficients were smaller than for the SS model, as theory dictates. The standard errors were too small, but coverage was close to the nominal level for small cluster sizes even when tex2html_wrap_inline650, though not when tex2html_wrap_inline638.

Simulation Results

PA Random Effects Probit with robust standard errors

  1. Coverage probability near nominal when n>30.
  2. Derived test statistics normally distributed when overall sample sizes are larger than 1000 and the correlation $\rho$ is small.

Simulation Results

PA Random Effects Probit with robust standard errors

Coefficients were smaller than for the SS model, as theory dictates. The standard errors were of the correct size, and coverage was close to nominal for all sample sizes and values of $\rho$.


Difference in PA and SS models


$$\operatorname{logit}\{\Pr(y_{it} = 1)\} = x_{it}\beta^{PA}$$

with appropriate assumptions concerning the covariance of the $y_{it}$ within each panel.

$\beta^{PA}$ measures the change in the log odds of the proportion with Y=1 for a unit increase in X. It does not take advantage of the repeated measurements on each study subject, or of the fact that the effects of within-subject covariate changes on the response are directly observable. This model is most appropriate for cluster-level variables.

$$\operatorname{logit}\{\Pr(y_{it} = 1 \mid \nu_i)\} = x_{it}\beta^{SS} + \nu_i$$

$\beta^{SS}$ measures the change in the log odds of response with covariate X for individuals in each of the underlying risk groups described by $\nu_i$. It is not appropriate for cluster-level variables, since this effect is not directly observable.

Problems with Conditional models

  1. Cannot fit cluster-level variables
  2. Need to check for cluster-level collinearity in each cluster