


Re: st: analysing experimental panel data

From   Joerg Luedicke
Subject   Re: st: analysing experimental panel data
Date   Thu, 18 Oct 2012 11:12:34 -0500

This is quite a general inquiry, and there is probably a lot of wiggle
room in terms of how to analyze these data. There may also be a number
of details that matter but don't show up in the post. However, here is
one possible approach.

First of all, I would not call this 'panel data', in the sense that time
as such does not seem to play a role here. I would rather just call it
hierarchical data. Another thing is that this study probably does not
qualify as an experiment, since there are no randomized
treatment/control groups (at least that is what I gather from the
post, so please correct me if I am wrong). My first intuition here
would be to fit a multilevel model (aka mixed-effects model) with a
set of interaction terms. I only consider varying intercepts here,
but this could of course be extended to varying slopes as well. I also
only consider 3 groups here, for the sake of simplicity.

Let's start with generating some data:

//2k individuals
set seed 1234
set obs 2000
gen id=_n
gen ei=rnormal() //unit-specific error term

//3 groups
gen p1=runiform()
gen group=cond(p1<.60, 1, cond(p1<.80, 2, 3))
label def gr 1"No cannabis" 2"sometimes" 3"regularly"
label val group gr
qui tab group, g(group_)

//Expanding to 3 observations each
expand 3
bys id: gen treat=_n
label def trt 1"base" 2"min price" 3"tax"
label val treat trt
qui tab treat, g(trt_)

//Generating outcome (count of drinks on a Saturday night)
//assuming only non-cannabis users care about prices
gen xb = 0.3 + 0.2*group_2 + 0.4*group_3 - 0.2*trt_2 - 0.2*trt_3 ///
    + 0.2*group_2*trt_2 + 0.2*group_3*trt_3 + 0.2*group_2*trt_3 ///
    + 0.2*group_3*trt_2 ///
    + ei
gen exp=exp(xb)
gen y=rpoisson(exp)

In the above data generation we assume that people who consume
cannabis drink more than people who don't, and people who use it
regularly drink even more than people who just use it sometimes. We
further assume that people who do not use cannabis drink less when
prices increase, but cannabis users do not care about prices.
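
As a rough check on the simulated data, the cell means of y can be
listed by cannabis group and price scenario. This is a sketch that only
assumes the variables y, group, and treat created above. Because each
individual's ei enters the linear predictor, these raw means are
expected to sit above exp() of the fixed part alone.

```stata
// Mean number of drinks in each cannabis group by price scenario
mean y, over(group treat)
```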

We can then fit a multilevel Poisson model:

//Fitting a multilevel Poisson model
xtmepoisson y || id:
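
Written exactly as -xtmepoisson y || id:-, the command fits only a
constant and a random intercept. A minimal sketch of the interaction
model described above, assuming factor-variable notation on group and
treat (this specification is an assumption, not taken from the original
message):

```stata
// Random intercept for each individual; group-by-treatment interaction
// in the fixed part (assumed specification)
xtmepoisson y i.group##i.treat || id:
```

In Stata 13 and later, -xtmepoisson- was renamed -meqrpoisson-, and
-mepoisson- accepts the same model syntax.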

We can then obtain marginal counts for all treatment-by-cannabis-group combinations:

//Predicted counts using model fixed effects
margins, predict(fixedonly)
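
To get one prediction per cell, the factor variables can be named in
the -margins- call. A sketch, assuming the model was fit with
i.group##i.treat (an assumption, not taken from the original message):

```stata
// Predicted counts for each group-by-treatment cell,
// with the random intercept set to zero
margins group#treat, predict(fixedonly)
```

Newer estimation commands such as -mepoisson- may expect a different
spelling of this option, so it is worth checking the postestimation
help for the command actually used.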

after which we can compare differences in drinking amounts using
-test- (possibly with the -mtest- option if we do multiple
comparisons). However, these are not really marginal counts, in the
sense that they are not population-averaged counts: we disregard the
random error that stems from variation in baseline drinking among the
2k individuals. Getting 'real' population-averaged effects here is not
easy, because we can't just average over the random effects: the error
is normally distributed with a mean of zero only on the scale of the
linear predictor, not on the outcome scale. However, an easy
alternative would be to just fit a marginal model:
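
The scale issue can be made explicit for a random-intercept model. This
is a sketch, writing u_i for the individual random intercept with
variance sigma^2 (notation introduced here, not used in the original
post):

```latex
\[
E\left[\exp(x\beta + u_i)\right]
  = \exp(x\beta)\,E\left[\exp(u_i)\right]
  = \exp\!\left(x\beta + \tfrac{\sigma^2}{2}\right)
  \neq \exp(x\beta),
\qquad u_i \sim N(0, \sigma^2),\ \sigma^2 > 0.
\]
```

So a prediction that fixes u_i at zero understates the
population-averaged count by the factor exp(sigma^2/2). In the simulated
data, where ei is standard normal, that factor is exp(0.5), about 1.65.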

//Population averaged model
xtgee y, family(poisson) link(log) i(id) vce(robust)
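
Written exactly as above, the command fits a constant-only model. A
sketch of the marginal counterpart of the interaction model, again
assuming factor-variable notation on group and treat (an assumption,
not taken from the original message):

```stata
// GEE Poisson model with log link; xtgee's default working correlation
// is exchangeable, and vce(robust) gives cluster-robust standard errors
xtgee y i.group##i.treat, family(poisson) link(log) i(id) vce(robust)
```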

And again we can look at the marginal counts:

//Marginal counts
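
A sketch of a command that would produce these counts, assuming the
-xtgee- model was fit with i.group##i.treat (an assumption, not taken
from the original message):

```stata
// Predicted mean counts for each group-by-treatment cell
margins group#treat
```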

and can do some testing, for example:

//Testing the difference in #drinks between baseline and min-price increase
//for people who use cannabis sometimes vs. non-users
test 	(_b[]-_b[]) =  ///
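
A sketch of how such a test could be written out in full, assuming the
cell predictions are posted with -margins, post- after a model fit with
i.group##i.treat. The coefficient names below follow from that
assumption and are not taken from the original message:

```stata
// Post the cell predictions so that -test- can refer to them
margins group#treat, post

// (baseline - min price) for non-users vs. the same difference
// for people who use cannabis sometimes
test (_b[1.group#1.treat] - _b[1.group#2.treat]) = ///
     (_b[2.group#1.treat] - _b[2.group#2.treat])
```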

Depending on what you actually want to test, it might be unnecessary to
go via -margins-. For example, the above test is equivalent to the test
for the group_2#treat_2 interaction term in the model. However, it is
always a good idea to look at some model predictions to check whether
they actually make sense.


On Wed, Oct 17, 2012 at 9:08 PM, Matthew Sunderland wrote:
> Hi All
> I am seeking advice on how best to analyse data arising from an experiment. We surveyed 2,000 people, asking them to hypothetically purchase and consume alcohol for an imaginary Saturday night.
> We collected data for three imaginary nights. First, we presented participants with a set of alcohol prices reflecting current prices (baseline). We then presented participants with two more sets of prices, in a randomized order, reflecting price increases resulting from i) the establishment of a minimum price and ii) an increase in the rate of tax. Participants comprise six quotas, differentiated by gender and recent cannabis and ecstasy use. Alcohol consumption is measured by the number of standard drinks, calculated by us from participant reports of how many items of alcohol they would consume, e.g. glasses of wine, stubbies of beer etc. About 30% of the participants did not drink at baseline.
> We'd like to know: Do the two reforms have different impacts? Do people in different quotas respond differently to the reforms? Do people with different levels of baseline drinking respond differently to the reforms?
> One option we've thought of is to run two sets of fixed effects analyses (which wash out unobserved heterogeneity relating to alcohol consumption and quota membership), using panel data for drinking at baseline and one of the reforms. Another option is to simply control for baseline consumption. We're thinking of running the analysis in two steps: a logit for whether or not someone drinks, and an OLS regression for drinkers on the log of standard drinks consumed, controlling for the predicted values coming from the logit.
> Thanks,
> Dr Matthew Sunderland
> Drug Policy Modelling Program,  National Drug and Alcohol Research Centre
> The University of New South Wales
> Sydney NSW AUSTRALIA 2052