
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.



Re: st: analysing experimental panel data


From   Joerg Luedicke <joerg.luedicke@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: analysing experimental panel data
Date   Fri, 19 Oct 2012 07:58:57 -0500

You're welcome!

Joerg

On Thu, Oct 18, 2012 at 10:45 PM, Matthew Sunderland
<matthews@unsw.edu.au> wrote:
> Thank you for your reply!
>
> You were correct on your initial assumption. 2,000 participants were confronted with both prices and that order was randomised. The advice you gave sounds sensible and we shall try this type of approach.
>
> Many thanks,
> Matthew.
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Joerg Luedicke
> Sent: Friday, 19 October 2012 4:30 AM
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: analysing experimental panel data
>
> On second thought, there might be a source of misunderstanding here.
> The OP stated that:
>
> "We presented participants with two more sets of prices in a randomized order"
>
> The way I read this is that all 2,000 study participants were confronted with both additional prices, and that the order in which these prices were presented to them was randomized.
>
> However, I can imagine that the OP meant to say that they drew a random sample of 1,000 individuals and assigned them to price a, and the other 1,000 to price b. If the latter is true, then my approach is of course useless here. In this case it looks like a straightforward difference-in-difference design to me, as the Econ folks would call it. That is, you would have a binary variable for treatment a/b and a binary variable for baseline/follow-up and then use these variables including their interaction to estimate the differences between baseline/follow-up for both treatment arms. If these differences are expected to vary across groups one would just need to include additional interaction effects.
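>
> A minimal sketch of that difference-in-difference setup, on made-up data. The variable names -arm- and -post- and the effect sizes are assumptions for illustration only, and a Poisson model with cluster-robust standard errors is just one reasonable choice for a count outcome:
>
> *-----------------------------------------------------
> //Hypothetical two-arm design: 2k individuals, each observed twice
> clear
> set seed 4321
> set obs 2000
> gen id=_n
> gen arm=runiform()<.5 //0 = price a, 1 = price b
> expand 2
> bys id: gen post=_n-1 //0 = baseline, 1 = follow-up
>
> //Arbitrary effects: both arms drink less at follow-up, arm b a bit more so
> gen y=rpoisson(exp(0.3 - 0.2*post - 0.1*arm*post))
>
> //Treatment, period and their interaction; SEs clustered on individual
> poisson y i.arm##i.post, vce(cluster id)
>
> //Difference-in-difference estimate on the log-count scale
> lincom 1.arm#1.post
> *-----------------------------------------------------
>
> The interaction coefficient is the difference between the two arms in their baseline/follow-up change, here on the log-count scale.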
>
> Joerg
>
> On Thu, Oct 18, 2012 at 11:12 AM, Joerg Luedicke <joerg.luedicke@gmail.com> wrote:
>> This is quite a general inquiry and there is probably a lot of wriggle
>> room in terms of how to analyze these data. There may also be a number
>> of details that could matter that don't show up in the post. However,
>> here is one possible approach.
>>
>> First of all, I would not call this 'panel data' in the sense that time
>> as such does not seem to play a role here. I would rather just call it
>> hierarchical data. Another thing is that this study probably does not
>> qualify as an experiment since there are no randomized
>> treatment/control groups (at least that is what I gather from the
>> post, so please correct me if I am wrong). So my first intuition here
>> would be to fit a multilevel model (aka mixed effects model) with a
>> bunch of interaction terms. I only consider varying intercepts here
>> but this could of course be extended to varying slopes as well. I also
>> only consider 3 groups here, for the sake of simplicity.
>>
>> Let's start with generating some data:
>>
>> *-----------------------------------------------------
>> //2k individuals
>> clear
>> set seed 1234
>> set obs 2000
>> gen id=_n
>> gen ei=rnormal() //unit-specific error term
>>
>> //3 groups
>> gen p1=runiform()
>> gen group=cond(p1<.60, 1, cond(p1<.80, 2, 3))
>> label def gr 1 "No cannabis" 2 "sometimes" 3 "regularly"
>> label val group gr
>> qui tab group, g(group_)
>>
>> //Expanding to 3 observations each
>> expand 3
>> bys id: gen treat=_n
>> label def trt 1"base" 2"min price" 3"tax"
>> label val treat trt
>> qui tab treat, g(trt_)
>>
>> //Generating outcome (count of drinks on a Saturday night)
>> //assuming only non-cannabis users care about prices
>> gen xb = 0.3 + 0.2*group_2 + 0.4*group_3 - 0.2*trt_2 - 0.2*trt_3 ///
>> + 0.2*group_2*trt_2 + 0.2*group_3*trt_3 ///
>> + 0.2*group_2*trt_3 + 0.2*group_3*trt_2 ///
>> + ei
>> gen exp=exp(xb)
>> gen y=rpoisson(exp)
>> *-----------------------------------------------------
>>
>> In the above data generation we assume that people who consume
>> cannabis drink more than people who don't, and people who use it
>> regularly drink even more than people who just use it sometimes. We
>> further assume that people who do not use cannabis drink less when
>> prices increase, but cannabis users do not care about prices.
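>>
>> Written out, the linear predictor from the -gen xb- line is as follows, where G_2, G_3, T_2 and T_3 are shorthand (not names used in the code) for the group_2, group_3, trt_2 and trt_3 dummies, and e_i is the unit-specific error:
>>
>> \[ \log \mu_{ij} = 0.3 + 0.2\,G_2 + 0.4\,G_3 - 0.2\,T_2 - 0.2\,T_3 + 0.2\,(G_2 + G_3)(T_2 + T_3) + e_i \]
>>
>> For an individual with e_i = 0 this gives expected counts of \exp(0.3) \approx 1.35 at baseline and \exp(0.1) \approx 1.11 under either price increase for non-users, and \exp(0.5) \approx 1.65 (sometimes) and \exp(0.7) \approx 2.01 (regularly) under all three conditions for cannabis users, because the interaction terms cancel the price effects.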
>>
>> We can then fit the model using a multilevel Poisson model:
>>
>> //Fitting a multilevel Poisson model
>> xtmepoisson y i.group##i.treat || id:
>>
>> And can obtain marginal counts for all treatment by cannabis groups:
>>
>> //Predicted counts using model fixed effects
>> margins i.group##i.treat, predict(fixedonly)
>>
>> after which we can compare differences in drinking amounts using
>> -test- (possibly with the -mtest- option if we do multiple
>> comparisons). However, these are not really marginal counts in the
>> sense that they are not population averaged counts because we
>> disregard the random error which stems from the variation of
>> differences in baseline drinking among the 2k individuals. Getting
>> 'real' population averaged effects here is not easy because we can't
>> just average over the random effects since the error is only normally
>> distributed with a mean of zero at the predictor scale, not the
>> outcome scale. However, an easy alternative would be to just fit a
>> marginal model:
>>
>> //Population averaged model
>> xtgee y i.group##i.treat, family(poisson) link(log) i(id) vce(robust)
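>>
>> As an aside on why predictions based on the fixed effects only are not population averaged, here is a short derivation. It assumes a normally distributed random intercept e_i with variance \sigma^2, as in the simulated data; the symbols are shorthand, not names used in the code:
>>
>> \[ E[y_{ij} \mid x_{ij}] = E[\exp(x_{ij}\beta + e_i)] = \exp(x_{ij}\beta)\,\exp(\sigma^2/2) \]
>>
>> With \sigma = 1, as simulated above, the factor is \exp(0.5) \approx 1.65, so predictions that set e_i = 0 understate the population mean counts by that factor.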
>>
>> And again we can look at the marginal counts:
>>
>> //Marginal counts (-post- stores them in e(b) so that -test- can refer to them)
>> margins i.group##i.treat, post
>>
>> and can do some testing, for example:
>>
>> //Testing the difference in #drinks between baseline and min-price increase
>> //for people who use cannabis sometimes vs. non-users
>> test    (_b[2.group#1bn.treat]-_b[2.group#2.treat]) =  ///
>> (_b[1bn.group#1bn.treat]-_b[1bn.group#2.treat])
>>
>> Depending on what you actually want to test it might be unnecessary to
>> go via -margins-. For example the above test is equivalent to the test
>> for the group_2#treat_2 interaction term in the model. However, it is
>> always a good idea to look at some model predictions to check whether
>> they actually make sense etc.
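>>
>> A minimal sketch of testing the interaction term directly, to be run after the data generation above. The model is refit first so that the coefficients, rather than any posted margins, are the active estimation results:
>>
>> *-----------------------------------------------------
>> //Refit the population averaged model
>> xtgee y i.group##i.treat, family(poisson) link(log) i(id) vce(robust)
>>
>> //Single interaction term: sometimes-users x min price
>> test 2.group#2.treat
>>
>> //Joint test of all four group-by-treatment interaction terms
>> testparm i.group#i.treat
>> *-----------------------------------------------------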
>>
>> Joerg
>>
>>
>>
>> On Wed, Oct 17, 2012 at 9:08 PM, Matthew Sunderland
>> <matthews@unsw.edu.au> wrote:
>>> Hi All
>>>
>>> I am seeking advice on how best to analyse data arising from an experiment. We surveyed 2,000 people asking them to hypothetically purchase and consume alcohol for an imaginary Saturday night.
>>>
>>> We collected data for three imaginary nights. First we presented participants with a set of alcohol prices reflecting current prices (baseline). We then presented participants with two more sets of prices, in a randomized order, reflecting price increases resulting from i) the establishment of a minimum price and ii) an increase in the rate of tax. Participants comprise six quotas, differentiated by gender and recent cannabis and ecstasy use. Alcohol consumption is measured by the number of standard drinks, calculated by us from participant reports of how many items of alcohol they would consume, e.g. glasses of wine, stubbies of beer etc. About 30% of the participants did not drink at baseline.
>>>
>>> We'd like to know: Do the two reforms have different impacts? Do people in different quotas respond differently to the reforms? Do people with different levels of baseline drinking respond differently to the reforms?
>>>
>>> One option we've thought of is to run two sets of fixed-effects analyses (which wash out unobserved heterogeneity relating to alcohol consumption and quota membership), each using panel data for drinking at baseline and under one of the reforms. Another option is simply to control for baseline consumption. We're thinking of running the analysis in two steps: a logit for whether or not someone drinks, and an OLS regression for drinkers on the log of standard drinks consumed, controlling for the predicted values from the logit.
>>>
>>> Thanks,
>>>
>>> Dr Matthew Sunderland
>>> Drug Policy Modelling Program, National Drug and Alcohol Research Centre
>>> The University of New South Wales, Sydney NSW AUSTRALIA 2052
>>>
>>>
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>> *   http://www.ats.ucla.edu/stat/stata/

