



Re: st: Testing for significant differences between groups after running a random-effects regression


From   Joerg Luedicke <joerg.luedicke@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Testing for significant differences between groups after running a random-effects regression
Date   Tue, 9 Oct 2012 14:25:02 -0500

Michael Housman asked:
"Was wondering if anyone could tell me how to test for significant
differences between groups after running a random-effects regression?"

This can be tricky in such a set-up. What you are doing is fitting
quadratic curves that can vary across groups. Technically, this
means that you are estimating two shape parameters (a linear and a
quadratic term) for each category. The question now is what
"significant differences between groups" means. I can think of two
different things here: (a) that the _shapes_ of the curves differ
across groups, and (b) that the expected means differ significantly
across groups. (a) is basically what the significance tests in the
model output refer to. If we consider the following example:

*---------------------------------------------------------
use http://www.stata-press.com/data/r11/nlswork.dta, clear
* keep only Whites (race==1) and Blacks (race==2)
drop if race==3
* random-intercept model; the quadratic tenure curve differs by race
xtmixed ln_wage c.tenure##c.tenure##i.race || idcode:, mle
*---------------------------------------------------------

we can fit quadratic curves for log wage as a function of tenure, by
race/ethnicity (I did not do any data checking here, but I assume that
you have checked your data and found that quadratic curves give a
reasonable representation of it?). Testing the shape parameters for
the two race groups against each other yields:

*---------------------------------------------------------
* linear term: Black slope (base + interaction) vs. White slope
test _b[ln_wage:tenure] + _b[ln_wage:2.race#c.tenure] = _b[ln_wage:tenure]
* quadratic term: Black curvature vs. White curvature
test _b[ln_wage:c.tenure#c.tenure] + _b[ln_wage:2.race#c.tenure#c.tenure] = ///
_b[ln_wage:c.tenure#c.tenure]
*---------------------------------------------------------

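which is equivalent to the tests in the model output itself: the
base-group term appears on both sides of each equation and cancels, so
each hypothesis reduces to the corresponding interaction coefficient
being zero.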
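The two tests above check each shape parameter separately. A minimal sketch of a joint test of both parameters, and of a comparison between two non-base groups, could look like this. It keeps all three nlswork race categories (race==3 is not dropped here) so that a group-2-vs-group-3 contrast exists. With the four-group data in the question, hire_score_order would take the place of race; that mapping is an assumption.

*---------------------------------------------------------
use http://www.stata-press.com/data/r11/nlswork.dta, clear
* keep all three race categories (1 = white, 2 = black, 3 = other)
xtmixed ln_wage c.tenure##c.tenure##i.race || idcode:, mle

* group 1 (base) vs. group 2: both interaction terms jointly zero (2 df)
test (_b[ln_wage:2.race#c.tenure] = 0)              ///
     (_b[ln_wage:2.race#c.tenure#c.tenure] = 0)

* group 2 vs. group 3: both deviations from the base group are equal
test (_b[ln_wage:2.race#c.tenure] =                 ///
      _b[ln_wage:3.race#c.tenure])                  ///
     (_b[ln_wage:2.race#c.tenure#c.tenure] =        ///
      _b[ln_wage:3.race#c.tenure#c.tenure])
*---------------------------------------------------------

Each -test- call here is a single 2-df Wald test. It asks whether the whole curve shape differs, rather than whether each term differs separately.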
Different contrasts in the case of more than two groups can be
obtained either by recoding the group variable or by using -test-.
However, unless one has a very specific hypothesis in mind about
differences between groups with regard to the actual _shape_ of the
fitted curves, this test is of limited usefulness. What we are usually
interested in instead are differences in expected means across time.
If we had fitted a single curve for all groups (a linear or quadratic
slope without the group interaction), we could simply look at the
difference between the group intercepts, which would be the same
across the entire range of time. But because we fitted a separate
quadratic curve for each group, there is no single difference in
expected means: the difference varies over the range of time, i.e.,
it is a different one at each point in time. What one could do now is
test at selected points in time. For example, if we wanted to check
whether Whites earn significantly more than Blacks at 10 years of job
tenure, we could type:

*---------------------------------------------------------
* White minus Black expected log wage at tenure = 10 (tenure^2 = 100)
lincom (_b[ln_wage:_cons]+_b[ln_wage:tenure]*10 + ///
_b[ln_wage:c.tenure#c.tenure]*100) - ///
(_b[ln_wage:_cons]+_b[ln_wage:2.race] + ///
(_b[ln_wage:tenure]+_b[ln_wage:2.race#c.tenure])*10 + ///
(_b[ln_wage:c.tenure#c.tenure] + ///
_b[ln_wage:2.race#c.tenure#c.tenure])*100)
*---------------------------------------------------------

That is, we predict the expected mean for Whites:

*---------------------------------------------------------
lincom _b[ln_wage:_cons]+_b[ln_wage:tenure]*10 + ///
_b[ln_wage:c.tenure#c.tenure]*100
*---------------------------------------------------------

and for Blacks:

*---------------------------------------------------------
lincom _b[ln_wage:_cons]+_b[ln_wage:2.race] + ///
(_b[ln_wage:tenure]+_b[ln_wage:2.race#c.tenure])*10 + ///
(_b[ln_wage:c.tenure#c.tenure] + ///
_b[ln_wage:2.race#c.tenure#c.tenure])*100
*---------------------------------------------------------

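and then take the difference and test whether it is zero. We could
have obtained the same result more easily (with the sign reversed,
because -margins- reports level 2 minus the base level 1, i.e., Black
minus White) by typing: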

*---------------------------------------------------------
margins, dydx(race) at(tenure=(10))
*---------------------------------------------------------
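The two group-specific -lincom- predictions can be obtained from -margins- in the same way. The sketch below assumes Stata 11 or later, where -margins- supports factor variables. The tenure values in the second call are chosen only for illustration.

*---------------------------------------------------------
use http://www.stata-press.com/data/r11/nlswork.dta, clear
drop if race==3
xtmixed ln_wage c.tenure##c.tenure##i.race || idcode:, mle

* expected log wage (fixed portion) for each race at 10 years of tenure;
* these should match the two separate -lincom- results
margins race, at(tenure=10)

* Black-minus-White difference at a few selected tenure values
margins, dydx(race) at(tenure=(0 5 10 15 20))
*---------------------------------------------------------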

However, this approach is limited to a handful of differences at a few
discrete points in time. What I would rather do is simply plot the
fitted curves with confidence bands around them, so that everybody can
quickly see the differences across the entire time range:

*---------------------------------------------------------
use http://www.stata-press.com/data/r11/nlswork.dta, clear

drop if race==3

xtmixed ln_wage c.tenure##c.tenure##i.race || idcode:, mle

* fitted values (fixed portion only) and their standard errors
predict h_wage, xb
predict se_wage, stdp

* approximate 95% pointwise band: +/- 2 standard errors
gen cilo = h_wage - 2*se_wage
gen cihi = h_wage + 2*se_wage

* shaded band plus fitted curve for each race group
twoway rarea cilo cihi tenure if race==1, sort color(gs8) fint(50) ///
    || line h_wage tenure if race==1, sort lcolor(green)           ///
    || rarea cilo cihi tenure if race==2, sort color(gs8) fint(50) ///
    || line h_wage tenure if race==2, sort lcolor(red)             ///
    xlabel(0(5)25)                                                 ///
    xtitle("Job tenure (years)")                                   ///
    ytitle("log(wage)")                                            ///
    ylabel(0(1)3, angle(0))                                        ///
    legend(order(2 "White" 4 "Black") rows(1))
*---------------------------------------------------------
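A complementary sketch plots the Black-minus-White difference itself with its confidence band, so that the zero line gives a direct visual check at each tenure value. This assumes Stata 12 or later for -marginsplot-. These are pointwise 95% intervals, not simultaneous ones, so they are a descriptive guide rather than a formal test over the whole range.

*---------------------------------------------------------
use http://www.stata-press.com/data/r11/nlswork.dta, clear
drop if race==3
xtmixed ln_wage c.tenure##c.tenure##i.race || idcode:, mle

* Black-minus-White difference in expected log wage, 0 to 25 years
margins, dydx(race) at(tenure=(0(1)25))

* difference curve with a shaded CI; where the band excludes the
* zero line, the groups differ at that tenure value
marginsplot, recast(line) recastci(rarea) yline(0) ///
    xtitle("Job tenure (years)") ytitle("Difference in log(wage)")
*---------------------------------------------------------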

Joerg

P.S. I used -xtmixed- here because it can easily be extended to let
the time effect vary across subjects. In the model above, and in your
-xtreg- model, these effects are constrained to be the same across
clusters, and it often makes sense to relax this assumption.
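A minimal sketch of relaxing that assumption adds a random tenure slope for each woman (idcode). An unstructured covariance lets the random intercept and slope correlate. The model is then compared with the random-intercept model by a likelihood-ratio test. The stored-estimate names rslope and rint are arbitrary. The LR test of a variance component is conservative, because the null value lies on the boundary of the parameter space. The random-slope model may also take noticeably longer to converge.

*---------------------------------------------------------
use http://www.stata-press.com/data/r11/nlswork.dta, clear
drop if race==3

* random intercept plus random linear tenure slope per woman,
* with the two random effects allowed to correlate
xtmixed ln_wage c.tenure##c.tenure##i.race || idcode: tenure, ///
    covariance(unstructured) mle
estimates store rslope

* the random-intercept-only model from above, for comparison
xtmixed ln_wage c.tenure##c.tenure##i.race || idcode:, mle
estimates store rint

* likelihood-ratio test of the random slope (conservative, see above)
lrtest rslope rint
*---------------------------------------------------------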


On Tue, Oct 9, 2012 at 10:57 AM, Michael Housman
<mhousman@evolvondemand.com> wrote:
> Hi folks,
>
> Was wondering if anyone could tell me how to test for significant differences between groups after running a random-effects regression?
>
> By way of background, I have data in which each observation represents an employee-date and the dependent variable is a performance metric (e.g., average handle time, customer satisfaction, etc.) for call center agents.  In essence, I'm trying to model performance and plot the learning curve as a function of "day_of_service" for four different groups of employees.
>
> I've generated a variable called "hire_score_order" that's numbered 1 to 4, representing the four different groups that I want to represent.  I've interacted that term twice with day_of_service so I can visually represent the first- and second-order effects.  I've copied below my "xtreg" command and the resulting output for a sample metric.
>
> What I want to do is run xtreg post-estimation to test the hypothesis that group 1's learning curve is significantly different than group 2's, group 2's vs. group 3's, etc.  Any suggestions?
>
> Thanks in advance!
>
> Best,
> Mike
>
>
>
> xtreg aht c.day_of_service##c.day_of_service##i.hire_score_order, re
>
> Random-effects GLS regression                   Number of obs      =    242792
> Group variable: emp_id                          Number of groups   =      1984
>
> R-sq:  within  = 0.0049                         Obs per group: min =         1
>        between = 0.1248                                        avg =     122.4
>        overall = 0.0622                                        max =       500
>
>                                                 Wald chi2(38)      =   1544.57
> corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000
>
> --------------------------------------------------------------------------------------------------------------------
>                                                aht |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
> ---------------------------------------------------+----------------------------------------------------------------
>                                     day_of_service |  -.0035472   .0302398    -0.12   0.907    -.0628162    .0557218
>                                                    |
>                  c.day_of_service#c.day_of_service |  -9.38e-07   4.63e-06    -0.20   0.839      -.00001    8.13e-06
>                                                    |
>                                   hire_score_order |
>                                                 2  |   168.1932   48.20808     3.49   0.000     73.70711    262.6793
>                                                 3  |   20.51885   68.23659     0.30   0.764    -113.2224    154.2601
>                                                 4  |   156.1946   109.0574     1.43   0.152    -57.55392    369.9431
>                                                    |
>                  hire_score_order#c.day_of_service |
>                                                 2  |  -2.088015   .5027992    -4.15   0.000    -3.073483   -1.102546
>                                                 3  |  -1.117207   .4928079    -2.27   0.023    -2.083092   -.1513208
>                                                 4  |  -2.408916   1.294864    -1.86   0.063    -4.946802    .1289699
> hire_score_order#c.day_of_service#c.day_of_service |
>                                                 2  |   .0023866   .0016018     1.49   0.136    -.0007529    .0055262
>                                                 3  |   .0014925   .0014822     1.01   0.314    -.0014126    .0043976
>                                                 4  |   .0040321   .0037677     1.07   0.285    -.0033524    .0114167
>                                                    |
>                                              _cons |   246.4581   81.31057     3.03   0.002     87.09236    405.8239
> ---------------------------------------------------+----------------------------------------------------------------
>                                            sigma_u |  521.47501
>                                            sigma_e |  930.20434
>                                                rho |  .23912442   (fraction of variance due to u_i)
> --------------------------------------------------------------------------------------------------------------------
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/


