

From: Michael Housman <mhousman@evolvondemand.com>

To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>

Subject: RE: st: Testing for significant differences between groups after running a random-effects regression

Date: Wed, 10 Oct 2012 03:12:47 +0000

Thanks very much for your help! This makes a lot of sense to me. You're absolutely right - it's not entirely clear how to test for differences between two groups when they have different intercepts, slopes, curvatures, etc.

For what it's worth, I already use the "margins" command after running my xtreg to plot the margins for these groups and then the "marginsplot" command to display the performance curves visually along with confidence intervals. So it's possible to see whether each line is inside / outside every other's confidence interval.

I'd hoped that there might be some sort of function like "sts test", where Stata tests the equality of survivor functions across two or more groups and produces a p-value indicating whether the curves are significantly different from one another. Please let me know if you're aware of anything similar that can be done when plotting marginal effects.

Thanks again!

Best,
Mike

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Joerg Luedicke
Sent: Tuesday, October 09, 2012 12:25 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Testing for significant differences between groups after running a random-effects regression

Michael Housman asked: "Was wondering if anyone could tell me how to test for significant differences between groups after running a random-effects regression?"

This can be tricky in such a set-up. What you are doing is fitting quadratic slopes that can vary across different groups. Technically, it means that you are estimating two shape parameters for each category. The question is now what does "significant differences between groups" mean? I can think of two different things here: (a) that the _shapes_ of the curves differ across groups, and (b) that the expected means are significantly different across groups. (a) is basically what the significance test from the model is referring to.
If we consider the following example:

*---------------------------------------------------------
use http://www.stata-press.com/data/r11/nlswork.dta, clear
drop if race==3
xtmixed ln_wage c.tenure##c.tenure##i.race || idcode:, mle
*---------------------------------------------------------

we can fit quadratic slopes for wage as a function of tenure, by race/ethnicity (I did not do any data checking here but I assume that you checked your data and found that quadratic slopes give a reasonable representation of your data?). Testing the shape parameters for the two race groups against each other yields:

*---------------------------------------------------------
test (_b[ln_wage:tenure]+_b[ln_wage:2.race#c.tenure]) = _b[ln_wage:tenure]
test (_b[ln_wage:c.tenure#c.tenure]+_b[ln_wage:2.race#c.tenure#c.tenure]) = ///
    _b[ln_wage:c.tenure#c.tenure]
*---------------------------------------------------------

which is equivalent to the test from the model output itself. Different contrasts in case of more than 2 groups can be obtained by either recoding the group variable or using -test-. However, unless one has a very specific hypothesis in mind about differences between groups with regard to the actual _shape_ of the fitted curves, this test is of limited usefulness.

Rather, what we are usually interested in are differences in expected means across time. If we had just fitted one common slope (linear or quadratic, i.e. without interaction), we could simply look at the differences of the intercepts, which would be the same across the entire range of time. But given that we fitted quadratic slopes for each group, there is no single difference in expected means because the difference varies over the range of time, i.e. the difference in expected means is a different one at each point in time. What one could do now is some testing at selected points in time. For example, if we wanted to check whether Whites are earning significantly more than Blacks at 10 years of job tenure, we could type:

*---------------------------------------------------------
lincom (_b[ln_wage:_cons]+_b[ln_wage:tenure]*10 + ///
    _b[ln_wage:c.tenure#c.tenure]*100) - ///
    (_b[ln_wage:_cons]+_b[ln_wage:2.race] + ///
    (_b[ln_wage:tenure]+_b[ln_wage:2.race#c.tenure])*10 + ///
    (_b[ln_wage:c.tenure#c.tenure] + ///
    _b[ln_wage:2.race#c.tenure#c.tenure])*100)
*---------------------------------------------------------

That is, we predict the expected mean for Whites:

*---------------------------------------------------------
lincom _b[ln_wage:_cons]+_b[ln_wage:tenure]*10 + ///
    _b[ln_wage:c.tenure#c.tenure]*100
*---------------------------------------------------------

and for Blacks

*---------------------------------------------------------
lincom _b[ln_wage:_cons]+_b[ln_wage:2.race] + ///
    (_b[ln_wage:tenure]+_b[ln_wage:2.race#c.tenure])*10 + ///
    (_b[ln_wage:c.tenure#c.tenure] + ///
    _b[ln_wage:2.race#c.tenure#c.tenure])*100
*---------------------------------------------------------

and then calculate the difference to check whether it is zero. We could have obtained the same result more easily (with the opposite sign, because -margins- reports the Black minus White difference) by typing:

*---------------------------------------------------------
margins, dydx(race) at(tenure=(10))
*---------------------------------------------------------

However, this approach is limited to a handful of differences at some discrete points in time.
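A minimal sketch of two ways to go beyond a single point in time, assuming the -xtmixed- model above is still the active estimation result and that Stata 12 or later is available for -marginsplot-. The first command is a joint Wald test that the two fitted curves coincide everywhere (intercept shift, slope shift and curvature shift all zero), which gives one p-value for the whole curve. The second repeats the -margins- call on a grid of tenure values; the grid 0(5)25 is only an illustrative choice.

```stata
* Sketch only: run directly after the -xtmixed- call shown above.

* Joint Wald test that the Black and White curves are identical,
* i.e. all three race-specific shift parameters are zero
test (_b[ln_wage:2.race] = 0) ///
     (_b[ln_wage:2.race#c.tenure] = 0) ///
     (_b[ln_wage:2.race#c.tenure#c.tenure] = 0)

* Black minus White difference in expected log wage at tenure = 0, 5, ..., 25
margins, dydx(race) at(tenure=(0(5)25))

* Plot those differences with their confidence intervals;
* the horizontal line at zero marks "no difference"
marginsplot, yline(0)
```

The intervals from -margins- are pointwise and are not adjusted for looking at several tenure values at once, so they should be read as descriptive rather than as a formal simultaneous test.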
What I would rather do is simply plot the fitted slopes and put confidence bands around them, so everybody can quickly see the differences across the entire time range:

*---------------------------------------------------------
use http://www.stata-press.com/data/r11/nlswork.dta, clear
drop if race==3
xtmixed ln_wage c.tenure##c.tenure##i.race || idcode:, mle
predict h_wage
predict se_wage, stdp
gen cilo=h_wage-2*se_wage
gen cihi=h_wage+2*se_wage
twoway rarea cilo cihi tenure if race==1, sort color(gs8) fint(50) ///
    || line h_wage tenure if race==1, sort lcolor(green) ///
    || rarea cilo cihi tenure if race==2, sort color(gs8) fint(50) ///
    || line h_wage tenure if race==2, sort lcolor(red) ///
    xlabel(0(5)25) ///
    xtitle("Job tenure (years)") ///
    ytitle("log(wage)") ///
    ylabel(0(1)3, angle(0)) ///
    legend(order(2 "White" 4 "Black") rows(1))
*---------------------------------------------------------

Joerg

P.S. I used -xtmixed- here because it can easily be extended to letting the time effect vary across subjects. In the model above, and in your -xtreg- model, these effects are constrained to be the same across clusters, and it often makes sense to relax this assumption.

On Tue, Oct 9, 2012 at 10:57 AM, Michael Housman <mhousman@evolvondemand.com> wrote:
> Hi folks,
>
> Was wondering if anyone could tell me how to test for significant differences between groups after running a random-effects regression?
>
> By way of background, I have data in which each observation represents an employee-date and the dependent variable is a performance metric (e.g., average handle time, customer satisfaction, etc.) for call center agents. In essence, I'm trying to model performance and plot the learning curve as a function of "day_of_service" for four different groups of employees.
>
> I've generated a variable called "hire_score_order" that's numbered 1 to 4, representing the four different groups that I want to represent.
> I've interacted that term twice with day_of_service so I can visually represent the first- and second-order effects. I've copied below my "xtreg" command and the resulting output for a sample metric.
>
> What I want to do is run xtreg post-estimation to test the hypothesis that group 1's learning curve is significantly different than group 2's, group 2's vs. group 3's, etc. Any suggestions?
>
> Thanks in advance!
>
> Best,
> Mike
>
>
> xtreg aht c.day_of_service##c.day_of_service##i.hire_score_order, re
>
> Random-effects GLS regression                   Number of obs      =    242792
> Group variable: emp_id                          Number of groups   =      1984
>
> R-sq:  within  = 0.0049                         Obs per group: min =         1
>        between = 0.1248                                        avg =     122.4
>        overall = 0.0622                                        max =       500
>
>                                                 Wald chi2(38)      =   1544.57
> corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000
>
> -------------------------------------------------------------------------------------------------------------------
>                                                aht |      Coef.  Std. Err.       z   P>|z|     [95% Conf. Interval]
> ---------------------------------------------------+---------------------------------------------------------------
>                                     day_of_service |  -.0035472   .0302398   -0.12   0.907    -.0628162    .0557218
>                                                    |
>                  c.day_of_service#c.day_of_service |  -9.38e-07   4.63e-06   -0.20   0.839      -.00001    8.13e-06
>                                                    |
>                                   hire_score_order |
>                                                 2  |   168.1932   48.20808    3.49   0.000     73.70711    262.6793
>                                                 3  |   20.51885   68.23659    0.30   0.764    -113.2224    154.2601
>                                                 4  |   156.1946   109.0574    1.43   0.152    -57.55392    369.9431
>                                                    |
>                  hire_score_order#c.day_of_service |
>                                                 2  |  -2.088015   .5027992   -4.15   0.000    -3.073483   -1.102546
>                                                 3  |  -1.117207   .4928079   -2.27   0.023    -2.083092   -.1513208
>                                                 4  |  -2.408916   1.294864   -1.86   0.063    -4.946802    .1289699
>                                                    |
> hire_score_order#c.day_of_service#c.day_of_service |
>                                                 2  |   .0023866   .0016018    1.49   0.136    -.0007529    .0055262
>                                                 3  |   .0014925   .0014822    1.01   0.314    -.0014126    .0043976
>                                                 4  |   .0040321   .0037677    1.07   0.285    -.0033524    .0114167
>                                                    |
>                                              _cons |   246.4581   81.31057    3.03   0.002     87.09236    405.8239
> ---------------------------------------------------+---------------------------------------------------------------
>                                            sigma_u |  521.47501
>                                            sigma_e |  930.20434
>                                                rho |  .23912442   (fraction of variance due to u_i)
> -------------------------------------------------------------------------------------------------------------------
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
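For the four-group -xtreg- model quoted above, one possible way to get a single p-value per comparison, loosely analogous to what -sts test- reports for survivor functions, is a joint Wald test on the group-specific intercept, slope and curvature terms. The sketch below is only an illustration under stated assumptions: the -xtreg, re- model has just been fitted, hire_score_order is coded 1 to 4 with group 1 as the base level, Stata 12 or later is used for the pwcompare() option, and day 90 is an arbitrary, hypothetical evaluation point.

```stata
* Sketch only: run directly after the -xtreg, re- call quoted above.

* Overall test: are all four learning curves identical?
testparm i.hire_score_order ///
    i.hire_score_order#c.day_of_service ///
    i.hire_score_order#c.day_of_service#c.day_of_service

* Group 2 versus group 3: same intercept, slope and curvature shifts
* relative to the base group, hence the same curve
test (2.hire_score_order = 3.hire_score_order) ///
     (2.hire_score_order#c.day_of_service = 3.hire_score_order#c.day_of_service) ///
     (2.hire_score_order#c.day_of_service#c.day_of_service = ///
      3.hire_score_order#c.day_of_service#c.day_of_service)

* Pairwise differences in expected aht between groups at one chosen day
* (day 90 is a hypothetical value, picked only for illustration)
margins hire_score_order, at(day_of_service=90) pwcompare(effects)
```

A comparison against the base group (for example group 1 versus group 2) only needs the three group-2 terms tested against zero, because group 1's curve is given by the main effects alone.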

**References**:

- **st: Testing for significant differences between groups after running a random-effects regression** *From:* Michael Housman <mhousman@evolvondemand.com>
- **Re: st: Testing for significant differences between groups after running a random-effects regression** *From:* Joerg Luedicke <joerg.luedicke@gmail.com>
