Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Michael Housman <mhousman@evolvondemand.com>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: RE: st: Using random-effects coefficients to predict performance over time
Date: Wed, 1 Aug 2012 16:45:20 +0000

Thanks for the advice! I've spent some time digging in on this front and I'm able to produce graphs that look the way I'd like. But I've run into another snag - the omitted group only seems to produce a straight line instead of a curve. Let me add a bit more detail:

I have data in which each observation represents an employee-date and the dependent variable is a performance metric (e.g., average handle time, schedule adherence, hold time) for call center agents. In essence, I'm trying to model performance and plot the learning curve as a function of "day_of_service" for four different groups of employees.

I've generated a variable called "hire_score_order" that's numbered 1 to 4, representing the four different groups that I want to represent. I've interacted that term twice with day_of_service so I can visually represent the first- and second-order effects. Here's my "xtreg" command and the resulting output for a sample metric:

xtreg aht c.day_of_service##c.day_of_service##i.hire_score_order if hire_score_order ~= ., re

Random-effects GLS regression                   Number of obs      =    242792
Group variable: emp_id                          Number of groups   =      1984

R-sq:  within  = 0.0049                         Obs per group: min =         1
       between = 0.1248                                        avg =     122.4
       overall = 0.0622                                        max =       500

                                                Wald chi2(38)      =   1544.57
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

--------------------------------------------------------------------------------------------------------------------
                                               aht |      Coef.  Std. Err.        z   P>|z|     [95% Conf. Interval]
---------------------------------------------------+----------------------------------------------------------------
                                    day_of_service |  -.0035472   .0302398    -0.12   0.907    -.0628162    .0557218
                                                   |
                 c.day_of_service#c.day_of_service |  -9.38e-07   4.63e-06    -0.20   0.839      -.00001    8.13e-06
                                                   |
                                  hire_score_order |
                                                 2 |   168.1932   48.20808     3.49   0.000     73.70711    262.6793
                                                 3 |   20.51885   68.23659     0.30   0.764    -113.2224    154.2601
                                                 4 |   156.1946   109.0574     1.43   0.152    -57.55392    369.9431
                                                   |
                 hire_score_order#c.day_of_service |
                                                 2 |  -2.088015   .5027992    -4.15   0.000    -3.073483   -1.102546
                                                 3 |  -1.117207   .4928079    -2.27   0.023    -2.083092   -.1513208
                                                 4 |  -2.408916   1.294864    -1.86   0.063    -4.946802    .1289699
                                                   |
hire_score_order#c.day_of_service#c.day_of_service |
                                                 2 |   .0023866   .0016018     1.49   0.136    -.0007529    .0055262
                                                 3 |   .0014925   .0014822     1.01   0.314    -.0014126    .0043976
                                                 4 |   .0040321   .0037677     1.07   0.285    -.0033524    .0114167
                                                   |
                                             _cons |   246.4581   81.31057     3.03   0.002     87.09236    405.8239
---------------------------------------------------+----------------------------------------------------------------
                                           sigma_u |  521.47501
                                           sigma_e |  930.20434
                                               rho |  .23912442   (fraction of variance due to u_i)
--------------------------------------------------------------------------------------------------------------------

Then, I have STATA calculate the marginal effects for all 4 groups at specific values for "day_of_service". Here's my "margins" command and the resulting output:

margins i.hire_score_order, at(day_of_service=(30(30)150)) post

Predictive margins                                Number of obs   =     242792
Model VCE    : Conventional

Expression   : Linear prediction, predict()

1._at        : day_of_ser~e    =          30
2._at        : day_of_ser~e    =          60
3._at        : day_of_ser~e    =          90
4._at        : day_of_ser~e    =         120
5._at        : day_of_ser~e    =         150

--------------------------------------------------------------------------------------
                     |            Delta-method
                     |     Margin  Std. Err.        z   P>|z|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
_at#hire_score_order |
                 1 1 |   619.9763   20.74882    29.88   0.000     579.3093    660.6432
                 1 2 |    727.677   40.27612    18.07   0.000     648.7373    806.6168
                 1 3 |   608.3222    62.3349     9.76   0.000      486.148    730.4964
                 1 4 |   707.5323   97.22201     7.28   0.000     516.9806    898.0839
                 2 1 |   619.8673   20.13757    30.78   0.000     580.3984    659.3362
                 2 2 |   671.3716   41.54202    16.16   0.000     589.9507    752.7924
                 2 3 |   578.7269   61.98285     9.34   0.000     457.2427     700.211
                 2 4 |   646.0426   98.66036     6.55   0.000     452.6719    839.4134
                 3 1 |   619.7567   19.55176    31.70   0.000     581.4359    658.0774
                 3 2 |   619.3604    44.3594    13.96   0.000     532.4176    706.3033
                 3 3 |   551.8164   63.02595     8.76   0.000     428.2878     675.345
                 3 4 |   591.8092   105.5965     5.60   0.000     384.8439    798.7744
                 4 1 |   619.6444   18.99285    32.63   0.000     582.4191    656.8697
                 4 2 |   571.6435   47.19628    12.11   0.000     479.1405    664.1465
                 4 3 |   527.5908   64.51581     8.18   0.000     401.1421    654.0394
                 4 4 |   544.8319   113.5318     4.80   0.000     322.3136    767.3502
                 5 1 |   619.5303   18.46233    33.56   0.000     583.3448    655.7159
                 5 2 |   528.2209   49.33645    10.71   0.000     431.5233    624.9186
                 5 3 |     506.05   65.87943     7.68   0.000     376.9287    635.1713
                 5 4 |   505.1108   120.0564     4.21   0.000     269.8045     740.417
--------------------------------------------------------------------------------------

Variables that uniquely identify margins: day_of_service hire_score_order

Finally, I use the "marginsplot" command to plot those marginal effects and I use a number of different graphing options to make the graphs look presentable.
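
As a cross-check on the two tables above, the predictive margins can be reproduced by hand from the xtreg coefficients. The following is only a worked sketch: the symbols b_1, b_2, d_2, g_2 and h_2 are labels introduced here for the displayed coefficients (base-group linear and squared terms, then the group 2 intercept shift, slope shift and curvature shift), t stands for day_of_service, and the arithmetic uses the rounded values as printed.

```latex
\begin{align*}
% gap between group 2 and the base group (group 1) at t = 30
\Delta_2(t)  &= d_2 + g_2\,t + h_2\,t^2 \\
\Delta_2(30) &= 168.1932 - 2.088015\,(30) + 0.0023866\,(30^2) \\
             &= 168.1932 - 62.6405 + 2.1479 \approx 107.70 \\[6pt]
% change in the base group's prediction between t = 30 and t = 150
f_1(150) - f_1(30) &= b_1\,(150 - 30) + b_2\,(150^2 - 30^2) \\
                   &= -0.0035472\,(120) - 9.38\times10^{-7}\,(21600) \\
                   &= -0.4257 - 0.0203 \approx -0.45
\end{align*}
```

Both figures agree with the margins output: 727.677 - 619.9763 = 107.70 for the gap at day 30, and 619.5303 - 619.9763 = -0.45 for the base group between day 30 and day 150. With b_1 and b_2 this close to zero, the fitted quadratic for hire_score_order = 1 moves by less than half a unit of aht over the plotted range, so on an axis spanning several hundred units it would be expected to look like a flat straight line.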
Here's the location of some sample graphs that I've posted online for the 6 different performance metrics:

http://assets.wharton.upenn.edu/~housman/graphs/abs_xt_cllcmn_hr_scr_Overall.png
http://assets.wharton.upenn.edu/~housman/graphs/acw_xt_cllcmn_hr_scr_Overall.png
http://assets.wharton.upenn.edu/~housman/graphs/adh_xt_cllcmn_hr_scr_Overall.png
http://assets.wharton.upenn.edu/~housman/graphs/aht_xt_cllcmn_hr_scr_Overall.png
http://assets.wharton.upenn.edu/~housman/graphs/hold_xt_cllcmn_hr_scr_Overall.png
http://assets.wharton.upenn.edu/~housman/graphs/qa_xt_cllcmn_hr_scr_Overall.png

I'm happy with the overall look and feel of the graphs but the problem is that the blue line - representing employees where hire_score_order = 1 - always appears to be a straight line. Is there any sort of specification I can use to tell STATA to represent the blue line as a curve like it does with the green, yellow, and red lines?

Thanks again!

Best,
Michael

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of William Buchanan
Sent: Thursday, June 28, 2012 3:58 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Using random-effects coefficients to predict performance over time

Hi Michael,

You don't need to create separate group variables. The factor operator i. tells Stata to expand the values into individual indicators. One problem that you've probably noticed is that you have several omitted terms in your model. Reducing the group variable to a single variable will solve that problem (since your syntax below would estimate the day of service variable and its squared term twice). That could potentially be one issue with the results that you're finding.

If you wanted to know more about this topic, Michael Mitchell's recent book on visualizing data analysis models is a phenomenal resource.

HTH,
Billy

Sent from my iPhone

On Jun 28, 2012, at 15:45, Michael Housman <mhousman@evolvondemand.com> wrote:

> Thanks so much! This is exactly what I was looking for. I've been playing with this functionality for a couple hours and have figured out how to plot the marginal effects.
>
> That said, I can't seem to figure out how to plot anything other than the linear prediction. Here's my code:
>
> xtreg adh controls c.day_of_service##c.day_of_service##i.Group1 c.day_of_service##c.day_of_service##i.Group2 , re
> margins , at(day_of_service = (30(10)90)) over(i.Group1 i.Group2)
>
> And here's what I see at the top of the marginal effects table:
>
> Expression : Linear prediction, predict()
>
> The lines in the marginal plot look straight. You'd mentioned "polynomial term interactions" in your response and I'm wondering if there's anything specific I need to specify in order to produce those.
>
> Thanks again!
>
> Best,
> Michael
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of William Buchanan
> Sent: Thursday, June 28, 2012 1:32 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: Using random-effects coefficients to predict performance over time
>
> Hi Michael,
>
> If you simplify your syntax:
>
> xtreg performance c.day_of_service##c.day_of_service##i.group other_covariates, re
>
> You should be able to use the -margins- command, and subsequently -marginsplot-, to plot the relationship for each of the groups for the time variable and the polynomial term interactions.
>
> With regard to storing the coefficients, that is already done for you. Type -ereturn list- after fitting your model to find out what matrices your results are stored in (coefficients generally are stored in e(b)). With regard to the scaling of -x-, you can estimate that with -margins- if you use the -at()- option with appropriate values for the variables you are interested in.
>
> HTH,
> Billy
>
> On Jun 28, 2012, at 1:21 PM, Michael Housman wrote:
>
>> Hi,
>>
>> Apologies if this is a novice question but I'm struggling with something and was wondering if the group can help out.
>>
>> I have data on employee performance over time and I'm trying to depict visually the relationship between employee performance and days of tenure on the job. My hypothesis is that there are 3 groups of employees and that these three groups vary in terms of: (1) where their performance starts off, (2) how quickly they learn, (3) how quickly their learning flattens out. In other words, I believe (and the data seems to indicate) that the first derivative of the learning curve is positive and the second derivative is negative.
>>
>> How I'd planned on approaching this problem was running a random-effects regression (because I have time-invariant explanatory variables) and interacting the group dummy variables with the linear and squared term representing days of service. For example, here's some of the code that I've set up (simplified slightly):
>>
>> * Generate the linear and squared terms
>>
>> gen day_of_service = metric_date - hire_date
>> gen day_of_service2 = day_of_service ^ 2
>>
>> * Interact those terms with the group dummy variables
>> * I know this code can be simplified but I'm just using it here as an example
>>
>> gen Group1_dos = Group1 * day_of_service
>> gen Group1_dos2 = Group1 * day_of_service2
>>
>> gen Group2_dos = Group2 * day_of_service
>> gen Group2_dos2 = Group2 * day_of_service2
>>
>> gen Group3_dos = Group3 * day_of_service
>> gen Group3_dos2 = Group3 * day_of_service2
>>
>> * Run random-effects regression (Group 3 is my omitted group)
>>
>> xtreg performance day_of_service day_of_service2 Group1 Group1_dos Group1_dos2 Group2 Group2_dos Group2_dos2 other_covariates, re
>>
>> So this runs a random-effects regression where I model the (1) intercept, (2) slope, and (3) squared term for these groups (relative to the one omitted group, Group3) and generate a coefficient representing each. That part I understand.
>>
>> What I don't understand is how I can depict this visually. In other words, I'd like to ask STATA to generate 3 separate curves from these coefficient estimates and then plot the lines on a graph. Here's how I imagine that equation would look for each of the 3 groups:
>>
>> Group1: f(t) = day_of_service*t + day_of_service2*t^2 + Group1 + Group1_dos*t + Group1_dos2*t^2
>> Group2: f(t) = day_of_service*t + day_of_service2*t^2 + Group2 + Group2_dos*t + Group2_dos2*t^2
>> Group3: f(t) = day_of_service*t + day_of_service2*t^2
>>
>> So the performance on any given day would be indicated by f(t) where t represents the agent's days of tenure on the job. I understand this much. Here's what I don't understand:
>>
>> 1) How can I get STATA to save the coefficients in a data matrix that it keeps in memory?
>> 2) How can I get STATA to then generate a dataset where t runs from, say, 0 to 180 days and then calculates f(t) for these three groups?
>> 3) How can I get STATA to depict this visually (most likely as a line graph)?
>>
>> Truth be told, piece (3) is something I understand but pieces (1) and (2) are the ones I don't get. Thanks in advance for any help!
>>
>> Best,
>> Mike Housman
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
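
The workflow suggested in this thread (factor-variable interactions in -xtreg-, then -margins- and -marginsplot-) can be tried end to end on a dataset that ships with Stata. This is only a sketch under stated assumptions: the nlswork data and the variables ln_wage, age and race are stand-ins chosen here for illustration and have nothing to do with the call-center data, and the age grid is arbitrary.

```stata
* Illustrative sketch on Stata's shipped nlswork data (an assumption,
* not the data discussed in this thread)
webuse nlswork, clear
xtset idcode year

* Quadratic in age, fully interacted with a 3-level factor variable
xtreg ln_wage c.age##c.age##i.race, re

* Coefficients are stored in e(b); individual ones are reachable via _b[]
matrix list e(b)

* Slope of the base group's fitted curve at age 30: b1 + 2*b2*30
display _b[age] + 2*_b[c.age#c.age]*30

* Predicted curve for each group on a grid of ages, then plot it
margins race, at(age=(20(2)44))
marginsplot, noci
```

Because the base level of the factor variable has no interaction terms of its own, its plotted curve is driven entirely by the coefficients on the continuous variable and its square; if both are close to zero, that group's line will look flat regardless of the -at()- grid.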
