RE: st: Using random-effects coefficients to predict performance over time


From   Michael Housman <mhousman@evolvondemand.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Using random-effects coefficients to predict performance over time
Date   Wed, 1 Aug 2012 16:45:20 +0000

Thanks for the advice!  I've spent some time digging into this and I'm able to produce graphs that look the way I'd like.  But I've run into another snag: the omitted group seems to produce only a straight line instead of a curve.  Let me add a bit more detail:

I have data in which each observation represents an employee-date and the dependent variable is a performance metric (e.g., average handle time, schedule adherence, hold time) for call center agents.  In essence, I'm trying to model performance and plot the learning curve as a function of "day_of_service" for four different groups of employees.

I've generated a variable called "hire_score_order" that's numbered 1 to 4, representing the four different groups that I want to represent.  I've interacted that term twice with day_of_service so I can visually represent the first- and second-order effects.  Here's my "xtreg" command and the resulting output for a sample metric:

xtreg aht c.day_of_service##c.day_of_service##i.hire_score_order if hire_score_order ~= ., re

Random-effects GLS regression                   Number of obs      =    242792
Group variable: emp_id                          Number of groups   =      1984

R-sq:  within  = 0.0049                         Obs per group: min =         1
       between = 0.1248                                        avg =     122.4
       overall = 0.0622                                        max =       500

                                                Wald chi2(38)      =   1544.57
corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0000

--------------------------------------------------------------------------------------------------------------------
                                               aht |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------------------------------------------+----------------------------------------------------------------
                                    day_of_service |  -.0035472   .0302398    -0.12   0.907    -.0628162    .0557218
                                                   |
                 c.day_of_service#c.day_of_service |  -9.38e-07   4.63e-06    -0.20   0.839      -.00001    8.13e-06
                                                   |
                                  hire_score_order |
                                                2  |   168.1932   48.20808     3.49   0.000     73.70711    262.6793
                                                3  |   20.51885   68.23659     0.30   0.764    -113.2224    154.2601
                                                4  |   156.1946   109.0574     1.43   0.152    -57.55392    369.9431
                                                   |
                 hire_score_order#c.day_of_service |
                                                2  |  -2.088015   .5027992    -4.15   0.000    -3.073483   -1.102546
                                                3  |  -1.117207   .4928079    -2.27   0.023    -2.083092   -.1513208
                                                4  |  -2.408916   1.294864    -1.86   0.063    -4.946802    .1289699
                                                   |
hire_score_order#c.day_of_service#c.day_of_service |
                                                2  |   .0023866   .0016018     1.49   0.136    -.0007529    .0055262
                                                3  |   .0014925   .0014822     1.01   0.314    -.0014126    .0043976
                                                4  |   .0040321   .0037677     1.07   0.285    -.0033524    .0114167
                                                   |
                                             _cons |   246.4581   81.31057     3.03   0.002     87.09236    405.8239
---------------------------------------------------+----------------------------------------------------------------
                                           sigma_u |  521.47501
                                           sigma_e |  930.20434
                                               rho |  .23912442   (fraction of variance due to u_i)
--------------------------------------------------------------------------------------------------------------------
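
For reference, each group's own curve comes from adding the base terms to that group's interaction terms. A minimal sketch of how those sums could be obtained, assuming the xtreg results above are still the active estimates (that is, it is run before any -margins, post-, which overwrites e(b)):

```stata
* Group 1 is the base level: its curvature is the main-effect squared term alone
lincom c.day_of_service#c.day_of_service

* Group 2: base slope plus the group-2 slope interaction
lincom day_of_service + 2.hire_score_order#c.day_of_service

* Group 2: base curvature plus the group-2 curvature interaction
lincom c.day_of_service#c.day_of_service + 2.hire_score_order#c.day_of_service#c.day_of_service
```

Swapping the 2. prefix for 3. or 4. gives the same sums for the other groups.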

Then I have Stata calculate the predictive margins for all 4 groups at specific values of "day_of_service".  Here's my "margins" command and the resulting output:

margins i.hire_score_order, at(day_of_service=(30(30)150)) post

Predictive margins                                Number of obs   =     242792
Model VCE    : Conventional

Expression   : Linear prediction, predict()

1._at        : day_of_ser~e    =          30

2._at        : day_of_ser~e    =          60

3._at        : day_of_ser~e    =          90

4._at        : day_of_ser~e    =         120

5._at        : day_of_ser~e    =         150

--------------------------------------------------------------------------------------
                     |            Delta-method
                     |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------------+----------------------------------------------------------------
_at#hire_score_order |
                1 1  |   619.9763   20.74882    29.88   0.000     579.3093    660.6432
                1 2  |    727.677   40.27612    18.07   0.000     648.7373    806.6168
                1 3  |   608.3222    62.3349     9.76   0.000      486.148    730.4964
                1 4  |   707.5323   97.22201     7.28   0.000     516.9806    898.0839
                2 1  |   619.8673   20.13757    30.78   0.000     580.3984    659.3362
                2 2  |   671.3716   41.54202    16.16   0.000     589.9507    752.7924
                2 3  |   578.7269   61.98285     9.34   0.000     457.2427     700.211
                2 4  |   646.0426   98.66036     6.55   0.000     452.6719    839.4134
                3 1  |   619.7567   19.55176    31.70   0.000     581.4359    658.0774
                3 2  |   619.3604    44.3594    13.96   0.000     532.4176    706.3033
                3 3  |   551.8164   63.02595     8.76   0.000     428.2878     675.345
                3 4  |   591.8092   105.5965     5.60   0.000     384.8439    798.7744
                4 1  |   619.6444   18.99285    32.63   0.000     582.4191    656.8697
                4 2  |   571.6435   47.19628    12.11   0.000     479.1405    664.1465
                4 3  |   527.5908   64.51581     8.18   0.000     401.1421    654.0394
                4 4  |   544.8319   113.5318     4.80   0.000     322.3136    767.3502
                5 1  |   619.5303   18.46233    33.56   0.000     583.3448    655.7159
                5 2  |   528.2209   49.33645    10.71   0.000     431.5233    624.9186
                5 3  |     506.05   65.87943     7.68   0.000     376.9287    635.1713
                5 4  |   505.1108   120.0564     4.21   0.000     269.8045     740.417
--------------------------------------------------------------------------------------

  Variables that uniquely identify margins: day_of_service hire_score_order

Finally, I use the "marginsplot" command to plot those margins, with a number of graphing options to make the graphs look presentable.  Here are sample graphs that I've posted online for the 6 different performance metrics:

http://assets.wharton.upenn.edu/~housman/graphs/abs_xt_cllcmn_hr_scr_Overall.png 
http://assets.wharton.upenn.edu/~housman/graphs/acw_xt_cllcmn_hr_scr_Overall.png 
http://assets.wharton.upenn.edu/~housman/graphs/adh_xt_cllcmn_hr_scr_Overall.png 
http://assets.wharton.upenn.edu/~housman/graphs/aht_xt_cllcmn_hr_scr_Overall.png 
http://assets.wharton.upenn.edu/~housman/graphs/hold_xt_cllcmn_hr_scr_Overall.png 
http://assets.wharton.upenn.edu/~housman/graphs/qa_xt_cllcmn_hr_scr_Overall.png
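
For context, a plotting call along the following lines would produce graphs of this kind. It is only a sketch: the options actually used for the posted graphs are not shown in this message, so the axis titles and the -noci- choice below are assumptions.

```stata
* Assumes the -margins- results shown above are the most recent results
* noci drops the confidence-interval bars; the titles are placeholders
marginsplot, noci xtitle("Day of service") ytitle("Predicted AHT") title("")
```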

I'm happy with the overall look and feel of the graphs, but the problem is that the blue line (representing employees where hire_score_order = 1) always appears to be a straight line.  Is there any specification I can use to tell Stata to draw the blue line as a curve, as it does with the green, yellow, and red lines?
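
A rough arithmetic check, using only the coefficients printed in the regression table above, suggests that the blue line is in fact a curve whose bend is too small to see. For a quadratic with squared-term coefficient c, the gap between the curve at day 90 and the straight chord joining days 30 and 150 is -c*60^2. The scalar names below are made up for illustration:

```stata
* Numbers copied from the coefficient table above (rounded as printed)
scalar c1 = -9.38e-07          // group 1 (base level): squared term only
scalar c2 = c1 + .0023866      // group 2: base squared term plus its interaction

* Distance of each curve from a straight line at the midpoint, day 90
display "group 1 bend at day 90: " -c1*60^2
display "group 2 bend at day 90: " -c2*60^2
```

This works out to roughly 0.003 for group 1 against roughly -8.6 for group 2, in units of aht. The margins table agrees: group 1 sits at 619.7567 on day 90 against a chord midpoint of about 619.753, while group 2 sits at 619.3604 against about 627.95. On an axis spanning several hundred units, a bend of 0.003 is indistinguishable from a straight line.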

Thanks again!

Best,
Michael



-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of William Buchanan
Sent: Thursday, June 28, 2012 3:58 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Using random-effects coefficients to predict performance over time

Hi Michael,

You don't need to create separate group variables.  The factor operator i. tells Stata to expand the values into individual indicators.  One problem that you've probably noticed is that you have several omitted terms in your model.  Reducing the group variables to a single variable will solve that problem (since your syntax below would estimate the day of service variable and its squared term twice).

That could potentially be one issue with the results that you're finding.  If you want to know more about this topic, Michael Mitchell's recent book on visualizing data analysis models is a phenomenal resource.
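
The single-variable approach can be sketched end to end as follows. This is an illustration only: it uses Stata's example panel dataset nlswork as a stand-in for the poster's data, with ln_wage, age, and race playing the roles of the performance metric, day_of_service, and the group variable.

```stata
* Stand-in data: Stata's example panel, not the poster's data
webuse nlswork, clear
xtset idcode year

* One factor variable replaces hand-built group dummies and interactions
xtreg ln_wage c.age##c.age##i.race, re

* Predicted profile for each level of race across a grid of ages
margins race, at(age=(20(5)45))
marginsplot, noci
```

Because the squared term is written as c.age#c.age rather than as a separately generated variable, -margins- knows that it moves together with age, so the plotted lines can bend.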

HTH,
Billy

Sent from my iPhone

On Jun 28, 2012, at 15:45, Michael Housman <mhousman@evolvondemand.com> wrote:

> Thanks so much!  This is exactly what I was looking for.  I've been playing with this functionality for a couple hours and have figured out how to plot the marginal effects.
> 
> That said, I can't seem to figure out how to plot anything other than the linear prediction.  Here's my code:
> 
> xtreg adh controls c.day_of_service##c.day_of_service##i.Group1 c.day_of_service##c.day_of_service##i.Group2, re
> margins, at(day_of_service = (30(10)90)) over(i.Group1 i.Group2)
> 
> And here's what I see at the top of the marginal effects table:
> 
> Expression   : Linear prediction, predict()
> 
> The lines in the marginal plot look straight.  You'd mentioned "polynomial term interactions" in your response and I'm wondering if there's anything specific I need to specify in order to produce those.
> 
> Thanks again!
> 
> Best,
> Michael
> 
> 
> 
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu 
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of William 
> Buchanan
> Sent: Thursday, June 28, 2012 1:32 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: Using random-effects coefficients to predict 
> performance over time
> 
> Hi Michael,
> 
> If you simplify your syntax:
> 
> xtreg performance c.day_of_service##c.day_of_service##i.group other_covariates, re
> 
> You should be able to use the -margins- command, and subsequently -marginsplot-, to plot the relationship for each of the groups for the time variable and the polynomial term interactions.  
> 
> With regard to storing the coefficients, that is already done for you.  Type -ereturn list- after fitting your model to find out which matrices your results are stored in (coefficients are generally stored in e(b)).  With regard to the scaling of -x-, you can estimate that with -margins- if you use the -at()- option with appropriate values for the variables you are interested in.
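> 
> A minimal sketch of inspecting the stored results, assuming the model above has just been fitted (the matrix name b is arbitrary):
> 
> ```stata
> ereturn list                   // lists everything the estimation command stored
> matrix b = e(b)                // copy the coefficient row vector
> matrix list b
> display _b[day_of_service]     // one coefficient, referenced by name
> ```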
> 
> HTH,
> Billy
> 
> 
> 
> 
> On Jun 28, 2012, at 1:21 PM, Michael Housman wrote:
> 
>> Hi,
>> 
>> Apologies if this is a novice question but I'm struggling with something and was wondering if the group can help out.
>> 
>> I have data on employee performance over time and I'm trying to depict visually the relationship between employee performance and days of tenure on the job.  My hypothesis is that there are 3 groups of employees and that these three groups vary in terms of: (1) where their performance starts off, (2) how quickly they learn, (3) how quickly their learning flattens out.  In other words, I believe (and the data seems to indicate) that the first derivative of the learning curve is positive and the second derivative is negative.
>> 
>> How I'd planned on approaching this problem was running a random-effects regression (because I have time-invariant explanatory variables) and interacting the group dummy variables with the linear and squared term representing days of service.  For example, here's some of the code that I've set up (simplified slightly):
>> 
>> * Generate the linear and squared terms
>> 
>> gen day_of_service = metric_date - hire_date
>> gen day_of_service2 = day_of_service ^ 2
>> 
>> * Interact those terms with the group dummy variables
>> * I know this code can be simplified but I'm just using it here as an example
>> 
>> gen Group1_dos = Group1 * day_of_service
>> gen Group1_dos2 = Group1 * day_of_service2
>> 
>> gen Group2_dos = Group2 * day_of_service
>> gen Group2_dos2 = Group2 * day_of_service2
>> 
>> gen Group3_dos = Group3 * day_of_service
>> gen Group3_dos2 = Group3 * day_of_service2
>> 
>> * Run random-effects regression (Group 3 is my omitted group)
>> 
>> xtreg performance day_of_service day_of_service2 Group1 Group1_dos Group1_dos2 Group2 Group2_dos Group2_dos2 other_covariates, re
>> 
>> So this runs a random-effects regression where I model the (1) intercept, (2) slope, and (3) squared term for these groups (relative to the one omitted group, Group3) and generates a coefficient representing each.  That part I understand.
>> 
>> What I don't understand is how I can depict this visually.  In other words, I'd like to ask Stata to generate 3 separate curves from these coefficient estimates and then plot the lines on a graph.  Here's how I imagine that equation would look for each of the 3 groups:
>> 
>> Group1: f(t) = day_of_service*t + day_of_service2*t^2 + Group1 + Group1_dos*t + Group1_dos2*t^2
>> Group2: f(t) = day_of_service*t + day_of_service2*t^2 + Group2 + Group2_dos*t + Group2_dos2*t^2
>> Group3: f(t) = day_of_service*t + day_of_service2*t^2
>> 
>> So the performance on any given day would be given by f(t), where t represents the agent's days of tenure on the job.  I understand this much.  Here's what I don't understand:
>> 
>> 1) How can I get Stata to save the coefficients in a data matrix that it keeps in memory?
>> 2) How can I get Stata to then generate a dataset where t runs from, say, 0 to 180 days and then calculate f(t) for these three groups?
>> 3) How can I get Stata to depict this visually (most likely as a line graph)?
>> 
>> Truth be told, piece (3) is something I understand but pieces (1) and (2) are the ones I don't get.  Thanks in advance for any help!
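>> 
>> One way pieces (2) and (3) could be sketched without building a separate dataset is with -twoway function-, which evaluates an expression in x over a range.  The sketch assumes it is run right after the xtreg above, while its results are still active.  Unlike the equations above, it includes _b[_cons], and it implicitly holds other_covariates at zero; both choices are assumptions made for illustration.
>> 
>> ```stata
>> * x stands in for t (days of tenure); the three curves share the base terms
>> twoway (function y = _b[_cons] + _b[Group1] + (_b[day_of_service] + _b[Group1_dos])*x + (_b[day_of_service2] + _b[Group1_dos2])*x^2, range(0 180)) ///
>>        (function y = _b[_cons] + _b[Group2] + (_b[day_of_service] + _b[Group2_dos])*x + (_b[day_of_service2] + _b[Group2_dos2])*x^2, range(0 180)) ///
>>        (function y = _b[_cons] + _b[day_of_service]*x + _b[day_of_service2]*x^2, range(0 180)), ///
>>        legend(order(1 "Group1" 2 "Group2" 3 "Group3")) xtitle("Day of service") ytitle("Predicted performance")
>> ```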
>> 
>> Best,
>> Mike Housman
>> 
>> 
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
> 
