László Sándor <sandorl@gmail.com>

statalist@hsphsun2.harvard.edu

Re: st: xt: unit-specific trends

Thu, 19 Apr 2012 09:29:59 -0400

Quick comments on this: I forgot to flag that the residual variable needs to exist beforehand for -genbump- below; the program only replaces its values.

More importantly, the operation is still far, far from linear in the number of individuals (N in the panel; T is fixed). I could again finish a 1% subsample in around 10 minutes, but my bold overnight attempt at a 10% subsample finished only 4 of the 8 variables to be transformed this way in 10 or 11 hours. Maybe caching and memory are an issue here, but if anybody (StataCorp?) could comment on this otherwise, that would be helpful.

Maybe starting up -_regress- and -_predict- over and over is very costly? Or is -marksample- not fast enough with the by option? (Does the code know that once it has finished with seven consecutive rows, there is no need to check further down whether `touse' is 1 anywhere else? I guessed that byable commands produce efficient subscripting for some underlying Mata code…) Or does the byable command not use MP resources efficiently? (Still, even if it remained serial, the speed-up could be much closer to linear, no?)

I thought individual-specific trends were almost as trendy nowadays as fixed effects; I wonder if they could be done much faster.

Thanks,

Laszlo

2012/4/18 László Sándor <sandorl@gmail.com>:
> In case anyone cares, this is what I came up with. (It detrends, demeans,
> and also allows for a level shift.) And it is faster, as I expected.
>
> program define genbump, byable(recall, noheader)
>     version 11
>     syntax =/exp [if] [in], trend(varname) bump(varname) resid(varname)
>     marksample touse, novarlist
>     tempvar res
>     quietly {
>         _regress `exp' `trend' `bump' if `touse'
>         _predict `res', resid
>         replace `resid' = `res'+_b[`bump']*`bump' if `touse'
>     }
> end
>
>
> 2012/4/18 László Sándor <sandorl@gmail.com>:
>> Thanks, Nick,
>>
>> I left out a crucial part: I need to run it for observations on the
>> order of 10K (full sample: 400K, but I am also trying to sample down).
>>
>> I only gave the 200 cases in 4 minutes as a measure of speed.
>>
>> I would really love to see this speed up.
>>
>> So I should make the residual generation a separate command, and make
>> it byable (but not an -egen- function), then? Any other trick up your sleeve?
>>
>> Gratefully, as always,
>>
>> Laszlo
>>
>> On Wed, Apr 18, 2012 at 7:56 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>>> If a total task takes 3-4 minutes, dots to show progress are
>>> pointless, in my view.
>>>
>>> -egen- is for convenience. Writing an -egen- function will not speed
>>> things up; it will just slow things down.
>>>
>>> Nick
>>>
>>> 2012/4/19 László Sándor <sandorl@gmail.com>:
>>>> Or a quick idea: Shall I write an -egen- extension instead? Or would
>>>> all the benefits come from its byability anyway?
>>>>
>>>> 2012/4/18 László Sándor <sandorl@gmail.com>:
>>>>> Let me get back to this now that I know, using -_dots-, how fast I am going.
>>>>>
>>>>> Now I know it takes 3-4 minutes to loop through 200 cases, while all I
>>>>> do each time is a trivial regression on 4-7 observations and
>>>>> predicting the residuals.
>>>>>
>>>>> I would greatly welcome suggestions on how to speed this up relative
>>>>> to the code below. Most likely the cost is in checking every case
>>>>> against the -if- condition when only a few satisfy it; since those
>>>>> cases could come in blocks after a single sort, exploiting that could
>>>>> help, but I am out of ideas on how to do it. Would making the code
>>>>> "byable" at least use some features of MP?
>>>>>
>>>>> Thanks!
>>>>>
>>>>> Laszlo
>>>>>
>>>>> sum nid, d
>>>>> _dots 0
>>>>> forval i = 1/`r(max)' {
>>>>>     foreach v of varlist assets liabs netassets koejd {
>>>>>         cap reg `v' year post if nid == `i'
>>>>>         if _rc == 0 {
>>>>>             predict resid, resid
>>>>>             qui replace r`v' = resid + _b[post]*post if e(sample)
>>>>>             drop resid
>>>>>         }
>>>>>     }
>>>>>     _dots `i' 0
>>>>> }
>>>>>
>>>>> 2012/4/13 László Sándor <sandorl@gmail.com>:
>>>>>> Hi all,
>>>>>>
>>>>>> I am trying to demean and detrend my panel data allowing for
>>>>>> unit-specific trends (using Stata 11.0 MP for Windows). I found some
>>>>>> previous posts about this, but I am not satisfied with the speed of
>>>>>> the solutions. I would be most happy with a "byable" solution, like
>>>>>> this pseudocode:
>>>>>>
>>>>>> bys id: {
>>>>>>     reg var t
>>>>>>     pred dtrended_var, res
>>>>>> }
>>>>>>
>>>>>> I know this is not possible. However, looping through my ids with -if-
>>>>>> conditions is not feasible either (or should I collect them into a
>>>>>> local with -levelsof-?). Actually, with all the -if- conditions, it is
>>>>>> not even attractive, let alone feasible. (Or, if I sort by id, can I
>>>>>> use -in- conditions in the balanced subset, which I presume would be
>>>>>> much faster?)
>>>>>>
>>>>>> Or shall I just loop over a new id that will be consecutive integers
>>>>>> if I -egen, group()- the old id (or do the same with -in- ranges)?
>>>>>>
>>>>>> I had some hopes for -xtdata- or -areg-, but to no avail. Still, I am
>>>>>> looking for some guidance on doing this the right way, given that even
>>>>>> the simple -areg- could be made faster by "orders of magnitude" from
>>>>>> Stata 11 to 12…
>>>>>>
>>>>>> Thank you for any thoughts,
>>>>>>
>>>>>> Laszlo

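The pseudocode in the 2012/4/13 message (within each id, regress the variable on t and keep the residual) has a closed form that needs no per-unit call to a regression command: with one regressor plus an intercept, the unit's slope is sum((t - tbar)*(y - ybar)) / sum((t - tbar)^2), and the residual is y - ybar - slope*(t - tbar). The sketch below is written in Python rather than Stata, purely to illustrate the "one sort, then one pass over contiguous blocks" idea raised in the thread; the function name detrend_by_unit and its list-based interface are hypothetical, and a Stata translation (by-group means and totals) is not shown or tested here.

```python
from itertools import groupby


def detrend_by_unit(ids, t, y):
    """Return within-unit OLS residuals from y = a_i + b_i * t.

    ids, t, y are equal-length sequences, one entry per observation.
    Each unit gets its own intercept (demeaning) and slope (detrending).
    Rows are sorted by id once, so every unit forms a contiguous block and
    each observation is visited a constant number of times after the sort.
    """
    n = len(y)
    resid = [0.0] * n
    order = sorted(range(n), key=lambda k: ids[k])
    for _, block in groupby(order, key=lambda k: ids[k]):
        idx = list(block)
        m = len(idx)
        t_bar = sum(t[k] for k in idx) / m
        y_bar = sum(y[k] for k in idx) / m
        sxx = sum((t[k] - t_bar) ** 2 for k in idx)
        sxy = sum((t[k] - t_bar) * (y[k] - y_bar) for k in idx)
        # If t does not vary within the unit, no slope is identified;
        # this sketch then removes only the unit mean.
        slope = sxy / sxx if sxx > 0 else 0.0
        for k in idx:
            resid[k] = y[k] - y_bar - slope * (t[k] - t_bar)
    return resid
```

After the single O(N log N) sort, the work is a fixed amount per observation, which is the near-linear scaling in N the thread was hoping for. Whether a given Stata command achieves this depends on its internals, which this sketch does not address.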

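The -genbump- program quoted above adds a level shift: within each unit it regresses y on a constant, the trend and the bump dummy, then stores the residual plus b_bump*bump. The same one-sort, one-pass structure extends to that case by accumulating the 3x3 normal equations X'X b = X'y per unit and solving them directly. This is a Python sketch of the idea under stated assumptions, not Stata code and not how -regress- works internally; solve3, bump_resid_by_unit and the eps singularity threshold are hypothetical names and choices. Where a unit's system is singular (for example, the bump never switches on within that unit), the sketch simply leaves its rows as None; Stata's -regress- would instead omit the collinear regressor and still report residuals.

```python
from itertools import groupby


def solve3(a, b, eps=1e-12):
    """Solve the 3x3 system a x = b by Gaussian elimination with
    partial pivoting. Returns None if a pivot is (numerically) zero."""
    # Augmented matrix [a | b], copied so the caller's lists are untouched.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < eps:  # hypothetical absolute threshold
            return None
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        s = sum(m[r][c] * x[c] for c in range(r + 1, 3))
        x[r] = (m[r][3] - s) / m[r][r]
    return x


def bump_resid_by_unit(ids, trend, bump, y):
    """Per unit, fit y = a + b_trend*trend + b_bump*bump by OLS and
    return residual + b_bump*bump (the quantity -genbump- stores).
    Rows of units with a singular system are returned as None."""
    n = len(y)
    out = [None] * n
    # One sort makes each unit a contiguous block.
    order = sorted(range(n), key=lambda k: ids[k])
    for _, block in groupby(order, key=lambda k: ids[k]):
        idx = list(block)
        # Accumulate X'X and X'y for X = [1, trend, bump].
        xtx = [[0.0] * 3 for _ in range(3)]
        xty = [0.0] * 3
        for k in idx:
            x = (1.0, trend[k], bump[k])
            for i in range(3):
                xty[i] += x[i] * y[k]
                for j in range(3):
                    xtx[i][j] += x[i] * x[j]
        beta = solve3(xtx, xty)
        if beta is None:
            continue  # simplification: leave this unit's rows as None
        a, b_trend, b_bump = beta
        for k in idx:
            fitted = a + b_trend * trend[k] + b_bump * bump[k]
            out[k] = (y[k] - fitted) + b_bump * bump[k]
    return out
```

Solving the normal equations is the simplest choice for three parameters, but it can lose precision when regressors are badly scaled; a QR-based solver would be more robust.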