Re: st: xt: unit-specific trends
From: Austin Nichols <[email protected]>
To: [email protected]
Date: Thu, 19 Apr 2012 09:48:19 -0400

László Sándor <[email protected]>:
No need to run regressions, loop, etc.
You can just use a little algebra and by:
http://www.stata.com/statalist/archive/2012-02/msg01108.html
http://www.stata.com/statalist/archive/2008-10/msg00136.html
though it will be faster and more accurate in Mata.
If you decide to move into Mata, see also e.g.
http://www.stata.com/statalist/archive/2009-05/msg00841.html
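
For the single-trend case (a regression of y on t within each unit), the algebra-and-by: idea can be sketched roughly as follows. This is only an illustration, not code taken from the posts linked above: the toy data and the names id, t, y, ok, n, tbar, ybar, stt, sty and y_dt are all assumptions. It relies on sum() treating missing values as zero, and it does not cover a level shift such as the -post- dummy, which would need the two-regressor formulas.

```stata
* Toy panel (assumed data): unit 1 lies exactly on y = 2 + 3t
clear
input id t y
1 1 5
1 2 8
1 3 11
1 4 14
2 1 1
2 2 3
2 3 2
2 4 6
end

* Flag usable observations; sum() treats missing as zero below
gen byte ok = !missing(y, t)
sort id

* Within-unit count and means from running sums evaluated at [_N]
by id: gen double n    = sum(ok)
by id: gen double tbar = sum(ok * t)
by id: gen double ybar = sum(ok * y)
by id: replace tbar = tbar[_N] / n[_N]
by id: replace ybar = ybar[_N] / n[_N]

* Centered sums of squares and cross-products
by id: gen double stt = sum(ok * (t - tbar)^2)
by id: gen double sty = sum(ok * (t - tbar) * (y - ybar))

* OLS residual of y on t within unit: demeaned and detrended y
by id: gen double y_dt = (y - ybar) - (sty[_N] / stt[_N]) * (t - tbar) if ok
```

Unit 1 lies exactly on a line, so its y_dt should be zero up to rounding. Comparing another unit against -regress y t if id == 2- followed by -predict, resid- is a quick sanity check. Units with a single usable observation get a missing y_dt, because stt is zero there.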
2012/4/19 László Sándor <[email protected]>:
> Quick comments on this:
>
> I forgot to flag that the residual variable needs to exist beforehand
> for -genbump- below, which only replaces its values.
>
> More importantly: The operation is still far, far from linear in the
> number of individuals (N in the panel — T is fixed). I could again
> finish a 1% subsample in around 10 minutes or so, but my bold attempt
> at 10% overnight still only finished 4 out of the 8 variables to be
> transformed this way in 10 or 11 hours.
>
> Maybe caching and memory are an issue here, but if anybody (StataCorp?)
> had any other comment on this, that would be helpful.
>
> Maybe firing up _regress and _predict all the time is very costly? Or
> is marksample not fast enough with the by option? (Does the code
> know that, once it has finished with seven consecutive rows, there is
> no need to check further down whether `touse' is 1 anywhere else? I
> guessed byable commands produce efficient subscripting for some
> underlying Mata code…) Or does even the byable command not use MP
> resources efficiently? (Still, even remaining serial, the speed-up
> could be much closer to linear, no?)
>
> I thought individual-specific trends are almost as trendy nowadays as
> fixed-effects — I wonder if they could be done much faster.
>
> Thanks,
>
> Laszlo
>
> 2012/4/18 László Sándor <[email protected]>:
>> In case anyone cares, this is what I came up with. (Detrends, demeans,
>> and also allows for a level shift.) And this is faster, as I expected.
>>
>> program define genbump, byable(recall, noheader)
>>        version 11
>>        syntax =/exp [if] [in], trend(varname) bump(varname) resid(varname)
>>        marksample touse, novarlist
>>        tempvar res
>>        quietly {
>>                _regress `exp' `trend' `bump' if `touse'
>>                _predict `res', resid
>>                replace `resid' = `res' + _b[`bump']*`bump' if `touse'
>>        }
>> end
>>
>>
>> 2012/4/18 László Sándor <[email protected]>:
>>> Thanks, Nick,
>>>
>>> I left out a crucial part: I need to run it for observations in the
>>> 10K magnitude (full sample: 400K, but I also try to sample down).
>>>
>>> I just had the 200 / 4 mins as a measure of speed.
>>>
>>> I would really love to see this speed up.
>>>
>>> So I should make the residual-generation a separate command, and make
>>> it byable (but no egen), then? Any other trick up your sleeve?
>>>
>>> Gratefully, as always,
>>>
>>> Laszlo
>>>
>>> On Wed, Apr 18, 2012 at 7:56 PM, Nick Cox <[email protected]> wrote:
>>>> If a total task takes 3-4 minutes, dots to show progress are
>>>> pointless, in my view.
>>>>
>>>> -egen- is for convenience. Writing an -egen- function will not speed
>>>> things up; it will just slow things down.
>>>> Nick
>>>>
>>>> 2012/4/19 László Sándor <[email protected]>:
>>>>> Or a quick idea: Shall I write an -egen- extension instead? Or would
>>>>> all the benefits come from its byability anyway?
>>>>>
>>>>> 2012/4/18 László Sándor <[email protected]>:
>>>>>> Let me get back to this now that I know how fast it runs, thanks to -_dots-.
>>>>>>
>>>>>> Now I know it takes 3-4 minutes to loop through 200 cases while all I
>>>>>> do each time is a trivial regression on 4-7 observations and
>>>>>> predicting the residuals.
>>>>>>
>>>>>> I would greatly welcome suggestions on how to speed this up relative
>>>>>> to the code below. Most likely the cost is in checking every
>>>>>> observation against the -if- condition when only a few satisfy it.
>>>>>> Since those could sit in contiguous blocks after a single sort,
>>>>>> exploiting that should help, but I am out of ideas on how to do it.
>>>>>> Would making the code "byable" at least use some features of MP?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Laszlo
>>>>>>
>>>>>> sum nid, d
>>>>>> _dots 0
>>>>>> forval i = 1/`r(max)' {
>>>>>>     foreach v of varlist assets liabs netassets koejd {
>>>>>>         cap reg `v' year post if nid == `i'
>>>>>>         if _rc == 0 {
>>>>>>             predict resid, resid
>>>>>>             qui replace r`v' = resid + _b[post]*post if e(sample)
>>>>>>             drop resid
>>>>>>         }
>>>>>>     }
>>>>>>     _dots `i' 0
>>>>>> }
>>>>>>
>>>>>> 2012/4/13 László Sándor <[email protected]>:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I am trying to demean and detrend my panel data allowing for unit
>>>>>>> specific trends (using Stata 11.0 MP for Windows). I found some
>>>>>>> previous posts about this, but I am not satisfied with the speed of
>>>>>>> the solutions. I would be most happy with a "byable" solution, like
>>>>>>> this pseudocode:
>>>>>>>
>>>>>>> bys id: {
>>>>>>> reg var t
>>>>>>> pred dtrended_var, res
>>>>>>> }
>>>>>>>
>>>>>>> I know this is not possible. However, looping through my ids with
>>>>>>> -if- conditions is not feasible either (or should I collect them into
>>>>>>> a local with -levelsof-?). Actually, with all the -if- conditions, it
>>>>>>> is not even attractive, let alone feasible. (Or, if I sort by id,
>>>>>>> could I use -in- conditions in the balanced subset, which I presume
>>>>>>> would be much faster?)
>>>>>>>
>>>>>>> Or shall I just loop over a new id that will be consecutive integers
>>>>>>> if I -egen, group- the old id (or do the same with -in- ranges)?
>>>>>>>
>>>>>>> I had some hopes for -xtdata- or -areg-, but to no avail. Still, I am
>>>>>>> looking for guidance on doing this the right way, given that even the
>>>>>>> simple -areg- could be made faster by "orders of magnitude" from Stata
>>>>>>> 11 to 12…
>>>>>>>
>>>>>>> Thank you for any thoughts,
>>>>>>>
>>>>>>> Laszlo