Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
László Sándor <sandorl@gmail.com> |

To |
statalist@hsphsun2.harvard.edu |

Subject |
Re: st: xt: unit-specific trends |

Date |
Thu, 19 Apr 2012 14:02:44 -0400 |

And for what it's worth, a final update: the task is already done for the subset that was only half done after 11 hours. This is a speedup by an order of magnitude. Not unexpected from Mata, but still, another example, case study. Thanks for all the help, Laszlo 2012/4/19 László Sándor <sandorl@gmail.com>: > I got curious about the Mata solution. Below is my code for a clumsy > Stata command to do that, and the test code preceding it. It was > really fast with a million observations under Stata 12.1 MP for mac. > > Please let me know if it does something wrong, or it could be > improved. I think most if its performance will depend on the > panelsubview implementation… > > Thanks again, > > Laszlo > > clear all > discard > set obs 1000000 > g i = int(_n/10) > g t = mod(_n,10) > g bump = t > 5 > g u = runiform() > g y = 5 + 2*t + 10*bump > g ry = . > detrendbump = y, trend(t) bump(bump) resid(ry) by(i) > > exit > > program define detrendbump > version 11 > syntax =/exp [if] [in], trend(varname) bump(varname) resid(varname) > by(varname) > marksample touse > mata: _detrendbump("`exp' `bump' `trend' `resid'","`by'","`touse'") > end > > mata: > mata set matastrict on > mata set matalnum off, permanently > mata set matafavor speed > void _detrendbump(string scalar a, /// > string scalar by, > string scalar touse) > { > real matrix M, X, pid, V, info > real colvector y, xmean, ymean, b > real scalar i > st_view(pid, ., by, touse) > st_view(V, .,a, touse) > > info = panelsetup(pid, 1) > > for (i=1; i<=rows(info); i++) { > panelsubview(M,V, i, info) > y = M[.,1] > X = M[.,2\3] > xmean = mean(X, 1) > ymean = mean(y, 1) > b = invsym(quadcrossdev(X,xmean , X,xmean))*quadcrossdev(X,xmean > , y,ymean) > M[.,4] = y - X*b + b[1]*X[.,1] :-ymean :+ xmean*b > } > } > end > exit > > > > > 2012/4/19 László Sándor <sandorl@gmail.com> >> >> Thanks, Nick, Austin, >> >> quickly on this: Before I rewrite my code again, I would appreciate a >> comment from StataCorp. I used "if `touse'" because that is the >> official way to make a program byable >> (http://www.stata.com/help.cgi?byable). If there is any case where the >> -if- condition need not be checked for the entire dataset, a -by: - >> run is that, isn't it? (By the way, as a -by- always run when sorted >> on the "by" variable, it is really prone to range subscripts.) So I >> expect that Stata takes care of that when runs a byable command with >> -bys: -. >> >> I am not sure my Mata code could be much better than a well-written >> -by:-, so first I would hear about whether -bys: - is written well. :) >> And my impression is that to drop _N-10 observations but then restore >> after a preserve is prohibitively costly. >> >> Thanks, >> >> Laszlo >> >> On Thu, Apr 19, 2012 at 9:53 AM, Nick Cox <njcoxstata@gmail.com> wrote: >> > >> > As I understand it: >> > >> > Work with -if- is really dumb. Stata tests every observation. Even if >> > you say -if _n < 7- there is no intelligence saying "Oh, we're done >> > here" once you get to observation 7. How could there be? In that >> > example and some others working with -in- is a much, much better idea. >> > However, you're working with -if `touse'- here. One alternative is >> > >> > preserve >> > keep if `touse' >> > ... >> > restore >> > >> > but you still have to use -if-. It could be that >> > >> > replace `touse' = -`touse' >> > sort `touse' >> > qui count if `touse' >> > local last = r(N) >> > >> > and then doing things >> > >> > ... in 1/`last' >> > >> > helps. >> > >> > As for other stuff, I think most of the answers would be "It depends". >> > >> > On what MP does and doesn't do, see >> > http://www.stata.com/statamp/statamp.pdf >> > >> > Over time, the Mata content of Stata is going up, but many of the >> > basic operations are done using compiled C. >> > >> > I think Stata tech-support would need much more information than this >> > on your data, your code and your machine before they could make many >> > useful comments. >> > >> > More positively, you could use -set rmsg on- to see what is going most >> > slowly. >> > >> > Nick >> > >> > 2012/4/19 László Sándor <sandorl@gmail.com>: >> > > Quick comments on this: >> > > >> > > I forgot to flag that the residual variable need to exist beforehand >> > > for -genbump- below, this is only replacing values of it. >> > > >> > > More importantly: The operation is still far, far from linear in the >> > > number of individuals (N in the panel — T is fixed). I could again >> > > finish a 1% subsample in around 10 minutes or so, but my bold attempt >> > > at 10% overnight still only finished 4 out of the 8 variables to be >> > > transformed this way in 10 or 11 hours. >> > > >> > > Maybe caching and memory is an issue here, but if anybody (StataCorp?) >> > > had a comment on this otherwise, that would be helpful. >> > > >> > > Maybe firing up _regress and _predict all the time is very costly? Or >> > > the marksample is not fast enough with the by option? (Does the code >> > > know that once it finished with seven consecutive rows there is >> > > nothing to check further below "whether" `touse' is 1 anywhere else? I >> > > guessed byable commands produce efficient subscripting for some >> > > underlying Mata code…) Or even the byable command does not use MP >> > > resources efficiently? (Still, even remaining serial, the speed-up >> > > could be much closer to linear, no?) >> > > >> > > I thought individual-specific trends are almost as trendy nowadays as >> > > fixed-effects — I wonder if they could be done much faster. >> > > >> > >> > * >> > * For searches and help try: >> > * http://www.stata.com/help.cgi?search >> > * http://www.stata.com/support/statalist/faq >> > * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**References**:**st: xt: unit-specific trends***From:*László Sándor <sandorl@gmail.com>

**Re: st: xt: unit-specific trends***From:*László Sándor <sandorl@gmail.com>

**Re: st: xt: unit-specific trends***From:*László Sándor <sandorl@gmail.com>

**Re: st: xt: unit-specific trends***From:*Nick Cox <njcoxstata@gmail.com>

**Re: st: xt: unit-specific trends***From:*László Sándor <sandorl@gmail.com>

**Re: st: xt: unit-specific trends***From:*László Sándor <sandorl@gmail.com>

**Re: st: xt: unit-specific trends***From:*László Sándor <sandorl@gmail.com>

**Re: st: xt: unit-specific trends***From:*Nick Cox <njcoxstata@gmail.com>

**Re: st: xt: unit-specific trends***From:*László Sándor <sandorl@gmail.com>

**Re: st: xt: unit-specific trends***From:*László Sándor <sandorl@gmail.com>

- Prev by Date:
**RE: Re: st: Stata to Matlab** - Next by Date:
**st: Labeling different kinds of missing observations** - Previous by thread:
**Re: st: xt: unit-specific trends** - Next by thread:
**Re: st: xt: unit-specific trends** - Index(es):