Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: xt: unit-specific trends

From	László Sándor <[email protected]>
To	[email protected]
Subject	Re: st: xt: unit-specific trends
Date	Thu, 19 Apr 2012 14:02:44 -0400

And for what it's worth, a final update: the task is already done for
the subset that was only half done after 11 hours. This is a speedup
by an order of magnitude.

Not unexpected from Mata, but still, another example, case study.

Thanks for all the help,

Laszlo

2012/4/19 László Sándor <[email protected]>:
> I got curious about the Mata solution. Below is my code for a clumsy
> Stata command to do that, and the test code preceding it. It was
> really fast with a million observations under Stata 12.1 MP for mac.
>
> Please let me know if it does something wrong, or it could be
> improved. I think most if its performance will depend on the
> panelsubview implementation…
>
> Thanks again,
>
> Laszlo
>
> clear all
> discard
> set obs 1000000
> g i = int(_n/10)
> g t = mod(_n,10)
> g bump = t > 5
> g u = runiform()
> g y = 5 + 2*t + 10*bump
> g ry = .
> detrendbump = y, trend(t) bump(bump) resid(ry) by(i)
>
> exit
>
> program define detrendbump
>        version 11
>        syntax =/exp  [if] [in], trend(varname) bump(varname) resid(varname)
> by(varname)
>        marksample touse
>        mata: _detrendbump("`exp' `bump' `trend' `resid'","`by'","`touse'")
> end
>
> mata:
> mata set matastrict on
> mata set matalnum off, permanently
> mata set matafavor speed
> void _detrendbump(string scalar a,  ///
>                string scalar by,
>                string scalar touse)
> {
>                                real matrix M, X, pid, V, info
>                                real colvector y, xmean, ymean, b
>                                real scalar i
>                                st_view(pid, ., by, touse)
>                st_view(V,   .,a, touse)
>
>                info = panelsetup(pid, 1)
>
>                for (i=1; i<=rows(info); i++) {
>                        panelsubview(M,V, i, info)
>                                                y = M[.,1]
>                                                X = M[.,2\3]
>                                                xmean = mean(X, 1)
>                                                ymean = mean(y, 1)
>                                                b = invsym(quadcrossdev(X,xmean , X,xmean))*quadcrossdev(X,xmean
> , y,ymean)
>                        M[.,4] = y - X*b + b[1]*X[.,1] :-ymean :+ xmean*b
>                }
> }
> end
> exit
>
>
>
>
> 2012/4/19 László Sándor <[email protected]>
>>
>> Thanks, Nick, Austin,
>>
>> quickly on this: Before I rewrite my code again, I would appreciate a
>> comment from StataCorp. I used "if `touse'" because that is the
>> official way to make a program byable
>> (http://www.stata.com/help.cgi?byable). If there is any case where the
>> -if- condition need not be checked for the entire dataset, a -by: -
>> run is that, isn't it? (By the way, as a -by- always run when sorted
>> on the "by" variable, it is really prone to range subscripts.) So I
>> expect that Stata takes care of that when runs a byable command with
>> -bys: -.
>>
>> I am not sure my Mata code could be much better than a well-written
>> -by:-, so first I would hear about whether -bys: - is written well. :)
>> And my impression is that to drop _N-10 observations but then restore
>> after a preserve is prohibitively costly.
>>
>> Thanks,
>>
>> Laszlo
>>
>> On Thu, Apr 19, 2012 at 9:53 AM, Nick Cox <[email protected]> wrote:
>> >
>> > As I understand it:
>> >
>> > Work with -if- is really dumb. Stata tests every observation. Even if
>> > you say -if _n < 7- there is no intelligence saying "Oh, we're done
>> > here" once you get to observation 7. How could there be? In that
>> > example and some others working with -in- is a much, much better idea.
>> > However, you're working with -if `touse'- here. One alternative is
>> >
>> > preserve
>> > keep if `touse'
>> > ...
>> > restore
>> >
>> > but you still have to use -if-. It could be that
>> >
>> > replace `touse' = -`touse'
>> > sort `touse'
>> > qui count if `touse'
>> > local last = r(N)
>> >
>> > and then doing things
>> >
>> > ...  in 1/`last'
>> >
>> > helps.
>> >
>> > As for other stuff, I think most of the answers would be "It depends".
>> >
>> > On what MP does and doesn't do, see
>> > http://www.stata.com/statamp/statamp.pdf
>> >
>> > Over time, the Mata content of Stata is going up, but many of the
>> > basic operations are done using compiled C.
>> >
>> > I think Stata tech-support would need much more information than this
>> > on your data, your code and your machine before they could make many
>> > useful comments.
>> >
>> > More positively, you could use -set rmsg on- to see what is going most
>> > slowly.
>> >
>> > Nick
>> >
>> > 2012/4/19 László Sándor <[email protected]>:
>> > > Quick comments on this:
>> > >
>> > > I forgot to flag that the residual variable need to exist beforehand
>> > > for -genbump- below, this is only replacing values of it.
>> > >
>> > > More importantly: The operation is still far, far from linear in the
>> > > number of individuals (N in the panel — T is fixed). I could again
>> > > finish a 1% subsample in around 10 minutes or so, but my bold attempt
>> > > at 10% overnight still only finished 4 out of the 8 variables to be
>> > > transformed this way in 10 or 11 hours.
>> > >
>> > > Maybe caching and memory is an issue here, but if anybody (StataCorp?)
>> > > had a comment on this otherwise, that would be helpful.
>> > >
>> > > Maybe firing up _regress and _predict all the time is very costly? Or
>> > > the marksample is not fast enough with the by option? (Does the code
>> > > know that once it finished with seven consecutive rows there is
>> > > nothing to check further below "whether" `touse' is 1 anywhere else? I
>> > > guessed byable commands produce efficient subscripting for some
>> > > underlying Mata code…) Or even the byable command does not use MP
>> > > resources efficiently? (Still, even remaining serial, the speed-up
>> > > could be much closer to linear, no?)
>> > >
>> > > I thought individual-specific trends are almost as trendy nowadays as
>> > > fixed-effects — I wonder if they could be done much faster.
>> > >
>> >
>> > *
>> > *   For searches and help try:
>> > *   http://www.stata.com/help.cgi?search
>> > *   http://www.stata.com/support/statalist/faq
>> > *   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

References:
- st: xt: unit-specific trends
  - From: László Sándor <[email protected]>
- Re: st: xt: unit-specific trends
  - From: László Sándor <[email protected]>
- Re: st: xt: unit-specific trends
  - From: László Sándor <[email protected]>
- Re: st: xt: unit-specific trends
  - From: Nick Cox <[email protected]>
- Re: st: xt: unit-specific trends
  - From: László Sándor <[email protected]>
- Re: st: xt: unit-specific trends
  - From: László Sándor <[email protected]>
- Re: st: xt: unit-specific trends
  - From: László Sándor <[email protected]>
- Re: st: xt: unit-specific trends
  - From: Nick Cox <[email protected]>
- Re: st: xt: unit-specific trends
  - From: László Sándor <[email protected]>
- Re: st: xt: unit-specific trends
  - From: László Sándor <[email protected]>

Prev by Date: RE: Re: st: Stata to Matlab
Next by Date: st: Labeling different kinds of missing observations
Previous by thread: Re: st: xt: unit-specific trends
Next by thread: Re: st: xt: unit-specific trends
Index(es):
- Date
- Thread