Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: xt: unit-specific trends


From   László Sándor <[email protected]>
To   [email protected]
Subject   Re: st: xt: unit-specific trends
Date   Thu, 19 Apr 2012 13:11:31 -0400

I got curious about the Mata solution. Below is my code for a clumsy
Stata command to do that, and the test code preceding it. It was
really fast with a million observations under Stata 12.1 MP for mac.

Please let me know if it does something wrong, or it could be
improved. I think most if its performance will depend on the
panelsubview implementation…

Thanks again,

Laszlo

clear all
discard
set obs 1000000
g i = int(_n/10)
g t = mod(_n,10)
g bump = t > 5
g u = runiform()
g y = 5 + 2*t + 10*bump
g ry = .
detrendbump = y, trend(t) bump(bump) resid(ry) by(i)

exit

program define detrendbump
	version 11
	syntax =/exp  [if] [in], trend(varname) bump(varname) resid(varname)
by(varname)
	marksample touse
	mata: _detrendbump("`exp' `bump' `trend' `resid'","`by'","`touse'")
end

mata:
mata set matastrict on
mata set matalnum off, permanently
mata set matafavor speed
void _detrendbump(string scalar a,  ///
                string scalar by,
                string scalar touse)
{
				real matrix M, X, pid, V, info
				real colvector y, xmean, ymean, b
				real scalar i
				st_view(pid, ., by, touse)
                st_view(V,   .,a, touse)

                info = panelsetup(pid, 1)

                for (i=1; i<=rows(info); i++) {
                        panelsubview(M,V, i, info)
						y = M[.,1]
						X = M[.,2\3]
						xmean = mean(X, 1)
						ymean = mean(y, 1)
						b = invsym(quadcrossdev(X,xmean , X,xmean))*quadcrossdev(X,xmean
, y,ymean)
                        M[.,4] = y - X*b + b[1]*X[.,1] :-ymean :+ xmean*b
                }
}
end
exit




2012/4/19 László Sándor <[email protected]>
>
> Thanks, Nick, Austin,
>
> quickly on this: Before I rewrite my code again, I would appreciate a
> comment from StataCorp. I used "if `touse'" because that is the
> official way to make a program byable
> (http://www.stata.com/help.cgi?byable). If there is any case where the
> -if- condition need not be checked for the entire dataset, a -by: -
> run is that, isn't it? (By the way, as a -by- always run when sorted
> on the "by" variable, it is really prone to range subscripts.) So I
> expect that Stata takes care of that when runs a byable command with
> -bys: -.
>
> I am not sure my Mata code could be much better than a well-written
> -by:-, so first I would hear about whether -bys: - is written well. :)
> And my impression is that to drop _N-10 observations but then restore
> after a preserve is prohibitively costly.
>
> Thanks,
>
> Laszlo
>
> On Thu, Apr 19, 2012 at 9:53 AM, Nick Cox <[email protected]> wrote:
> >
> > As I understand it:
> >
> > Work with -if- is really dumb. Stata tests every observation. Even if
> > you say -if _n < 7- there is no intelligence saying "Oh, we're done
> > here" once you get to observation 7. How could there be? In that
> > example and some others working with -in- is a much, much better idea.
> > However, you're working with -if `touse'- here. One alternative is
> >
> > preserve
> > keep if `touse'
> > ...
> > restore
> >
> > but you still have to use -if-. It could be that
> >
> > replace `touse' = -`touse'
> > sort `touse'
> > qui count if `touse'
> > local last = r(N)
> >
> > and then doing things
> >
> > ...  in 1/`last'
> >
> > helps.
> >
> > As for other stuff, I think most of the answers would be "It depends".
> >
> > On what MP does and doesn't do, see
> > http://www.stata.com/statamp/statamp.pdf
> >
> > Over time, the Mata content of Stata is going up, but many of the
> > basic operations are done using compiled C.
> >
> > I think Stata tech-support would need much more information than this
> > on your data, your code and your machine before they could make many
> > useful comments.
> >
> > More positively, you could use -set rmsg on- to see what is going most
> > slowly.
> >
> > Nick
> >
> > 2012/4/19 László Sándor <[email protected]>:
> > > Quick comments on this:
> > >
> > > I forgot to flag that the residual variable need to exist beforehand
> > > for -genbump- below, this is only replacing values of it.
> > >
> > > More importantly: The operation is still far, far from linear in the
> > > number of individuals (N in the panel — T is fixed). I could again
> > > finish a 1% subsample in around 10 minutes or so, but my bold attempt
> > > at 10% overnight still only finished 4 out of the 8 variables to be
> > > transformed this way in 10 or 11 hours.
> > >
> > > Maybe caching and memory is an issue here, but if anybody (StataCorp?)
> > > had a comment on this otherwise, that would be helpful.
> > >
> > > Maybe firing up _regress and _predict all the time is very costly? Or
> > > the marksample is not fast enough with the by option? (Does the code
> > > know that once it finished with seven consecutive rows there is
> > > nothing to check further below "whether" `touse' is 1 anywhere else? I
> > > guessed byable commands produce efficient subscripting for some
> > > underlying Mata code…) Or even the byable command does not use MP
> > > resources efficiently? (Still, even remaining serial, the speed-up
> > > could be much closer to linear, no?)
> > >
> > > I thought individual-specific trends are almost as trendy nowadays as
> > > fixed-effects — I wonder if they could be done much faster.
> > >
> >
> > *
> > *   For searches and help try:
> > *   http://www.stata.com/help.cgi?search
> > *   http://www.stata.com/support/statalist/faq
> > *   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index