"Michael Blasnik" <michael.blasnik@verizon.net>

<statalist@hsphsun2.harvard.edu>

st: Re: RE: RE: RE: Re: RE: Multiple commands under "By varlist"?

Sat, 26 Jun 2004 15:55:41 -0400

Maybe you didn't see the follow-up email where I provided some specific code showing how to select groups with -in- instead of -if-, or maybe you are wondering why it works.

If you issue a command like

    regress y x if group==`i'

then Stata must evaluate the -if- expression on every observation in the dataset to identify the sample for the command. Often, this type of statement in a loop is followed by statements that copy the coefficients and/or standard errors into variables, again using the -if- expression. That adds up to many passes through the entire dataset to select the same small subset of observations.

There isn't much problem with this approach when you have just a few, or even a few dozen, groups. But when you have 1000 groups or 100,000, you may be making many thousands of passes through the dataset, one for each -if- expression for each group. For example, if you have 1000 groups with 10 obs each, then each -if- expression requires 10,000 evaluations. If your loop has just 3 -if- expressions, that's 30,000,000 evaluations to run your whole loop (3 * 10,000 * 1000).

In contrast, if the data are sorted by group so that you can identify each group using an -in- range, Stata can work directly on the set of observations you want: -in- acts as a direct pointer to the selected observations. In terms of speed, for my example with 1000 groups the -in- approach is typically about 10x-15x faster. There is a little overhead in setting up the -in- approach, but my prior email shows a fairly quick way to do it: generate a variable that holds the count for each group, then use a -while- loop that jumps from group to group by observation number.
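
The count-variable and -while- loop idea described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the prior email: the auto dataset and the variables price, weight, and foreign stand in for your own data, y, x, and group variable, and the names grpsize and b_weight are made up for the example.

```stata
* Sketch only: auto data, price, weight, foreign are stand-ins for
* your own data, y, x, and group variable (assumptions, not from the thread).
sysuse auto, clear

* Sorting makes each group a contiguous block of observations.
sort foreign

* Hold the number of observations in each group (hypothetical name).
by foreign: gen long grpsize = _N

* Variable to receive each group's slope (hypothetical name).
gen double b_weight = .

local N = _N
local first = 1
while `first' <= `N' {
    * The group starting at `first' ends grpsize[`first'] - 1 rows later.
    local last = `first' + grpsize[`first'] - 1

    * -in- addresses the block directly; no pass over the full dataset.
    quietly regress price weight in `first'/`last'
    quietly replace b_weight = _b[weight] in `first'/`last'

    * Jump straight to the first observation of the next group.
    local first = `last' + 1
}
```

The loop never needs to know the group values themselves, only where each block starts and how long it is, which is why every command inside it can use -in- rather than -if-.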
Michael Blasnik
michael.blasnik@verizon.net

----- Original Message -----
From: "Apostolos Ballas" <aballas@aueb.gr>
To: <statalist@hsphsun2.harvard.edu>
Sent: Saturday, June 26, 2004 1:58 PM
Subject: st: RE: RE: RE: Re: RE: Multiple commands under "By varlist"?

> It is probably that I am dim, but since I have a very similar problem (ie,
> many simulations which take hours) can some please explain how the following
> example works.
>
> Thanks a lot for the help.
>
> Apostolos
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
> Sent: Saturday, June 26, 2004 5:26 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: st: RE: RE: Re: RE: Multiple commands under "By varlist"?
>
>
> In this I referred to Michael Blasnik.
> 14 seconds later he posted a similar point.
>
> Clearly this should be written up in supermarket
> trash newspapers as an Amazing Coincidence.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Nick Cox
>
> > 2. The way -if- is implemented. The
> > command
> >
> > regress returns factor if `i' == month
> >
> > is implemented by testing every observation
> > to see whether it should be included in
> > the regression. In your case 99.9% of
> > the observations are irrelevant to each
> > regression, but Stata takes no special
> > action to avoid that. You should be
> > able to substitute -if- by -in-:
> >
> > gen long obsno = _n
> > sort month port
> > forval i = 1/1000 {
> >     local min = ...
> >     local max = ...
> >     regress returns factor in `min'/`max'
> >     ...
> > }
> >
> > and by Blasnik's Law this should be much faster.

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

