Stata The Stata listserver

st: Re: RE: RE: RE: Re: RE: Multiple commands under "By varlist"?

From   "Michael Blasnik" <>
To   <>
Subject   st: Re: RE: RE: RE: Re: RE: Multiple commands under "By varlist"?
Date   Sat, 26 Jun 2004 15:55:41 -0400

Maybe you didn't see the follow-up email where I provide some specific code
on how to implement the -in- approach to selecting groups instead of
the -if- approach, or maybe you are wondering why it works.

If you issue a command like:

regress y x if group==`i'

then Stata must evaluate the -if- condition on every observation in the
dataset to identify the sample for the command.  Often, this type of
statement in a loop is followed by statements that copy the coefficients
and/or standard errors into variables, again using the -if- condition.  That
adds up to many passes through the entire dataset to select the same small
subset of observations.  There isn't much of a problem with this approach
when you have just a few or even a few dozen groups, but when you have 1000
groups or 100,000, you are making thousands of passes through the dataset,
one or more for each group.  For example, if you have 1000 groups with 10
obs each, then each -if- condition requires 10,000 evaluations every time it
is run.  If your loop has just 3 -if- conditions, that's 30,000,000
evaluations to run your whole loop (3 * 10,000 * 1000).
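
To make that count concrete, here is a minimal sketch of the kind of loop
described above.  The variable names (y, x, group, b, se) and the assumption
that groups are numbered 1 to 1000 are illustrative only; they are not taken
from any particular dataset.

```stata
* Sketch only: y, x, group, b and se are assumed names, and group is
* assumed to be coded 1, 2, ..., 1000.
gen double b  = .
gen double se = .
forvalues i = 1/1000 {
    * each of the three -if- conditions below is checked against
    * every observation in the dataset on every pass of the loop
    quietly regress y x       if group == `i'
    quietly replace b  = _b[x]  if group == `i'
    quietly replace se = _se[x] if group == `i'
}
```

With 10,000 observations, each pass of this loop tests group == `i' 30,000
times in order to pick out the same 10 observations three times over.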

In contrast, if you can identify each group using an -in- range (which
requires the data to be sorted so that each group is a contiguous block of
observations), Stata can work directly on just the observations you want:
-in- acts as a direct pointer to the selected observations.  In terms of
speed, for my example with 1000 groups the -in- approach is typically about
10x-15x faster.  There is a little overhead in setting up the -in- approach,
but my prior email shows a fairly quick way to do it: generate a variable
that holds the count for each group, then use a -while- loop that jumps from
group to group by observation number.
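
A minimal sketch of that kind of -in- setup might look like the following.
It is an illustration under stated assumptions, not the code from the earlier
email: the names y, x, group, n, b and se are hypothetical, and every group
is assumed to have enough observations for the regression to run.

```stata
* Sketch only: y, x, group, n, b and se are assumed names.
sort group
* n holds the group size, repeated on every observation of the group
by group: gen long n = _N
gen double b  = .
gen double se = .

local first = 1
while `first' <= _N {
    * the group starting at observation `first' ends n[`first'] - 1 rows later
    local last = `first' + n[`first'] - 1
    quietly regress y x       in `first'/`last'
    quietly replace b  = _b[x]  in `first'/`last'
    quietly replace se = _se[x] in `first'/`last'
    * jump straight to the first observation of the next group
    local first = `last' + 1
}
```

Because the loop relies on observation numbers, nothing inside it may re-sort
the data; if a command in the loop changes the sort order, the -in- ranges
will no longer line up with the groups.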

Michael Blasnik

----- Original Message ----- 
From: "Apostolos Ballas" <>
To: <>
Sent: Saturday, June 26, 2004 1:58 PM
Subject: st: RE: RE: RE: Re: RE: Multiple commands under "By varlist"?

> It is probably that I am dim, but since I have a very similar problem (i.e.,
> many simulations which take hours) can someone please explain how the
> example works?
> Thanks a lot for the help.
> Apostolos
> -----Original Message-----
> From:
> [] On Behalf Of Nick Cox
> Sent: Saturday, June 26, 2004 5:26 PM
> To:
> Subject: st: RE: RE: Re: RE: Multiple commands under "By varlist"?
> In this I referred to Michael Blasnik.
> 14 seconds later he posted a similar point.
> Clearly this should be written up in supermarket
> trash newspapers as an Amazing Coincidence.
> Nick
> Nick Cox
> >
> > 2. The way -if- is implemented. The
> > command
> >
> > regress returns factor if `i' == month
> >
> > is implemented by testing every observation
> > to see whether it should be included in
> > the regression. In your case 99.9% of
> > the observations are irrelevant to each
> > regression, but Stata takes no special
> > action to avoid that. You should be
> > able to substitute -if- by -in-:
> >
> > gen long obsno = _n
> > sort month port
> > forval i = 1/1000 {
> >     local min = ...
> >     local max = ...
> >     regress returns factor in `min'/`max'
> >     ...
> > }
> >
> > and by Blasnik's Law this should be much faster.
