

From: László Sándor <sandorl@gmail.com>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: xt: unit-specific trends

Date: Thu, 19 Apr 2012 18:35:14 -0400

Thank you, Bill. Of course it's great to have the correct results from Stata!

I am just a bit surprised that the "if" checks slow down operations this much. Esp. by-loops. And esp. because -by:- wants to start sorted anyway, I thought you could be less permissive later on (e.g. require that sort order be maintained). I would have guessed that the extra cost of not allowing re-sorting would have been justified by a dramatic speedup of -by-, which is pretty commonly used. But these are exactly the sorts of trade-offs that you are experts in. I will not second-guess your judgment.

Perhaps the biggest lesson to me was how costly an if-check is in large datasets. Pretty frightening. But most basic operations (sorting or checks) must be hitting some theoretical limits; that's all we can squeeze out of computers.

Thanks!

Laszlo

On Thu, Apr 19, 2012 at 2:36 PM, William Gould, StataCorp LP <wgould@stata.com> wrote:
>
> Laszlo <sandorl@gmail.com> wrote,
>
> > I used "if `touse'" because that is the official way to make a program
> > byable (http://www.stata.com/help.cgi?byable). If there is any case
> > where the -if- condition need not be checked for the entire dataset, a
> > -by: - run is that, isn't it?
>
> Laszlo is wrong in assuming that the data are necessarily sorted, and
> thus -if `touse'- is the official way to program this case.
>
> The problem for -by- is that it is turning control over to a
> user-written program, and it is not uncommon for user-written programs
> to re-sort the data and then not put them back into the original
> order. So -by- was written to accommodate that.
>
> If you as a programmer know that the data will still be sorted
> you can convert the -if `touse'- into an -in- range by coding,
>
>         tempvar x
>         quietly gen long `x' = `touse'*_n
>         quietly sum `x', meanonly
>         local first = r(min)
>         local last = r(max)
>         drop `x'
>
> In the rest of your code you can then code -in `first'/`last'- instead
> of -if `touse'-.
>
> There may be a quicker way to convert an -if `touse'- into an -in- range.
> This is just the first way that occurred to me.
>
> I would still be hesitant to use -in- range instead of -if `touse'-
> because I would need to be certain that every command I used in my
> ado-file did not change the sort order.
>
> Here's a demonstration of a by-able program that re-sorts the data
> and yet still produces the expected results because it is coded using
> -if `touse'-:
>
>         . program tryit, byable(recall)
>           1. di "hi"
>           2. syntax
>           3. marksample touse
>           4. list rep78 if `touse'
>           5. sort mpg
>           6. end
>
>         . sysuse auto, clear
>         (1978 Automobile Data)
>
>         . sort rep78
>
>         . by rep78: tryit
>
>         --------------------------------------
>         -> rep78 = 1
>         hi
>
>              +-------+
>              | rep78 |
>              |-------|
>           1. |     1 |
>           2. |     1 |
>              +-------+
>
>         --------------------------------------
>         -> rep78 = 2
>         hi
>
>              +-------+
>              | rep78 |
>              |-------|
>           3. |     2 |
>          14. |     2 |
>          15. |     2 |
>          22. |     2 |
>          24. |     2 |
>              |-------|
>          45. |     2 |
>          52. |     2 |
>          53. |     2 |
>              +-------+
>
>         <remaining output omitted>
>
>         . _
>
> When -tryit- was called the first time to process rep78==1, the data
> were in order, and we see that, as expected, the observations for
> which rep78 is 1 are at the top of the dataset, namely in observations
> 1 and 2. Now look at the -tryit- code. -tryit-, just before exiting,
> re-sorts the data!
>
> So, the second time -tryit- is called, when -tryit- is called to
> process the rep78 = 2 data, the observations will not be in order.
> And we can see that in the listing. The listing was produced by
> coding -list rep78 if `touse'- and, just as one would hope, all the
> observations for which `touse' contains 1 are rep78==2 observations.
> This time, however, the data are no longer in order. The observations
> for which `touse' is 1 are observations 3, 14, 15, 22, 24, 45, 52, and
> 53. It didn't matter, however, because we coded -if `touse'-.
>
> -by- plus -tryit- still produced correct results.
>
> Our thinking when we coded -by- and made the recommendation of using
> -if `touse'- was that sometimes it is better to produce correct
> results than to produce incorrect results more quickly.
>
> -- Bill
> wgould@stata.com
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
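
The -in- range idea quoted above can be sketched as a small by-able program. This is only an illustration under stated assumptions, not code from the thread. The program name -tryrange- is hypothetical. The sketch generates _n only for marked observations, because the quoted expression `touse'*_n evaluates to 0 for every excluded observation, so r(min) would come out as 0 whenever anything is excluded. It also assumes nothing about sort order: it uses the -in- range only when the marked observations turn out to be contiguous, and otherwise falls back to -if `touse'-.

```stata
* Hypothetical sketch: convert -if `touse'- into an -in- range when that is safe.
program define tryrange, byable(recall)
    syntax
    marksample touse

    * Record the observation number of each marked observation;
    * unmarked observations get missing and are ignored by -summarize-.
    tempvar x
    quietly generate long `x' = _n if `touse'
    summarize `x', meanonly
    local first = r(min)
    local last  = r(max)

    * r(N) counts the marked observations. They form one contiguous block
    * exactly when that count equals the length of the range.
    if r(N) == `last' - `first' + 1 {
        list rep78 in `first'/`last'
    }
    else {
        * Data were re-sorted somewhere: fall back to the safe form.
        list rep78 if `touse'
    }
end

* Example use, mirroring the demonstration in the quoted message:
sysuse auto, clear
sort rep78
by rep78: tryrange
```

The contiguity check costs one extra pass over the data, so it is not free. It simply limits the -in- shortcut to the case where it gives the same answer as -if `touse'-.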

**References**: **Re: st: xt: unit-specific trends** *From:* "William Gould, StataCorp LP" <wgould@stata.com>
