The Stata listserver



From   Kit Baum <baum@bc.edu>
To   statalist@hsphsun2.harvard.edu
Subject   st: Re: in not if for large panels
Date   Thu, 17 Jul 2003 09:57:31 -0400

On Thursday, Jul 17, 2003, at 02:33 US/Eastern, Michael wrote:

> Instead, I've written .ado code that uses -in- instead of -if- to identify
> groups and find that it usually cuts down the calculation time by 85%-95% in
> most fairly large datasets. The code is hardwired to use -regress- and
> saves a pre-determined list of statistics into the current dataset (repeated
> across each panel). Because of these limitations, I haven't posted it to
> SSC. I've been planning to change it to be a virtually identical stand-in
> for statsby (perhaps called statsbyin or statsbyfast) but haven't gotten
> around to it.

I have found the same thing when assisting a colleague who was working with a very large panel dataset. The difference in speed between -in- and -if- is tremendous, and logically so: -if- must evaluate its condition for every observation in the dataset, including those belonging to units you have already processed, whereas -in- goes directly to the specified range of observations. Looping over units with -if- therefore does work proportional to the size of the whole dataset on every pass.

What we worked out for him (in the context of an unbalanced panel) was a counter that tracked the first and last observation of each unit, with the data sorted by unit so that each unit occupies a contiguous block of observations; the -in- clause then just looped over that counter. If you have a balanced panel, it is even easier: you create a couple of counters (the first and last observation of the current unit) and add T, the number of periods, to each of them every time around the loop, which can be done with a single -forvalues- statement. Writing your own loop, and taking care of the minor housekeeping needed to stash the estimation results in a convenient place, will save you a lot of time overall.
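A minimal sketch of both loops described above, assuming a modern Stata (it uses runiform() and rnormal(), which postdate this thread) and simulated data. The variable names (id, t, x, y, obsno, lastobs, b_x, se_x) and the panel sizes are hypothetical; this is not Michael's .ado code or the code written for Kit's colleague. The first half handles an unbalanced panel by jumping from each unit's first observation to the next; the second half handles a balanced panel with a single -forvalues- stepping by T.

```stata
* Sketch only: hypothetical variable names and simulated data
clear
set seed 12345

* --- Unbalanced panel: 50 units with 5 to 14 periods each ---
set obs 50
generate long id = _n
generate int T = 5 + floor(10*runiform())
expand T
bysort id: generate int t = _n
generate double x = rnormal()
generate double y = 1 + 2*x + rnormal()

* -in- ranges identify a unit only if its rows are contiguous
sort id t
generate long obsno = _n
by id: generate long lastobs = obsno[_N]   // last observation of this unit

* Holding variables: results are repeated across each unit's rows
generate double b_x = .
generate double se_x = .

* Counter i points at the first observation of the current unit
local i = 1
while `i' <= _N {
    local j = lastobs[`i']
    quietly regress y x in `i'/`j'
    quietly replace b_x  = _b[x]  in `i'/`j'
    quietly replace se_x = _se[x] in `i'/`j'
    local i = `j' + 1                       // jump to the next unit
}

* --- Balanced panel: every unit has T rows, so ranges are fixed steps ---
clear
local N = 40                                // hypothetical number of units
local T = 20                                // hypothetical periods per unit
local NT = `N' * `T'
set obs `NT'
generate long id = ceil(_n / `T')
bysort id: generate int t = _n
generate double x = rnormal()
generate double y = 1 + 2*x + rnormal()
sort id t
generate double b_x = .

* first = 1, 1+T, 1+2T, ...; last is always first + T - 1
forvalues first = 1(`T')`NT' {
    local last = `first' + `T' - 1
    quietly regress y x in `first'/`last'
    quietly replace b_x = _b[x] in `first'/`last'
}
```

To use a different estimator, replace -regress- with the command you need and copy whichever _b[], _se[], or e() results you want into the holding variables. The only requirement the -in- approach adds is that the data stay sorted by unit while the loop runs.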

Kit
