
From: "Michael Blasnik" <michael.blasnik@verizon.net>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: Re: Using statsby command for large panel data set
Date: Wed, 16 Jul 2003 09:19:07 -0400

I often need to do the same kind of thing on even larger datasets, and I have found that commands like statsby (which use the construct "if bygroup==X") are very slow when there are many panels. The repeated -if- qualifiers require a comparison for every observation in the dataset, once for each panel.

statsby would speed up considerably if you just made a series of smaller datasets and combined the results, but that approach seems kind of silly, and perhaps tedious unless you wrote your own wrapper for it. Instead, I've written .ado code that uses -in- instead of -if- to identify groups, and I find that it usually cuts the calculation time by 85%-95% in fairly large datasets.

The code is hardwired to use -regress- and saves a pre-determined list of statistics into the current dataset (repeated across each panel). Because of these limitations, I haven't posted it to SSC. I've been planning to change it into a virtually identical stand-in for statsby (perhaps called statsbyin or statsbyfast) but haven't gotten around to it. If you want to see it, use it, or adapt it for your needs, I can email it directly to you.

Michael Blasnik
michael.blasnik@verizon.net

----- Original Message -----
From: <darren.pain@ecb.int>
To: <statalist@hsphsun2.harvard.edu>
Sent: Wednesday, July 16, 2003 3:31 AM
Subject: st: Using statsby command for large panel data set

> Hello
>
> I have a large unbalanced panel data set (observations on over 8000 firms
> for up to 135 periods). I want to undertake some simple time-series
> regressions for each firm and access the estimation results. The statsby
> command seems to be the appropriate command. But given the number of
> regressions, it takes a very long time to execute - in fact my impatience
> has got the better of me on every occasion I have tried to run the command
> on the full data set and I have interrupted the procedure. One solution may
> be to be more patient! Another I thought of was to split the data set up
> into smaller units, run the statsby command on each of the smaller data sets
> and then merge the estimation results to give the required outcome. But I
> wondered if there were other ways that people have tried, or whether there
> are ways to speed up the procedures. I want to use the estimated
> coefficients from the regressions to create transformations of the original
> variables, so ideally I would like to work with the full dataset. For info.
> I am using Stata/SE 8.1
>
> Thanks in advance for any advice.
>
> Darren
>

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
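
The -if- versus -in- point in the reply can be illustrated outside Stata. What follows is a minimal Python sketch, not the .ado code mentioned in the reply (which was not posted). It assumes the rows are sorted by panel so that each panel occupies one contiguous block, which is what allows a range of row numbers to stand in for a per-observation test. The function names (slope, slopes_if_style, group_ranges, slopes_in_style) and the toy data are hypothetical and made up for illustration.

```python
def slope(xs, ys):
    """OLS slope of y on x for one panel (needs at least two distinct x values)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def slopes_if_style(rows):
    """-if- style: for each panel, test every row in the dataset."""
    result = {}
    comparisons = 0
    for g in sorted({r[0] for r in rows}):
        xs, ys = [], []
        for gid, x, y in rows:
            comparisons += 1          # one comparison per row, per panel
            if gid == g:
                xs.append(x)
                ys.append(y)
        result[g] = slope(xs, ys)
    return result, comparisons


def group_ranges(sorted_ids):
    """One pass over sorted ids: map each id to its half-open (start, stop) range."""
    ranges = {}
    start = 0
    for i in range(1, len(sorted_ids) + 1):
        if i == len(sorted_ids) or sorted_ids[i] != sorted_ids[start]:
            ranges[sorted_ids[start]] = (start, i)
            start = i
    return ranges


def slopes_in_style(rows):
    """-in- style: sort once, then touch only each panel's own block of rows."""
    sorted_rows = sorted(rows, key=lambda r: r[0])
    ranges = group_ranges([r[0] for r in sorted_rows])
    result = {}
    touched = 0
    for g, (lo, hi) in ranges.items():
        block = sorted_rows[lo:hi]
        touched += len(block)         # rows read for this panel only
        result[g] = slope([r[1] for r in block], [r[2] for r in block])
    return result, touched


# Toy data: (firm, period, y). Illustrative values only.
demo_rows = [(1, t, 2 * t + 1) for t in range(4)] + \
            [(2, t, -t) for t in range(3)] + \
            [(3, t, 0.5 * t) for t in range(3)]

if_result, if_work = slopes_if_style(demo_rows)
in_result, in_work = slopes_in_style(demo_rows)
print("if-style row comparisons:", if_work)
print("in-style rows touched:   ", in_work)
```

Both functions return the same slopes. The difference is the work done per panel: the first grows with the number of rows times the number of panels, while the second reads each row once after a single sort. That is the scaling behind the slowdown described in the reply for datasets with many panels.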

**References**: **st: Using statsby command for large panel data set** *From:* darren.pain@ecb.int

