st: Speed with large panel datasets

From	Gordon Hughes <[email protected]>
To	[email protected]
Subject	st: Speed with large panel datasets
Date	Mon, 21 Mar 2011 10:39:07 +0000

This is partly a comment and partly a query. I have a rather large dataset (> 500,000 observations) which consists of an unbalanced panel of up to 1600 observations for about 450 panel units. I am carrying out panel-specific ARIMA analyses for some or all panel units in order to estimate the distribution of a set of coefficients across panels.

It turns out that the time required to execute the task varies by at least an order of magnitude depending upon how I set up the analysis. By far the slowest method is to embed the -arima- command in a loop of the following kind:


forval i=1/`npanel' {
    arima depvar indvar1 indvar2 ...  if np==`i' & <some other condition>
}

The execution is faster if I discard all panels that do not satisfy <some other condition> before initiating the loop. However, the best method is (a) discard all data which does not satisfy <some other condition>, (b) reshape the dataset into wide format so that the dependent and independent variables are stored as depvar_`np' indvar1_`np' indvar2_`np' etc, and (c) execute the loop:


forval i=1/`npanel' {
    arima depvar_`i' indvar1_`i' indvar2_`i' ...
}
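
Steps (a) to (c) can be sketched end to end on toy data. This is only an illustration under stated assumptions: np is the panel identifier from the loop above, while the time variable t, the indicator insample (standing in for <some other condition>), the simulated data and the ar(1) specification are all hypothetical and not taken from the actual analysis.

```stata
* Sketch only. np is the panel identifier used in the post; t, insample,
* the toy data and the ar(1) specification are assumptions for illustration.
clear
set seed 12345
set obs 3                          // three toy panel units
generate np = _n
expand 60                          // 60 time periods per unit
bysort np: generate t = _n
generate indvar1 = rnormal()
generate indvar2 = rnormal()
generate depvar  = 0.5*indvar1 - 0.3*indvar2 + rnormal()
generate byte insample = t <= 50   // stands in for <some other condition>

* (a) discard data that fail the condition
keep if insample
drop insample

* (b) reshape to wide: one depvar_#, indvar1_#, indvar2_# set per panel
rename depvar  depvar_
rename indvar1 indvar1_
rename indvar2 indvar2_
reshape wide depvar_ indvar1_ indvar2_, i(t) j(np)
tsset t                            // common time index for all panels

* (c) loop over panels, storing the two slope coefficients per panel
matrix coefs = J(3, 2, .)
forvalues i = 1/3 {
    capture confirm variable depvar_`i'
    if _rc continue                // skip panels dropped in step (a)
    quietly arima depvar_`i' indvar1_`i' indvar2_`i', ar(1)
    matrix coefs[`i', 1] = _b[depvar_`i':indvar1_`i']
    matrix coefs[`i', 2] = _b[depvar_`i':indvar2_`i']
}
matrix list coefs
```

The -capture confirm variable- check is there because discarding data in step (a) may remove some panels entirely, in which case the corresponding wide variables do not exist.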

Even allowing for the time required to reshape a rather large dataset, this is much faster than any alternative that I have tried. It seems that the overhead of processing missing cases in -arima- is very high. -arima- has a "savespace" option, which is designed to reduce the amount of memory required by the command by constructing and working with a temporary dataset, but the gain in overall execution speed is much less than with the reshape approach.

My query is this: is -arima- unusual in having such a large overhead in processing excluded observations, or is the approach of reshaping a large dataset likely to pay off for other estimators when it is necessary to repeat an estimation procedure for a substantial number of panel units?
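
One way to check this for a given estimator is to time both layouts with Stata's -timer- commands. The sketch below does so for -arima- on toy data; the time variable t, the simulated data and the ar(1) specification are assumptions for illustration, and a dataset this small should not be expected to show the size of difference described above. Another estimator could be substituted for -arima- in both loops.

```stata
* Sketch only. np is from the post; t, the toy data and ar(1) are assumptions.
clear
set seed 2011
set obs 5                          // five toy panel units
generate np = _n
expand 80                          // 80 time periods per unit
bysort np: generate t = _n
generate indvar1 = rnormal()
generate depvar  = 0.5*indvar1 + rnormal()
tsset np t

timer clear

* Timer 1: long format, one -if- restricted call per panel
timer on 1
forvalues i = 1/5 {
    quietly arima depvar indvar1 if np == `i', ar(1)
}
timer off 1

* Timer 2: wide format, with the cost of the reshape included
timer on 2
tsset, clear
rename depvar  depvar_
rename indvar1 indvar1_
reshape wide depvar_ indvar1_, i(t) j(np)
tsset t
forvalues i = 1/5 {
    quietly arima depvar_`i' indvar1_`i', ar(1)
}
timer off 2

timer list                         // elapsed seconds for timers 1 and 2
```

Including the reshape inside timer 2 keeps the comparison fair, since the reshape is part of the cost of the wide approach.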


Gordon Hughes
[email protected]
