
From: vwiggins@stata.com (Vince Wiggins, StataCorp)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Help with ARIMA
Date: Tue, 22 Apr 2003 15:19:14 -0500

Clarence Tam <Clarence.Tam@lshtm.ac.uk> asks whether he needs to have
Stata/SE to estimate an arima model with an MA term at the 52nd lag,

> [...] Model diagnostics suggest that there's a residual seasonal
> correlation (at week 52) both in the ACF and PACF. My next step was
> going to be to include an additional AR or MA term to account for
> this, but I'm not sure how to do it. I've tried:
>
> . arima DS52.lnreps, ar(1) ma(1 52) noconstant
>
> but Stata says that the matsize is too small, even though it's set
> at the maximum of 800 (I'm using Intercooled Stata 8.0).
> Does anyone have any suggestions on how to get round this problem
> (preferably ones that don't involve upgrading to Stata SE...)?


Answer
------

Clarence does not need to upgrade to SE. The message he received after
his -arima- command should have been,

    matsize too small, must be max(AR, MA+1)^2
    use -diffuse- option or type -help matsize-

In this case, with the maximum MA being 52, the message implies that a
matrix size of 53^2=2809 is required, and that would indeed require
Stata/SE. The first suggestion in the message, however, will let him
use Intercooled Stata to estimate the model. If Clarence types,

    . arima DS52.lnreps, ar(1) ma(1 52) noconstant diffuse
                                                   ^^^^^^^

he should be able to estimate the model.


Explanation
-----------

By default -arima- uses a Kalman filter to produce unconditional
maximum likelihood estimates of the specified model. To obtain the
unconditional estimates the Kalman filter must be initialized with the
expected value of the initial state vector and the MSE of this vector.
These initial values depend on the current parameter estimates and in
computing the MSE we must invert a square matrix the size of the state
vector -- max(AR, MA+1)^2. Thus, the need for such a large matrix.
These are the most efficient estimates for the model because the
initial state vector and its MSE are forced to conform to the current
parameter estimates.

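The size in the error message can be traced with a short calculation.
This is only a sketch, assuming the standard textbook state-space form
of an ARMA(p, q) model; the symbols r, F, Q, and P are notation
introduced here for illustration and do not appear in the -arima-
output. F is the r x r transition matrix, Q the covariance of the
state disturbance, and P the MSE of the initial state vector.

```latex
\begin{align*}
r &= \max(p,\; q + 1) = \max(1,\; 52 + 1) = 53 \\
\operatorname{vec}(P) &= \bigl(I_{r^2} - F \otimes F\bigr)^{-1} \operatorname{vec}(Q) \\
r^2 &= 53^2 = 2809 > 800
\end{align*}
```

The matrix being inverted is r^2 x r^2, so it is the square of the
state dimension, not the number of parameters, that runs into the
matsize limit of 800.
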
We can, however, obtain slightly less efficient estimates by assuming
that the initial state vector is zero and its variance is unknown and
effectively infinite. This is what the -diffuse- option specifies.
This assumption essentially down-weights the initial observations until
the data itself can be used to develop a state vector and its MSE.
With large datasets, the two estimates tend to be close.


Suggestion
----------

Even though this model has only 4 parameters, including sigma, the
Kalman filter iterations may be somewhat slow because the filter must
maintain a state vector whose length is set by the largest AR or MA
term and will thus be flopping around some pretty large matrices to
compute the likelihood at each observation. For this reason, I would
recommend that Clarence use the -condition- option to estimate the
model,

    . arima DS52.lnreps, ar(1) ma(1 52) noconstant condition
                                                   ^^^^^^^^^

The -condition- option specifies conditional maximum likelihood
estimates, rather than unconditional. These estimates do not require
maintaining a state vector. Specifically, all pre-sample values of the
white noise, e_t, and autocorrelated, u_t, disturbances are taken to be
0 and the MSE of e_t is taken to be constant over the entire sample.
Effectively this means that the initial observations in the sample get
just as much weight as the middle or end observations even though we
know less about them. We know less because the process is
autocorrelated, which implies that knowing the past observations tells
us something about the current observation, and because nothing is
known about the pre-sample observations.

What unconditional maximum likelihood effectively does is use the
current estimates to imply information about the pre-sample while
optimally down-weighting this information so that the initial
observations get a little less weight than the remaining observations.

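One way to see how much the choice matters on a given dataset is to fit
the model under both options and put the estimates side by side. The
following is a minimal sketch, not a recipe for Clarence's data: the
series y is simulated, the coefficients 0.5 and 0.3 and the names y, e,
t, m_diffuse, and m_condition are made up for illustration, the AR term
is left out for brevity, and rnormal() assumes a Stata release newer
than 8.

```stata
* Sketch only: y is a simulated stand-in series, not Clarence's data.
clear
set seed 2003
set obs 520
generate t = _n
tsset t

* White noise, then an MA disturbance with terms at lags 1 and 52.
* y is missing for the first 52 observations because L52.e is missing.
generate e = rnormal()
generate y = e + 0.5*L.e + 0.3*L52.e

* Diffuse initialization of the Kalman filter
arima y, ma(1 52) noconstant diffuse
estimates store m_diffuse

* Conditional maximum likelihood
arima y, ma(1 52) noconstant condition
estimates store m_condition

* Coefficients and standard errors from both fits, side by side
estimates table m_diffuse m_condition, b(%9.4f) se(%9.4f)
```

With a few hundred usable observations the two columns would be
expected to be close, though how close depends on the series.
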
What the -diffuse- option effectively does is to say we know nothing
about the pre-sample and accordingly down-weights the initial
observations in the sample even more.

What conditional maximum likelihood effectively does is assume that
the pre-sample values are their long-run expected value of zero, that
we know this just as well as we know later values, and accordingly
weights the initial observations equally with the remaining
observations.

With large datasets, it generally does not matter which method we use
because the contribution of the initial observations is dominated by
the remaining data. Note, however, that "large" must be used carefully
when the process has large autocorrelation terms.


-- Vince
   vwiggins@stata.com

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
