**[TS] arima** -- ARIMA, ARMAX, and other dynamic regression models

__Syntax__

Basic syntax for a regression model with ARMA disturbances

**arima** *depvar* [*indepvars*]**,** **ar(***numlist***)** **ma(***numlist***)**

Basic syntax for an ARIMA(*p*,*d*,*q*) model

**arima** *depvar***,** **arima(***#p***,***#d***,***#q***)**

Basic syntax for a multiplicative seasonal ARIMA(*p*,*d*,*q*)*(*P*,*D*,*Q*)s model

**arima** *depvar***,** **arima(***#p***,***#d***,***#q***)** **sarima(***#P***,***#D***,***#Q***,***#s***)**

Full syntax

**arima** *depvar* [*indepvars*] [*if*] [*in*] [*weight*] [**,** *options*]

*options* Description
-------------------------------------------------------------------------
Model
__nocons__**tant** suppress constant term
**arima(***#p***,***#d***,***#q***)** specify ARIMA(*p,d,q*) model for dependent variable
**ar(***numlist***)** autoregressive terms of the structural model disturbance
**ma(***numlist***)** moving-average terms of the structural model disturbance
__c__**onstraints(***constraints***)** apply specified linear constraints
__col__**linear** keep collinear variables

Model 2
**sarima(***#P***,***#D***,***#Q***,***#s***)** specify period-*#s* multiplicative seasonal ARIMA term
**mar(***numlist***,** *#s***)** multiplicative seasonal autoregressive terms; may be repeated
**mma(***numlist***,** *#s***)** multiplicative seasonal moving-average terms; may be repeated

Model 3
__cond__**ition** use conditional MLE instead of full MLE
__save__**space** conserve memory during estimation
__di__**ffuse** use diffuse prior for starting Kalman filter recursions
**p0(***#*|*matname***)** use alternate prior for starting Kalman recursions; seldom used
**state0(***#*|*matname***)** use alternate state vector for starting Kalman filter recursions

SE/Robust
**vce(***vcetype***)** *vcetype* may be **opg**, __r__**obust**, or **oim**

Reporting
__l__**evel(***#***)** set confidence level; default is **level(95)**
__det__**ail** report list of gaps in time series
__nocnsr__**eport** do not display constraints
*display_options* control columns and column formats, row spacing, and line width

Maximization
*maximize_options* control the maximization process; seldom used

__coefl__**egend** display legend instead of statistics
-------------------------------------------------------------------------
You must **tsset** your data before using **arima**; see **[TS] tsset**.
*depvar* and *indepvars* may contain time-series operators; see tsvarlist.
**by**, **fp**, **rolling**, **statsby**, and **xi** are allowed; see prefix.
**iweight**s are allowed; see weights.
**coeflegend** does not appear in the dialog box.
See **[TS] arima postestimation** for features available after estimation.

__Menu__

**Statistics > Time series > ARIMA and ARMAX models**

__Description__

**arima** fits univariate models for a time series, where the disturbances
are allowed to follow a linear autoregressive moving-average (ARMA)
specification. When independent variables are included in the
specification, such models are often called ARMAX models; and when
independent variables are not specified, they reduce to Box-Jenkins
autoregressive integrated moving-average (ARIMA) models in the dependent
variable.
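For reference, the regression model with ARMA disturbances can be sketched as follows. Here mu_t is the structural disturbance and epsilon_t is the white-noise innovation with variance sigma^2 (the epsilon_t and mu_t referred to under **condition**), and the sums run over the lags listed in **ar()** and **ma()**. This is a simplified statement of the model; the full state-space form is given in **[TS] arima**.

```latex
\begin{aligned}
y_t   &= \mathbf{x}_t \boldsymbol{\beta} + \mu_t \\
\mu_t &= \sum_{i \in \texttt{ar()}} \rho_i \, \mu_{t-i}
       + \sum_{j \in \texttt{ma()}} \theta_j \, \epsilon_{t-j}
       + \epsilon_t,
\qquad \epsilon_t \sim N(0, \sigma^2)
\end{aligned}
```

Without *indepvars*, x_t beta reduces to the constant (or drops out under **noconstant**).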

__Options__

+-------+
----+ Model +------------------------------------------------------------

**noconstant**; see **[R] estimation options**.

**arima(***#p***,***#d***,***#q***)** is an alternative, shorthand notation for specifying
models with ARMA disturbances. The dependent variable and any
independent variables are differenced *#d* times, and 1 through *#p*
lags of autoregressive terms and 1 through *#q* lags of moving-average
terms are included in the model. For example, the specification

**. arima D.y, ar(1/2) ma(1/3)**

is equivalent to

**. arima y, arima(2,1,3)**

The latter is easier to write for simple ARMAX and ARIMA models, but
if gaps in the AR or MA lags are to be modeled, or if different
operators are to be applied to independent variables, the first
syntax is required.
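To make the equivalence concrete, the hypothetical Python helpers below (illustrations only, not part of Stata) spell out the long form that **arima(***#p***,***#d***,***#q***)** abbreviates and apply the first-difference operator to a plain list. They assume no gaps in the lag lists, which is exactly the case the shorthand covers.

```python
def lag_spec(n):
    """Stata-style numlist for lags 1..n (hypothetical helper)."""
    return "1" if n == 1 else f"1/{n}"


def long_form(depvar, p, d, q):
    """Rewrite 'arima depvar, arima(p,d,q)' as the equivalent ar()/ma() call."""
    # D. differences once; Dk. differences k times (k >= 2).
    op = "" if d == 0 else ("D." if d == 1 else f"D{d}.")
    opts = []
    if p:
        opts.append(f"ar({lag_spec(p)})")
    if q:
        opts.append(f"ma({lag_spec(q)})")
    cmd = f"arima {op}{depvar}"
    return cmd + (", " + " ".join(opts) if opts else "")


def difference(x, d=1):
    """Apply the first-difference operator d times to a list of numbers."""
    for _ in range(d):
        x = [b - a for a, b in zip(x, x[1:])]
    return x


print(long_form("y", 2, 1, 3))            # arima D.y, ar(1/2) ma(1/3)
print(difference([1, 4, 9, 16, 25], 2))   # [2, 2, 2]
```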

**ar(***numlist***)** specifies the autoregressive terms of the structural model
disturbance to be included in the model. For example, **ar(1/3)**
specifies that lags 1, 2, and 3 of the structural disturbance be
included in the model; **ar(1 4)** specifies that lags 1 and 4 be
included, perhaps to account for additive quarterly effects.

If the model does not contain regressors, these terms can also be
considered autoregressive terms for the dependent variable.

**ma(***numlist***)** specifies the moving-average terms to be included in the
model. These are the terms for the lagged innovations (white-noise
disturbances).
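How **ar()** and **ma()** lag lists enter the structural disturbance can be sketched as a direct recursion. In this illustrative Python version, lag lists with gaps such as **ar(1 4)** are dictionaries keyed by lag, the coefficient values are made up, and presample values are set to zero; **arima** itself evaluates the likelihood through the Kalman filter rather than with a loop like this.

```python
def arma_disturbance(eps, ar, ma):
    """Build mu_t = sum_i rho_i*mu_{t-i} + sum_j theta_j*eps_{t-j} + eps_t.

    ar, ma: dicts mapping lag -> coefficient, so gapped specifications such
    as ar(1 4) simply omit lags 2 and 3. Presample mu and eps are zero.
    """
    mu = []
    for t, e in enumerate(eps):
        value = e
        for lag, rho in ar.items():          # autoregressive terms
            if t - lag >= 0:
                value += rho * mu[t - lag]
        for lag, theta in ma.items():        # lagged innovations
            if t - lag >= 0:
                value += theta * eps[t - lag]
        mu.append(value)
    return mu


# A single unit shock with ar(1 4): the lag-4 term feeds back at t = 4.
print(arma_disturbance([1.0, 0, 0, 0, 0, 0], ar={1: 0.5, 4: 0.2}, ma={}))
```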

**constraints(***constraints***)**, **collinear**; see **[R] estimation options**.

If constraints are placed between structural model parameters and
ARMA terms, the first few iterations may attempt steps into
nonstationary areas. This process can be ignored if the final
solution is well within the bounds of stationary solutions.
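One way to check whether a solution lies within the stationary region is to test the AR coefficients directly. The Python sketch below is an illustration, not what **arima** does internally: it applies the step-down (reverse Durbin-Levinson) recursion, under which the AR polynomial 1 - rho_1 L - ... - rho_p L^p has all its roots outside the unit circle exactly when every implied partial autocorrelation is less than 1 in absolute value. Applying the same test to the negated MA coefficients checks invertibility.

```python
def is_stationary(phi):
    """Step-down test for the AR polynomial 1 - phi_1 L - ... - phi_p L^p.

    Peels off the last coefficient k (a partial autocorrelation) at each
    step; the process is stationary iff every such |k| < 1.
    """
    a = list(phi)
    while a:
        k = a[-1]
        if abs(k) >= 1:
            return False
        m = len(a) - 1
        # Recover the order-(m) coefficients from the order-(m+1) ones.
        a = [(a[i] + k * a[m - 1 - i]) / (1 - k * k) for i in range(m)]
    return True


print(is_stationary([1.2, -0.5]))   # True
print(is_stationary([0.5, 0.6]))    # False: rho_1 + rho_2 > 1
```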

+---------+
----+ Model 2 +----------------------------------------------------------

**sarima(***#P***,***#D***,***#Q***,***#s***)** is an alternative, shorthand notation for specifying
the multiplicative seasonal components of models with ARMA
disturbances. The dependent variable and any independent variables
are lag-*#s* seasonally differenced *#D* times, and 1 through *#P* seasonal
lags of autoregressive terms and 1 through *#Q* seasonal lags of
moving-average terms are included in the model. For example, the
specification

**. arima DS12.y, ar(1/2) ma(1/3) mar(1/2,12) mma(1/2,12)**

is equivalent to

**. arima y, arima(2,1,3) sarima(2,1,2,12)**
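As with **arima()**, the **sarima()** shorthand can be unpacked mechanically. The hypothetical Python helpers below list the lag terms that **arima(***#p***,***#d***,***#q***)** **sarima(***#P***,***#D***,***#Q***,***#s***)** requests and apply a lag-*#s* seasonal difference *#D* times to a plain list; they only illustrate the bookkeeping described above.

```python
def expand_sarima(p, d, q, P, D, Q, s):
    """List the terms requested by arima(p,d,q) sarima(P,D,Q,s)."""
    return {
        "differences": d,                   # first difference applied d times
        "seasonal_differences": (D, s),     # lag-s difference applied D times
        "ar": list(range(1, p + 1)),        # ar(1/p)
        "ma": list(range(1, q + 1)),        # ma(1/q)
        "mar": (list(range(1, P + 1)), s),  # mar(1/P, s)
        "mma": (list(range(1, Q + 1)), s),  # mma(1/Q, s)
    }


def seasonal_difference(x, s, D=1):
    """Replace x_t by x_t - x_{t-s}, D times; the first s values drop out each time."""
    for _ in range(D):
        x = [x[t] - x[t - s] for t in range(s, len(x))]
    return x


print(expand_sarima(2, 1, 3, 2, 1, 2, 12))
```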

**mar(***numlist***,** *#s***)** specifies the lag-*#s* multiplicative seasonal
autoregressive terms. For example, **mar(1/2,12)** requests that the
first two lag-12 multiplicative seasonal autoregressive terms be
included in the model.

**mma(***numlist***,** *#s***)** specifies the lag-*#s* multiplicative seasonal
moving-average terms. For example, **mma(1 3,12)** requests that the
first and third (but not the second) lag-12 multiplicative seasonal
moving-average terms be included in the model.
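"Multiplicative" means the seasonal and nonseasonal lag polynomials are multiplied together, so interaction lags enter the model even though they are never listed. The Python sketch below stores lag polynomials as hypothetical {lag: coefficient} dictionaries with made-up coefficients. It shows that **ar(1/2)** combined with **mar(1/2,12)** implies nonzero terms at lags 1, 2, 12, 13, 14, 24, 25, and 26, and that **mma(1 3,12)** involves seasonal lags 12 and 36 only.

```python
def lag_poly(coefs, s=1, sign=-1.0):
    """Build 1 + sign * sum_k c_k L^(k*s) as a {lag: coefficient} dict.

    Use sign=-1.0 for AR polynomials (1 - rho_1 L - ...) and
    sign=+1.0 for MA polynomials (1 + theta_1 L + ...).
    """
    poly = {0: 1.0}
    for k, c in coefs.items():
        poly[k * s] = poly.get(k * s, 0.0) + sign * c
    return poly


def multiply(a, b):
    """Product of two lag polynomials; exact-zero terms are dropped."""
    out = {}
    for i, x in a.items():
        for j, y in b.items():
            out[i + j] = out.get(i + j, 0.0) + x * y
    return {lag: c for lag, c in out.items() if c != 0.0}


ar = lag_poly({1: 0.5, 2: 0.2})            # ar(1/2)
mar = lag_poly({1: 0.3, 2: 0.1}, s=12)     # mar(1/2,12)
print(sorted(multiply(ar, mar)))           # [0, 1, 2, 12, 13, 14, 24, 25, 26]
print(sorted(lag_poly({1: 0.4, 3: 0.2}, s=12, sign=1.0)))   # [0, 12, 36]
```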

+---------+
----+ Model 3 +----------------------------------------------------------

**condition** specifies that conditional, rather than full, maximum
likelihood estimates be produced. The presample values for epsilon_t
and mu_t are taken to be their expected value of zero, and the
estimate of the variance of epsilon_t is taken to be constant over
the entire sample; see Hamilton (1994, 132). This estimation method
is not appropriate for nonstationary series but may be preferable for
long series or for models that have one or more long AR or MA lags.
**diffuse**, **p0()**, and **state0()** have no meaning for models fit from the
conditional likelihood and may not be specified with **condition**.
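A minimal sketch of what conditioning on zero presample values means, for an ARMA(1,1) disturbance with no regressors: the innovations epsilon_t are recovered recursively starting from mu_0 = epsilon_0 = 0, and each contributes a Gaussian log-density term with the same variance sigma^2. The function name and arguments are hypothetical; this illustrates the idea and is not the code that **arima** (or **arch**) runs.

```python
import math


def conditional_loglik(mu, rho, theta, sigma):
    """Conditional Gaussian log likelihood of an ARMA(1,1) disturbance.

    Presample mu_0 and eps_0 are set to their expected value of zero, and
    the innovation variance sigma**2 is constant over the whole sample.
    """
    mu_prev, eps_prev = 0.0, 0.0
    ll = 0.0
    for m in mu:
        eps = m - rho * mu_prev - theta * eps_prev   # recovered innovation
        ll += -0.5 * (math.log(2 * math.pi * sigma ** 2) + eps ** 2 / sigma ** 2)
        mu_prev, eps_prev = m, eps
    return ll


print(conditional_loglik([1.0, 0.5, -0.3], rho=0.5, theta=0.2, sigma=1.0))
```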

If the series is long and stationary and the underlying
data-generating process does not have a long memory, estimates will
be similar, whether estimated by unconditional maximum likelihood
(the default), conditional maximum likelihood (**condition**), or maximum
likelihood from a diffuse prior (**diffuse**).

In small samples, however, results of conditional and unconditional
maximum likelihood may differ substantially; see Ansley and Newbold
(1980). Whereas the default unconditional maximum likelihood
estimates make the most use of sample information when all the
assumptions of the model are met, Harvey (1989) and Ansley and Kohn
(1985) argue for diffuse priors in many cases, particularly in ARIMA
models corresponding to an underlying structural model.

The **condition** or **diffuse** options may also be preferred when the model
contains one or more long AR or MA lags; this avoids inverting
potentially large matrices (see **diffuse** below).

When **condition** is specified, estimation is performed by the **arch**
command (see **[TS] arch**), and more control of the estimation process
can be obtained using **arch** directly.

**condition** cannot be specified if the model contains any
multiplicative seasonal terms.

**savespace** specifies that memory use be conserved by retaining only those
variables required for estimation. The original dataset is restored
after estimation. This option is rarely used and should be used only
if there is not enough space to fit a model without the option. Note,
however, that **arima** requires considerably more temporary storage
during estimation than most estimation commands in Stata.

**diffuse** specifies that a diffuse prior (see Harvey 1989 or 1993) be used
as a starting point for the Kalman filter recursions. Using **diffuse**,
nonstationary models may be fit with **arima** (see the **p0()** option
below; **diffuse** is equivalent to specifying **p0(1e9)**). See **[TS] arima**
for details.

**p0(***#*|*matname***)** is a rarely specified option that can be used for
nonstationary series or when an alternate prior for starting the
Kalman recursions is desired; see **[TS] arima** for details.

**state0(***#*|*matname***)** is a rarely used option that specifies an alternate
initial state vector for starting the Kalman filter recursions. If *#*
is specified, all elements of the vector are taken to be *#*. The
default initial state vector is **state0(0)**.
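To see what **p0()** and **state0()** control, consider the simplest case: an AR(1) series observed without measurement error, written as a one-element state-space model. The Python sketch below is an illustration under those assumptions; it is not **arima**'s implementation, and its names are hypothetical. With the stationary prior variance sigma^2/(1 - rho^2) it returns the exact Gaussian log likelihood. Passing p0=1e9 mimics **diffuse**: the first observation then contributes essentially only a constant, and the filter still works when rho = 1, where no stationary variance exists.

```python
import math


def ar1_kalman_loglik(y, rho, sigma2, p0=None, state0=0.0):
    """Kalman-filter log likelihood for y_t = x_t, x_t = rho*x_{t-1} + e_t.

    state0 and p0 are the prior mean and variance of the first state.
    If p0 is None, the stationary variance sigma2 / (1 - rho**2) is used,
    which requires |rho| < 1.
    """
    a = state0                                           # predicted state mean
    p = sigma2 / (1.0 - rho ** 2) if p0 is None else p0  # predicted state variance
    ll = 0.0
    for obs in y:
        v = obs - a          # one-step-ahead prediction error
        f = p                # its variance (no measurement noise)
        ll -= 0.5 * (math.log(2.0 * math.pi * f) + v * v / f)
        # The state is observed exactly, so the filtered state equals obs
        # with zero variance; predicting one step ahead gives:
        a = rho * obs
        p = sigma2
    return ll


y = [1.0, 0.5, -0.2]
exact = ar1_kalman_loglik(y, rho=0.5, sigma2=1.0)             # stationary prior
diffuse = ar1_kalman_loglik(y, rho=0.5, sigma2=1.0, p0=1e9)   # like p0(1e9)
walk = ar1_kalman_loglik(y, rho=1.0, sigma2=1.0, p0=1e9)      # rho = 1 still defined
```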

+-----------+
----+ SE/Robust +--------------------------------------------------------

**vce(***vcetype***)** specifies the type of standard error reported, which
includes types that are robust to some kinds of misspecification
(**robust**) and that are derived from asymptotic theory (**oim**, **opg**); see
**[R] ***vce_option*.

For state-space models in general and ARMAX and ARIMA models in
particular, the robust or quasi-maximum likelihood estimates (QMLEs)
of variance are robust to symmetric nonnormality in the disturbances,
including, as a special case, heteroskedasticity. The robust
variance estimates are not generally robust to functional
misspecification of the structural or ARMA components of the model;
see Hamilton (1994, 389) for a brief discussion.
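For orientation, the three *vcetype*s correspond to the standard textbook forms below, written for a log likelihood with Hessian H and per-observation score vectors g_t, all evaluated at the estimates. This is a generic sketch; see **[R] ***vce_option* for Stata's exact definitions.

```latex
\begin{aligned}
\widehat{V}_{\text{oim}}    &= \bigl(-\widehat{H}\bigr)^{-1} \\
\widehat{V}_{\text{opg}}    &= \Bigl(\textstyle\sum_t \widehat{g}_t \widehat{g}_t'\Bigr)^{-1} \\
\widehat{V}_{\text{robust}} &= \bigl(-\widehat{H}\bigr)^{-1}
                               \Bigl(\textstyle\sum_t \widehat{g}_t \widehat{g}_t'\Bigr)
                               \bigl(-\widehat{H}\bigr)^{-1}
\end{aligned}
```

Under correct specification the outer-product term and -H estimate the same information matrix, so the robust form collapses to the **oim** form; the sandwich matters when they differ.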

+-----------+
----+ Reporting +--------------------------------------------------------

**level(***#***)**; see **[R] estimation options**.

**detail** specifies that a detailed list of any gaps in the series be
reported, including gaps due to missing observations or missing data
for the dependent variable or independent variables.

**nocnsreport**; see **[R] estimation options**.

*display_options*: **noci**, __nopv__**alues**, **vsquish**, **cformat(***%fmt***)**, **pformat(***%fmt***)**,
**sformat(***%fmt***)**, and **nolstretch**; see **[R] estimation options**.

+--------------+
----+ Maximization +-----------------------------------------------------

*maximize_options*: __dif__**ficult**, __tech__**nique(***algorithm_spec***)**, __iter__**ate(***#***)**,
[__no__]__lo__**g**, __tr__**ace**, __grad__**ient**, **showstep**, __hess__**ian**, __showtol__**erance**,
__tol__**erance(***#***)**, __ltol__**erance(***#***)**, __nrtol__**erance(***#***)**, __gtol__**erance(***#***)**,
__nonrtol__**erance(***#***)**, and **from(***init_specs***)**; see **[R] maximize** for all
options except **gtolerance()**, and see below for information on
**gtolerance()**.

These options are sometimes more important for ARIMA models than for most other
maximum likelihood models because of potential convergence problems
with ARIMA models, particularly if the specified model and the sample
data imply a nonstationary model.

Several alternate optimization methods, such as
Berndt-Hall-Hall-Hausman (BHHH) and Broyden-Fletcher-Goldfarb-Shanno
(BFGS), are provided for ARIMA models. Although ARIMA models are not
as difficult to optimize as ARCH models, their likelihoods are
nevertheless generally not quadratic and often pose optimization
difficulties; this is particularly true if a model is nonstationary
or nearly nonstationary. Because each method approaches optimization
differently, some problems can be successfully optimized by an
alternate method when one method fails.

Setting **technique()** to something other than the default or BHHH
changes the *vcetype* to **vce(oim)**.

The following options are all related to maximization and are either
particularly important in fitting ARIMA models or not available for
most other estimators.

**technique(***algorithm_spec***)** specifies the optimization technique to use
to maximize the likelihood function.

**technique(bhhh)** specifies the Berndt-Hall-Hall-Hausman (BHHH)
algorithm.

**technique(dfp)** specifies the Davidon-Fletcher-Powell (DFP)
algorithm.

**technique(bfgs)** specifies the Broyden-Fletcher-Goldfarb-Shanno
(BFGS) algorithm.

**technique(nr)** specifies Stata's modified Newton-Raphson (NR)
algorithm.

You can specify multiple optimization methods. For example,

**technique(bhhh 10 nr 20)**

requests that the optimizer perform 10 BHHH iterations, switch to
Newton-Raphson for 20 iterations, switch back to BHHH for 10 more
iterations, and so on.

The default for **arima** is **technique(bhhh 5 bfgs 10)**.
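The switching rule can be made concrete with a small Python sketch. The parser and its name are hypothetical, and it assumes that a technique listed without a count runs for 5 iterations (check **[R] maximize** for the actual default); it reproduces only the switching order, not the optimizers themselves.

```python
from itertools import cycle, islice


def technique_schedule(spec, default_steps=5):
    """Yield, iteration by iteration, the algorithm named in a technique() spec.

    'bhhh 10 nr 20' -> 10 x bhhh, 20 x nr, 10 x bhhh, ... (cycles forever).
    """
    tokens = spec.split()
    plan = []
    i = 0
    while i < len(tokens):
        name = tokens[i]
        steps = default_steps
        if i + 1 < len(tokens) and tokens[i + 1].isdigit():
            steps = int(tokens[i + 1])
            i += 1
        plan.extend([name] * steps)
        i += 1
    return cycle(plan)


first35 = list(islice(technique_schedule("bhhh 10 nr 20"), 35))
arima_default = list(islice(technique_schedule("bhhh 5 bfgs 10"), 20))
```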

**gtolerance(***#***)** specifies the tolerance for the gradient relative to
the coefficients. When |g_i*b_i| <= **gtolerance()** for all
parameters b_i and the corresponding elements of the gradient
g_i, the gradient tolerance criterion is met. The default
gradient tolerance for **arima** is **gtolerance(.05)**.
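The gradient criterion is a simple elementwise check. The minimal Python sketch below uses a hypothetical function name and made-up gradient and coefficient values, and it shows why a large value such as 999 effectively switches the criterion off.

```python
def gtolerance_met(gradient, coefs, gtol=0.05):
    """True when |g_i * b_i| <= gtol for every coefficient b_i."""
    if len(gradient) != len(coefs):
        raise ValueError("gradient and coefficient vectors must have equal length")
    return all(abs(g * b) <= gtol for g, b in zip(gradient, coefs))


g = [0.01, 0.8]   # gradient elements
b = [1.5, 0.2]    # coefficients: |g*b| = 0.015 and 0.16
print(gtolerance_met(g, b))             # False: 0.16 > 0.05
print(gtolerance_met(g, b, gtol=999))   # True: criterion effectively disabled
```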

**gtolerance(999)** may be specified to disable the gradient
criterion. If the optimizer becomes stuck with repeated "(backed
up)" messages, the gradient probably still contains substantial
values, but an uphill direction cannot be found for the
likelihood. With this option, results can often be obtained, but it
is unclear whether the global maximum of the likelihood has been found.

When the maximization is not going well, it is also possible to
set the maximum number of iterations (see **[R] maximize**) to the
point where the optimizer appears to be stuck and to inspect the
estimation results at that point.

**from(***init_specs***)** allows you to set the starting values of the model
coefficients; see **[R] maximize** for a general discussion and
syntax options.

The standard syntax for **from()** accepts a matrix, a list of
values, or coefficient name-value pairs; see **[R] maximize**. **arima**
also accepts **from(armab0)**, which sets the starting value for all
ARMA parameters in the model to zero prior to optimization.

ARIMA models may be sensitive to initial conditions and may have
coefficient values that correspond to local maxima. The
default starting values for **arima** are generally good,
particularly in large samples for stationary series.

The following option is available with **arima** but is not shown in the
dialog box:

**coeflegend**; see **[R] estimation options**.

__Examples__

---------------------------------------------------------------------------
Setup
**. webuse wpi1**

Simple ARIMA model with differencing and autoregressive and
moving-average components
**. arima wpi, arima(1,1,1)**

Same as above
**. arima D.wpi, ar(1) ma(1)**

ARIMA model with additive seasonal effects
**. arima D.wpi, ar(1) ma(1 4)**

---------------------------------------------------------------------------
Setup
**. webuse air2**
**. generate lnair = ln(air)**

Multiplicative SARIMA model
**. arima lnair, arima(0,1,1) sarima(0,1,1,12) noconstant**

---------------------------------------------------------------------------
Setup
**. webuse friedman2, clear**

ARMAX model
**. arima consump m2 if tin(, 1981q4), ar(1) ma(1)**

ARMAX model with robust standard errors
**. arima consump m2 if tin(, 1981q4), ar(1) ma(1) vce(robust)**
---------------------------------------------------------------------------

__Video example__

Introduction to ARMA/ARIMA models

__Stored results__

**arima** stores the following in **e()**:

Scalars
**e(N)** number of observations
**e(N_gaps)** number of gaps
**e(k)** number of parameters
**e(k_eq)** number of equations in **e(b)**
**e(k_eq_model)** number of equations in overall model test
**e(k_dv)** number of dependent variables
**e(k1)** number of variables in first equation
**e(df_m)** model degrees of freedom
**e(ll)** log likelihood
**e(sigma)** sigma (standard deviation of the white-noise disturbance)
**e(chi2)** chi-squared
**e(p)** p-value for model test
**e(tmin)** minimum time
**e(tmax)** maximum time
**e(ar_max)** maximum AR lag
**e(ma_max)** maximum MA lag
**e(rank)** rank of **e(V)**
**e(ic)** number of iterations
**e(rc)** return code
**e(converged)** **1** if converged, **0** otherwise

Macros
**e(cmd)** **arima**
**e(cmdline)** command as typed
**e(depvar)** name of dependent variable
**e(covariates)** list of covariates
**e(eqnames)** names of equations
**e(wtype)** weight type
**e(wexp)** weight expression
**e(title)** title in estimation output
**e(tmins)** formatted minimum time
**e(tmaxs)** formatted maximum time
**e(chi2type)** **Wald**; type of model chi-squared test
**e(vce)** *vcetype* specified in **vce()**
**e(vcetype)** title used to label Std. Err.
**e(ma)** lags for moving-average terms
**e(ar)** lags for autoregressive terms
**e(mar***i***)** multiplicative AR terms and lag *i*=1... (*#* seasonal AR terms)
**e(mma***i***)** multiplicative MA terms and lag *i*=1... (*#* seasonal MA terms)
**e(seasons)** seasonal lags in model
**e(opt)** type of optimization
**e(ml_method)** type of ml method
**e(user)** name of likelihood-evaluator program
**e(technique)** maximization technique
**e(tech_steps)** number of iterations performed before switching techniques
**e(properties)** **b V**
**e(estat_cmd)** program used to implement **estat**
**e(predict)** program used to implement **predict**
**e(marginsok)** predictions allowed by **margins**
**e(marginsnotok)** predictions disallowed by **margins**

Matrices
**e(b)** coefficient vector
**e(Cns)** constraints matrix
**e(ilog)** iteration log (up to 20 iterations)
**e(gradient)** gradient vector
**e(V)** variance-covariance matrix of the estimators
**e(V_modelbased)** model-based variance

Functions
**e(sample)** marks estimation sample

__References__

Ansley, C. F., and R. J. Kohn. 1985. Estimation, filtering, and smoothing
in state space models with incompletely specified initial conditions.
*Annals of Statistics* 13: 1286-1316.

Ansley, C. F., and P. Newbold. 1980. Finite sample properties of
estimators for autoregressive moving average models. *Journal of*
*Econometrics* 13: 159-183.

Hamilton, J. D. 1994. *Time Series Analysis*. Princeton: Princeton
University Press.

Harvey, A. C. 1989. *Forecasting, Structural Time Series Models and the*
*Kalman Filter*. Cambridge: Cambridge University Press.

------. 1993. *Time Series Models*. 2nd ed. Cambridge, MA: MIT Press.