**[R] anova** -- Analysis of variance and covariance

__Syntax__

__an__**ova** *varname* [*termlist*] [*if*] [*in*] [*weight*] [**,** *options*]

where *termlist* is a factor-variable list with the following additional
features:

o Variables are assumed to be categorical; use the **c.** factor-variable
operator to override this.
o The **|** symbol (indicating nesting) may be used in place of the **#**
symbol (indicating interaction).
o The **/** symbol is allowed after a term and indicates that the
following term is the error term for the preceding terms.
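
The three features above can be sketched with datasets used elsewhere in this entry. This is a minimal illustration, assuming the census2 and machine datasets are reachable through **webuse**; treating age as the continuous covariate is an illustrative choice, not a recommended model.

```stata
* c. marks age as continuous; region is treated as categorical by default
webuse census2, clear
anova drate region c.age

* | nests operator within machine; the first / makes operator|machine the
* error term for machine, and the trailing / leaves operator|machine
* without an F test of its own
webuse machine, clear
anova output machine / operator|machine /, dropemptycells
```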

*options* Description
-------------------------------------------------------------------------
Model
__rep__**eated(***varlist***)** variables in *term*s that are repeated-measures variables
__p__**artial** use partial (or marginal) sums of squares
__se__**quential** use sequential sums of squares
__nocons__**tant** suppress constant term
__dropemp__**tycells** drop empty cells from the design matrix

Adv. model
**bse(***term***)** between-subjects error term in repeated-measures ANOVA
**bseunit(***varname***)** variable representing lowest unit in the between-subjects error term
__group__**ing(***varname***)** grouping variable for computing pooled covariance matrix
-------------------------------------------------------------------------
**bootstrap**, **by**, **fp**, **jackknife**, and **statsby** are allowed; see prefix.
Weights are not allowed with the **bootstrap** prefix.
**aweight**s are not allowed with the **jackknife** prefix.
**aweight**s and **fweight**s are allowed; see weight.
See **[R] anova postestimation** for features available after estimation.

__Menu__

**Statistics > Linear models and related > ANOVA/MANOVA >** **Analysis of**
**variance and covariance**

__Description__

The **anova** command fits analysis-of-variance (ANOVA) and
analysis-of-covariance (ANCOVA) models for balanced and unbalanced
designs, including designs with missing cells; for repeated-measures
ANOVA; and for factorial, nested, or mixed designs.

__Options__

    +-------+
----+ Model +------------------------------------------------------------

**repeated(***varlist***)** indicates the names of the categorical variables in the
*term*s that are to be treated as repeated-measures variables in a
repeated-measures ANOVA or ANCOVA.

**partial** presents the ANOVA table using partial (or marginal) sums of
squares. This setting is the default. Also see the **sequential**
option.

**sequential** presents the ANOVA table using sequential sums of squares.
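
As a hedged sketch of the difference between the two kinds of sums of squares, the commands below fit the same model to the systolic data both ways. With unbalanced data such as these, sequential sums of squares generally depend on the order in which the terms are listed, whereas partial sums of squares do not.

```stata
webuse systolic, clear

* Default: partial (marginal) sums of squares
anova systolic drug disease drug#disease

* Sequential sums of squares: each term is adjusted only for the terms
* listed before it
anova systolic drug disease drug#disease, sequential

* Listing disease first can change the sequential SS for drug and disease
anova systolic disease drug drug#disease, sequential

* The type of sums of squares used is stored in e(sstype)
display "`e(sstype)'"
```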

**noconstant** suppresses the constant term (intercept) from the ANOVA or
regression model.

**dropemptycells** drops empty cells from the design matrix. If
**c(emptycells)** is set to **keep** (see **set emptycells**), this option
temporarily resets it to **drop** before running the ANOVA model. If
**c(emptycells)** is already set to **drop**, this option does nothing.
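
The interaction between **dropemptycells** and **c(emptycells)** can be sketched as follows. The example assumes the session starts with the setting at **keep** and restores that value at the end.

```stata
* Show the current setting
display c(emptycells)

webuse machine, clear

* Empty cells are dropped for this command only
anova output machine / operator|machine /, dropemptycells

* The setting itself is unchanged after the command finishes
display c(emptycells)

* Alternatively, change the setting for the session and omit the option
set emptycells drop
anova output machine / operator|machine /

* Restore the value assumed at the start of this sketch
set emptycells keep
```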

    +------------+
----+ Adv. model +-------------------------------------------------------

**bse(***term***)** indicates the between-subjects error term in a
repeated-measures ANOVA. This option is needed only in the rare case
when the **anova** command cannot automatically determine the
between-subjects error term.

**bseunit(***varname***)** indicates the variable representing the lowest unit in
the between-subjects error term in a repeated-measures ANOVA. This
option is rarely needed because the **anova** command automatically
selects the first variable listed in the between-subjects error term
as the default for this option.

**grouping(***varname***)** indicates a variable that determines which observations
are grouped together in computing the covariance matrices that will
be pooled and used in a repeated-measures ANOVA. This option is
rarely needed because the **anova** command automatically selects the
combination of all variables except the first (or as specified in the
**bseunit()** option) in the between-subjects error term as the default
for grouping observations.

__Remarks__

**anova** uses least squares to fit the linear models known as ANOVA or
ANCOVA (henceforth referred to simply as ANOVA models).

If you want to fit one-way ANOVA models, you may find the **oneway** or
**loneway** command more convenient. If you are interested in MANOVA or
MANCOVA, see **manova**.

Every ANOVA model fit by **anova** has an underlying regression model,
which the **regress** command can display. Type **regress** after **anova** to
see the coefficients, standard errors, etc., of the regression model for
the last run of **anova**.
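
A minimal sketch of this workflow, using the systolic data from the Examples section:

```stata
webuse systolic, clear

* Fit the two-way factorial ANOVA
anova systolic drug##disease

* Redisplay the same fit as a regression table of coefficients
regress
```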

Structural equation modeling provides a more general framework for
fitting ANOVA models; see the *Stata Structural Equation Modeling*
*Reference Manual*.

__Examples__

One-way ANOVA
**. webuse systolic**
**. anova systolic drug**

Two-way ANOVA
**. anova systolic drug disease**

Two-way factorial ANOVA
**. anova systolic drug disease drug#disease**

or more simply
**. anova systolic drug##disease**

Three-way factorial ANOVA
**. webuse manuf**
**. anova yield temp chem temp#chem meth temp#meth chem#meth temp#chem#meth**

or more simply
**. anova yield temp##chem##meth**

ANCOVA
**. webuse census2**
**. quietly summarize age**
**. generate mage = age - r(mean)**
**. anova drate region c.mage region#c.mage**

Nested ANOVA
**. webuse machine, clear**
**. anova output machine / operator|machine /, dropemptycells**

Split-plot ANOVA
**. webuse reading**
**. anova score prog / class|prog skill prog#skill / class#skill|prog / group|class#skill|prog /, dropemptycells**

Repeated-measures ANOVA
**. webuse t43**
**. anova score person drug, repeated(drug)**

__Video examples__

Analysis of covariance in Stata

Two-way ANOVA in Stata

__Stored results__

**anova** stores the following in **e()**:

Scalars
**e(N)** number of observations
**e(mss)** model sum of squares
**e(df_m)** model degrees of freedom
**e(rss)** residual sum of squares
**e(df_r)** residual degrees of freedom
**e(r2)** R-squared
**e(r2_a)** adjusted R-squared
**e(F)** F statistic
**e(rmse)** root mean squared error
**e(ll)** log likelihood
**e(ll_0)** log likelihood, constant-only model
**e(ss_***#***)** sum of squares for term *#*
**e(df_***#***)** numerator degrees of freedom for term *#*
**e(ssdenom_***#***)** denominator sum of squares for term *#* (when using nonresidual error)
**e(dfdenom_***#***)** denominator degrees of freedom for term *#* (when using nonresidual error)
**e(F_***#***)** F statistic for term *#* (if computed)
**e(N_bse)** number of levels of the between-subjects error term
**e(df_bse)** degrees of freedom for the between-subjects error term
**e(box***#***)** Box's conservative epsilon for a particular combination of repeated variables (**repeated()** only)
**e(gg***#***)** Greenhouse-Geisser epsilon for a particular combination of repeated variables (**repeated()** only)
**e(hf***#***)** Huynh-Feldt epsilon for a particular combination of repeated variables (**repeated()** only)
**e(rank)** rank of **e(V)**

Macros
**e(cmd)** **anova**
**e(cmdline)** command as typed
**e(depvar)** name of dependent variable
**e(varnames)** names of the right-hand-side variables
**e(term_***#***)** term *#*
**e(errorterm_***#***)** error term for term *#* (when using nonresidual error)
**e(sstype)** type of sum of squares; **sequential** or **partial**
**e(repvars)** names of repeated variables (**repeated()** only)
**e(repvar***#***)** names of repeated variables for a particular combination (**repeated()** only)
**e(model)** **ols**
**e(wtype)** weight type
**e(wexp)** weight expression
**e(properties)** **b V**
**e(estat_cmd)** program used to implement **estat**
**e(predict)** program used to implement **predict**
**e(asbalanced)** factor variables **fvset** as **asbalanced**
**e(asobserved)** factor variables **fvset** as **asobserved**

Matrices
**e(b)** coefficient vector
**e(V)** variance-covariance matrix of the estimators
**e(Srep)** covariance matrix based on repeated measures (**repeated()** only)

Functions
**e(sample)** marks estimation sample
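
The stored results can be inspected and combined after estimation. The sketch below, which assumes a model with a constant and residual error, recomputes three of the summary scalars from the stored sums of squares and degrees of freedom, then shows the results for the first term.

```stata
webuse systolic, clear
quietly anova systolic drug##disease

* List everything anova left in e()
ereturn list

* Recompute summary statistics from the stored sums of squares
display "R-squared = " e(mss)/(e(mss) + e(rss))
display "F         = " (e(mss)/e(df_m))/(e(rss)/e(df_r))
display "Root MSE  = " sqrt(e(rss)/e(df_r))

* Term-level results are indexed by the order of the terms in the model
display "`e(term_1)': SS = " e(ss_1) ", df = " e(df_1) ", F = " e(F_1)
```

The recomputed values should agree with **e(r2)**, **e(F)**, and **e(rmse)** up to rounding.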