Stata 11 help for mf_mean

help mata mean() -------------------------------------------------------------------------------

Title

[M-5] mean() -- Means, variances, and correlations

Syntax

real rowvector mean(X [, w])

real matrix variance(X [, w])

real matrix quadvariance(X [, w])

real matrix meanvariance(X [, w])

real matrix quadmeanvariance(X [, w])

real matrix correlation(X [, w])

real matrix quadcorrelation(X [, w])

where

X: real matrix X (rows are observations, columns are variables)

w: real colvector w (optional)

Description

mean(X, w) returns the weighted-or-unweighted column means of data matrix X. mean() uses quad precision in forming sums and so is very accurate.

variance(X, w) returns the weighted-or-unweighted variance matrix of X. In the calculation, means are removed and those means are calculated in quad precision, but quad precision is not otherwise used.

quadvariance(X, w) returns the weighted-or-unweighted variance matrix of X. Calculation is highly accurate; quad precision is used in forming all sums.

meanvariance(X, w) returns mean(X,w)\variance(X,w).

quadmeanvariance(X, w) returns mean(X,w)\quadvariance(X,w).

correlation(X, w) returns the weighted-or-unweighted correlation matrix of X. correlation() obtains the variance matrix from variance().

quadcorrelation(X, w) returns the weighted-or-unweighted correlation matrix of X. quadcorrelation() obtains the variance matrix from quadvariance().

In all cases, w specifies the weight. Omit w, or specify w as 1, to obtain unweighted results.

In all cases, rows of X or w that contain missing values are omitted from the calculation, which amounts to casewise deletion.
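As a minimal sketch of the calls described above, the following do-file fragment uses a small hypothetical data matrix X and hypothetical weights w (neither comes from this help file). It checks the weighted mean against the usual sum(w*x)/sum(w) formula and checks that scaling the result of variance() by the standard deviations reproduces correlation(). The tolerance 1e-12 is an arbitrary choice for illustration.

```mata
mata:
// Hypothetical data: 4 observations (rows) on 2 variables (columns)
X = (1, 2 \ 3, 5 \ 4, 9 \ 6, 10)
w = (1 \ 2 \ 1 \ 3)              // hypothetical weights, one per observation

mean(X)                          // unweighted column means, 1 x 2
mean(X, w)                       // weighted column means, 1 x 2
variance(X)                      // 2 x 2 variance matrix
correlation(X)                   // 2 x 2 correlation matrix

// Specifying w as 1 is the same as omitting it
assert(mreldif(mean(X, 1), mean(X)) < 1e-12)

// Weighted mean written out by hand: sum of w*x divided by sum of w
m = quadcolsum(w :* X) / quadcolsum(w)
assert(mreldif(m, mean(X, w)) < 1e-12)

// Correlation recovered from the variance matrix: V[i,j]/(s[i]*s[j])
V = variance(X)
s = sqrt(diagonal(V))
assert(mreldif(V :/ (s * s'), correlation(X)) < 1e-12)
end
```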

Remarks

1. There is no quadmean() function because mean(), in fact, is quadmean(). The fact that mean() defaults to the quad-precision calculation reflects our judgment that the extra computational cost in computing means in quad precision is typically justified.

2. The fact that variance() and correlation() do not default to using quad precision for their calculations reflects our judgment that the extra computational cost is typically not justified. The emphasis in the last sentence is on the word typically.

It is easier to justify means in part because the extra computational cost is less: there are only k means but k(k+1)/2 variances and covariances.
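One way to judge whether quad precision matters for a particular dataset is to compute the variance matrix both ways and compare the results. The sketch below does this for deliberately awkward hypothetical data (a large common offset with a small spread); the data and the choice of mreldif() as the comparison are assumptions made for illustration, and no particular size of difference is claimed.

```mata
mata:
// Hypothetical data: 1000 observations on 2 variables,
// with a large offset (1e9) relative to the spread
X = 1e9 :+ (1::1000) * (0.1, 0.3)

V  = variance(X)                 // means in quad precision, sums in double
Vq = quadvariance(X)             // all sums in quad precision

mreldif(V, Vq)                   // largest relative difference between the two

// Number of quantities computed: k means versus k(k+1)/2
// distinct variances and covariances
k = cols(X)
k, k*(k + 1)/2
end
```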

3. If you need just the mean or just the variance matrix, call mean() or variance() (or quadvariance()). If you need both, there is a CPU-time savings to be had by calling meanvariance() instead of the two functions separately (or quadmeanvariance() instead of calling mean() and quadvariance()).

The savings are not great -- one mean() calculation is saved -- but the greater rows(X), the greater the savings.

Upon getting back the combined result, it can be efficiently broken into its components via

        : var = meanvariance(X)
        : means = var[1,.]
        : var = var[|2,1 \ .,.|]
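The split can be checked against the separate functions. This is a sketch on a small hypothetical matrix X; the names mv and V and the tolerance 1e-12 are illustrative choices, not part of this help file.

```mata
mata:
X = (1, 2 \ 3, 5 \ 4, 9 \ 6, 10)     // hypothetical data, 4 x 2

mv    = meanvariance(X)              // means stacked on top of the variance matrix
means = mv[1, .]                     // first row: the column means
V     = mv[|2,1 \ .,.|]              // remaining rows: the variance matrix

// The pieces should agree with the separate calls
assert(mreldif(means, mean(X)) < 1e-12)
assert(mreldif(V, variance(X)) < 1e-12)
end
```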

Conformability

mean(X, w):
            X:  n x k
            w:  n x 1 or 1 x 1 (optional, w=1 assumed)
       result:  1 x k

variance(X, w), quadvariance(X, w), correlation(X, w), quadcorrelation(X, w):
            X:  n x k
            w:  n x 1 or 1 x 1 (optional, w=1 assumed)
       result:  k x k

meanvariance(X, w), quadmeanvariance(X, w):
            X:  n x k
            w:  n x 1 or 1 x 1 (optional, w=1 assumed)
       result:  (k+1) x k
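The result dimensions listed above can be confirmed with rows() and cols(). The matrix below is hypothetical, with n = 4 and k = 3.

```mata
mata:
X = (1, 2, 3 \ 4, 6, 5 \ 7, 7, 9 \ 2, 8, 1)   // hypothetical data, n = 4, k = 3
k = cols(X)

// mean(): 1 x k
assert(rows(mean(X)) == 1 & cols(mean(X)) == k)

// variance() and correlation(): k x k
assert(rows(variance(X)) == k & cols(variance(X)) == k)
assert(rows(correlation(X)) == k & cols(correlation(X)) == k)

// meanvariance(): (k+1) x k
assert(rows(meanvariance(X)) == k + 1 & cols(meanvariance(X)) == k)
end
```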

Diagnostics

All functions omit from the calculation rows that contain missing values. If every row contains a missing value, the returned result contains all missing values.
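The casewise deletion can be illustrated by dropping incomplete rows by hand and comparing. In this sketch the matrices X and Z are hypothetical, and select() with rowmissing() is just one way to build the complete-case matrix.

```mata
mata:
X  = (1, 2 \ ., 5 \ 4, 9 \ 6, 10)        // hypothetical data; row 2 has a missing value
Xc = select(X, rowmissing(X) :== 0)      // keep only rows with no missing values

// mean() on X should match mean() on the complete rows
assert(mreldif(mean(X), mean(Xc)) < 1e-12)
assert(mreldif(variance(X), variance(Xc)) < 1e-12)

Z = (., 1 \ 2, .)                        // every row contains a missing value
mean(Z)                                  // documented to contain all missing values
end
```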

Source code

mean.mata, variance.mata, quadvariance.mata, meanvariance.mata, quadmeanvariance.mata, correlation.mata, quadcorrelation.mata

Also see

Manual: [M-5] mean()

Help: [M-4] statistical


© Copyright 1996–2009 StataCorp LP