
st: row mean (mean across columns)

From   Jacob Wegelin
Subject   st: row mean (mean across columns)
Date   Wed, 8 Oct 2008 14:38:53 -0400 (EDT)

Given any dataset of all numeric variables, I want to generate a new variable called myMean, which is the arithmetic mean (the average) across all the variables. The program below solves this problem. But surely there is a one-line command that will perform this task in Stata?

The post appears to contain a bug, in the sense that the row total computed is not corrected as in my code below.

This should be done in a general manner:

(1) As in the current dataset, the variables will not necessarily be in a form like a1 to a100.

(2) The number of variables is arbitrary, so I cannot hard-code the denominator as when myMeanByHand is computed below.

(3) If any value in a row is missing (.), the mean computed must also be missing, since then the mean across all variables is not defined. (Thus egen rowtotal is not the answer.)
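
To make requirement (3) concrete, here is a minimal sketch on a hypothetical two-row dataset. The names a, b, c, sumPlus, and sumEgen are illustrative only and are not part of the toy dataset below. As far as I understand, adding variables with + yields missing whenever any term is missing, whereas egen rowtotal() counts a missing value as zero:

```stata
* Hypothetical illustration; variable names are assumptions for this sketch
clear
set obs 2
gen a = 1
gen b = 2
gen c = 3
* Make one value in the second row missing
replace b = . in 2
* The + operator propagates missing values
gen sumPlus = a + b + c
* egen rowtotal() treats missing values as zero
egen sumEgen = rowtotal(a b c)
list
```

In the second row sumPlus should be missing while sumEgen is a number, which is why a total from rowtotal() cannot serve as the numerator here.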

Here is the code:

/* Generate a toy dataset */
set obs 5
gen x= _n
gen zoo = 20-x
gen whiskey=(_N - x) ^ 2
replace x = . in 2
/* First compute "by hand" with hard-coded denominator and variable names */
gen myMeanByHand= (x + zoo + whiskey ) / 3
sort x
save tmp, replace
drop myMeanByHand

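capture program drop computeMeanAcrossColumns
program define computeMeanAcrossColumns
	/* Compute arithmetic mean across all columns; the first argument names the new variable */
	tempvar RowTotalTooMuch
	tempvar rowtotal
	scalar ncols=0
	gen `RowTotalTooMuch'=0
	/* The wildcard * also matches the temporary running total, so the final pass
	   adds the total to itself (doubling it) and ncols ends up one too many. */
	foreach var of varlist * {
		quietly replace `RowTotalTooMuch'=`RowTotalTooMuch' + `var'
		scalar ncols=ncols + 1
	}
	/* Undo the extra column in the count and the doubling of the total */
	scalar nOrigCols=ncols-1
	gen `rowtotal' = `RowTotalTooMuch' / 2
	gen `1'= `rowtotal' / nOrigCols
end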

computeMeanAcrossColumns "myMean"

/* Check myMean against myMeanByHand */
sort x
merge x using tmp
assert _merge==3
drop _merge
assert myMean==myMeanByHand
drop myMeanByHand

/* An illustration with egen rowmean */

keep x zoo whiskey
/* The following works for rows with no missing values. It gives a misleading answer for a row that contains a missing value, since the average in that row is not defined. */
egen junk=rowmean(_all)
list
drop junk

/* A related question: The following gives an incorrect answer. What in the world is it doing? */
egen junk=rowmean(*)
list

Thanks for any insights


Jacob A. Wegelin Assistant Professor
Department of Biostatistics
Virginia Commonwealth University
730 East Broad Street Room 3006
P. O. Box 980032
Richmond VA 23298-0032
