
st: row mean (mean across columns)

From	Jacob Wegelin <[email protected]>
To	[email protected]
Subject	st: row mean (mean across columns)
Date	Wed, 8 Oct 2008 14:38:53 -0400 (EDT)

Given any dataset of all numeric variables, I want to generate a new variable called myMean, which is the arithmetic mean (the average) across all the variables. The program below solves this problem. But surely there is a one-line command that will perform this task in Stata?

The post http://www.stata.com/statalist/archive/2008-09/msg00597.html appears to contain a bug, in the sense that the row total computed is not corrected as in my code below.

This should be done in a general manner:

(1) As in the current dataset, the variable names will not necessarily follow a pattern such as a1 to a100.

(2) The number of variables is arbitrary, so I cannot hard-code the denominator as when myMeanByHand is computed below.

(3) If any value in a row is missing (.), the mean computed must also be missing, since then the mean across all variables is not defined. (Thus egen rowtotal is not the answer.)

Here is the code:

/* Generate a toy dataset */
clear
set obs 5
gen x= _n
gen zoo = 20-x
gen whiskey=(_N - x) ^ 2
replace x = . in 2
/* First compute "by hand" with hard-coded denominator and variable names */
gen myMeanByHand= (x + zoo + whiskey ) / 3
sort x
save tmp, replace
list
drop myMeanByHand

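capture program drop computeMeanAcrossColumns
program define computeMeanAcrossColumns
	/* Compute arithmetic mean across all columns; the first argument is the name of the new variable */
	tempvar RowTotalTooMuch
	tempvar rowtotal
	scalar ncols=0
	gen `RowTotalTooMuch'=0
	/* varlist * also matches the temporary running total, which is the last
	   variable in the dataset, so the final pass adds the total to itself */
	foreach var of varlist * {
		quietly replace `RowTotalTooMuch'=`RowTotalTooMuch' + `var'
		scalar ncols=ncols + 1
	}
	/* Correct for the temporary variable: one column too many, total doubled */
	scalar nOrigCols=ncols-1
	gen `rowtotal' = `RowTotalTooMuch' / 2
	gen `1'= `rowtotal' / nOrigCols
end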

computeMeanAcrossColumns "myMean"

/* Check myMean against myMeanByHand */

sort x
merge x using tmp

assert _merge==3
drop _merge
assert myMean==myMeanByHand
drop myMeanByHand
list

/* An illustration with egen rowmean */

keep x zoo whiskey
/* The following works for rows with no missing values. It gives a misleading answer for a row that contains a missing value, since the average in that row is not defined. */

egen junk=rowmean(_all)
list

drop junk

/* A related question: The following gives an incorrect answer. What in the world is it doing? */

egen junk=rowmean(*)
list
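
One possible way to meet requirement (3) with built-in functions is to combine egen's rowmean() with rowmiss(), which counts the missing values in each row. The following is only a sketch under assumptions, not necessarily the one-line command asked about. It rebuilds the toy dataset from above, and the names allvars, nMissing and myMean2 are hypothetical. The variable list is captured with unab before any new variable is created, so that a wildcard cannot pick up newly created variables.

```stata
/* Rebuild the toy dataset */
clear
set obs 5
gen x = _n
gen zoo = 20 - x
gen whiskey = (_N - x)^2
replace x = . in 2

/* Store the names of all current variables in the local macro allvars (hypothetical name) */
unab allvars : _all

/* rowmiss() counts missing values per row; rowmean() averages the nonmissing ones */
egen nMissing = rowmiss(`allvars')
egen myMean2 = rowmean(`allvars')

/* Set the mean to missing wherever any input value is missing */
replace myMean2 = . if nMissing > 0
drop nMissing
list
```

Because allvars is expanded before nMissing and myMean2 exist, neither the variable names nor the denominator need to be hard-coded in this sketch.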


Thanks for any insights

Jake

Jacob A. Wegelin

[email protected]

Assistant Professor

Department of Biostatistics
Virginia Commonwealth University
730 East Broad Street Room 3006
P. O. Box 980032
Richmond VA 23298-0032

U.S.A.
http://www.people.vcu.edu/~jwegelin



Follow-Ups:
- st: RE: row mean (mean across columns)
  - From: "Nick Cox" <[email protected]>
- Re: st: row mean (mean across columns)
  - From: "Eva Poen" <[email protected]>
- Re: st: row mean (mean across columns)
  - From: Maarten buis <[email protected]>
