Statalist The Stata Listserver

st: Calculating averages when there are missing observations

From   lf <>
Subject   st: Calculating averages when there are missing observations
Date   Mon, 5 Feb 2007 22:34:08 -0800

Dear Stata list users:

I need to perform a set of extremely simple operations, but I am not
sure how to tell Stata to ignore missing values.  Here is an
example: I need to calculate the average of five variables (v1,
v2, v3, v4, v5), which is, of course, very simple to do:

gen average=(v1+v2+v3+v4+v5)/5

The problem is that, whenever one or more of v1 to v5 is missing,
the above command generates a missing value for 'average'.  What I
need instead is the average of the non-missing variables, that is:

if v1==.
Then I need average=(v2+v3+v4+v5)/4

If v1==. & v4==.
Then I need average=(v2+v3+v5)/3

The lines below show what I did, which is somewhat convoluted. In short,
I replaced the missing values with zeros and then created a
variable (den) that stores the number of non-missing values in v1 to v5;
then I calculated the average using den as the denominator.

Is there a simpler, more elegant procedure to calculate averages when
there are missing observations?

'My solution'
gen average=(v1+v2+v3+v4+v5)/5

*'p' stands for present
*This program takes care of cases where 1 or more of the vars is missing and
*then averages only the non-missing vars over time
foreach v of var v1 v2 v3 v4 v5 {
      gen p_`v' = 1 if `v' ~= .
      replace p_`v' = 0 if p_`v' == .
      *note: the next line overwrites missing values in the original variable with 0
      replace `v' = 0 if `v' == .
}

generate den = p_v1 + p_v2 + p_v3 + p_v4 + p_v5
replace average = (v1 + v2 + v3 + v4 + v5)/den if den~=5
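
To make the target behaviour concrete, here is a minimal sketch that runs the same steps on a made-up three-row dataset and checks the results with assert. The data values are hypothetical, chosen only so that the averages are easy to verify by hand; they are not from my real data.

```stata
* Hypothetical toy data: row 1 complete, row 2 missing v1, row 3 missing v1 and v4
clear
input v1 v2 v3 v4 v5
1 2 3 4 5
. 2 3 4 5
. 2 3 . 5
end

* Naive average: missing whenever any of v1-v5 is missing
gen average = (v1+v2+v3+v4+v5)/5

* Flag non-missing values, then zero out the missing ones
foreach v of var v1 v2 v3 v4 v5 {
      gen p_`v' = 1 if `v' ~= .
      replace p_`v' = 0 if p_`v' == .
      replace `v' = 0 if `v' == .
}

* den = number of non-missing values in each row
generate den = p_v1 + p_v2 + p_v3 + p_v4 + p_v5
replace average = (v1 + v2 + v3 + v4 + v5)/den if den ~= 5

* Check against the hand-computed targets: 15/5, 14/4 and 10/3
assert average == 3 in 1
assert average == 3.5 in 2
assert abs(average - 10/3) < 1e-6 in 3
```

The last check uses a tolerance because average is stored as a float, so it does not equal 10/3 exactly.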

Thank you very much in advance,
