Statalist The Stata Listserver


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: Re: st: Calculating averages when there are missing observations


From   n j cox <n.j.cox@durham.ac.uk>
To   statalist@hsphsun2.harvard.edu
Subject   Re: Re: st: Calculating averages when there are missing observations
Date   Tue, 06 Feb 2007 10:18:09 +0000

In the current version of Stata, -egen, rmean()- is
undocumented but works. The preferred name is -rowmean()-.
But whatever the name, it is indeed exactly what Laura needs.

Suppose this function did not exist. How would you re-invent it?
Here is one way.

gen rowtotal = 0
gen npresent = 0

qui foreach v of var v1-v5 {
replace npresent = npresent + !mi(`v')
replace rowtotal = rowtotal + (`v' * !mi(`v'))
}

replace rowtotal = cond(npresent == 0, ., rowtotal / npresent)
rename rowtotal rowmean
* drop npresent

There is, as said, no need to do this. But the above is
a fairly Stataish way of approaching the problem.

The code here is easier to generalise than Laura's code. Also, her
code has at least two specific limitations. She changed
her data; for many purposes, she would need to reverse that.
If any zeros present had there been before, that is trickier
than her code allows.

Also, missing can also mean .a to .z, not just system missing
(.).

Nick
n.j.cox@durham.ac.uk

Quang Nguyen

egen average=rmean(v1-v5)

laura.flamand

> I need to perform a set of extremely simple operations, but I am not
> sure how to tell Stata not to use the missing values. Here is an
> example, I just need to calculate the average of five variables (v1,
> v2, v3, v4, v5), which is, of course, very simple to do:
>
> gen average=(v1+v2+v3+v4+v5)/5
>
> The problem is that, in cases where any or several of v1 to v5 are
> missing, the above operation will generate missing observations for
> 'average', what I need instead is to calculate the average of the
> variables not missing, that is:
>
> if v1==.
> Then I need average=(v2+v3+v4+v5)/4
>
> If v1==. & v4==.
> Then I need average=(v2+v3+v5)/3
>
> In the lines below appears what I did, somewhat convoluted. In short,
> I replaced the missing observations with zeros and then created a
> variable (den) which stores the number of non-missing in v1 to v5,
> then I calculated the average using den as the denominator.
>
> Is there a simpler, more elegant procedure to calculate averages when
> there are missing observations?
>
> 'My solution'
> gen average=(v1+v2+v3+v4+v5)/5
>
> *'p' stands for present
> *This program takes care of cases where 1 or more of the vars is missing and
> *then averages only the non-missing vars over time
> foreach v of var v1 v2 v3 v4 v5 {
> gen p_`v' = 1 if `v' ~= .
> replace p_`v' = 0 if p_`v' == .
> replace `v' = 0 if `v' == .
> }
>
> generate den = p_v1 + p_v2 + p_v3 + p_v4 + p_v5
> replace average = (v1 + v2 + v3 + v4 + v5)/den if den~=5
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/




© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index