
st: RE: Taking averages, etc.


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Taking averages, etc.
Date   Wed, 17 Dec 2003 10:55:49 -0000

Others have replied pointing out that Stata
does this already.

A related question, perhaps not of much practical
interest, and I guess not Adrian's question,
but nevertheless a small puzzle testing grasp of Stata
technique, is how to specify that an average will be
calculated only if all the values of interest
are non-missing. A single missing value would be
enough to instruct Stata not to calculate.

Here is one way to do it. It works interactively
in Stata 8, and could be automated in a program.

. sysuse auto
. count if mi(rep78)
. if r(N) == 0 su rep78

-count- leaves its result behind in r(N), and the
second command tests that result. As rep78 has 5
missing values, r(N) is in fact 5 and nothing is done.

. count if mi(mpg)
. if r(N) == 0 su mpg

In this case r(N) is indeed 0 and the
calculation is done.
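
As a rough sketch of how this could be automated in a program:
the program name -suifcomplete- and the wording of the message
are made up for illustration, and this is only one of several
ways it could be written.

```stata
* Hypothetical helper: summarize a variable only if it has
* no missing values at all.
capture program drop suifcomplete
program define suifcomplete
    version 8
    syntax varname
    * one count of the missing values, left behind in r(N)
    quietly count if missing(`varlist')
    if r(N) == 0 {
        summarize `varlist'
    }
    else {
        display as text "`varlist' has " r(N) " missing value(s); nothing calculated"
    }
end

sysuse auto, clear
* mpg has no missing values, so -summarize- is run
suifcomplete mpg
* rep78 has missing values, so only the message is shown
suifcomplete rep78
```

The test inside the program is form 1 in the terms used below:
it is made once, not once per observation.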

Non-programmers should note the crucial difference
in principle between

1. if <condition> <command>

and

2. <command> if <condition>

Form 1 carries out _one_ test of the <condition>
supplied. If it is true, <command> is carried
out, but not otherwise.

Form 2 carries out a test of the <condition>
supplied for _every_ observation specified
and then carries out <command>
for the observations for which it is true.

For this problem, form 1 is better.
As it happens,

. count if mi(mpg)
. su mpg if r(N) == 0

is not only legal, but produces the correct result.
What happens is that Stata looks at the condition

	if r(N) == 0

and says in turn: is this true for observation 1?
for observation 2? for observation 3? and so
forth. As it happens, r(N) == 0 has nothing
to do with any of these observations in particular,
but Stata has little notion of irrelevance, and
irrelevance doesn't make a condition false.
(You could even use a tautology like -if 2 == 2-
which Stata would then test for every observation.)
So it carries out the test again and again, which
in a large dataset is naturally very inefficient.

In other problems, you would rarely get away
with sloppiness over whether form 1 or form 2
should be used. The FAQ at
http://www.stata.com/support/faqs/lang/ifqualifier.html
explains how you could get bitten.
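
A small illustration of the kind of bite meant, using the auto
data again (a sketch only; the choice of rep78 == 3 as the
condition is arbitrary):

```stata
sysuse auto, clear

* Form 2: the condition is tested for every observation, so
* only cars with rep78 equal to 3 are summarized.
summarize mpg if rep78 == 3

* Form 1: the condition is tested once. A bare variable name
* is read as its value in the first observation, rep78[1],
* so this summarizes either all the cars or none of them,
* depending on that one value.
if rep78 == 3 summarize mpg
```

Both commands are legal, but only the first selects observations.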

If the issue were that a set of observations
should be non-missing on all variables specified,
something like

. egen rmiss = rmiss(<varlist>)
. count if rmiss
. if r(N) == 0 <command>

is one way to do it.
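
A concrete sketch with the auto data, swapping in a different
technique from -egen-: it assumes a Stata release in which
missing() accepts several arguments and returns 1 if any of
them is missing. The three variables are chosen only for
illustration.

```stata
sysuse auto, clear

* count observations with a missing value on any of the three
count if missing(price, mpg, rep78)

* one test of r(N): summarize only if no observation was counted
if r(N) == 0 summarize price mpg rep78
```

This avoids creating a new variable, at the cost of typing out
each variable name with commas. In later releases the -egen-
function is called rowmiss() rather than rmiss().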

P.S. In Stata 7, this would be

. count if missing(mpg)
. if r(N) == 0 { su mpg }

Nick
n.j.cox@durham.ac.uk

de la Garza, Adrian

> Can anyone tell me if it's possible to tell Stata to ignore missing
> observations when computing averages, etc.? I think that if there is a
> missing value in the observations considered, the average computed would
> be then missing too... and I need it to intelligently choose how many
> observations to use depending on whether observations are available,
> etc.
