
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: Taking averages, etc.
Date: Wed, 17 Dec 2003 10:55:49 -0000

Others have replied pointing out that Stata does this already.

A related question, perhaps not of much practical interest, and I guess not Adrian's question, but nevertheless a small puzzle testing grasp of Stata technique, is how to specify that an average will be calculated only if all the values of interest are non-missing. A single missing value would be enough to instruct Stata not to calculate.

Here is one way to do it. It works interactively in Stata 8, and could be automated in a program.

. u auto
. count if mi(rep78)
. if r(N) == 0 su rep78

Stata uses the count left behind by -count- in r(N). As r(N) is in fact 5, nothing is done.

. count if mi(mpg)
. if r(N) == 0 su mpg

In this case r(N) is indeed 0 and the calculation is done.

Non-programmers should note the crucial difference in principle between

1. if <condition> <command>

and

2. <command> if <condition>

Form 1 carries out _one_ test of the <condition> supplied. If it is true, <command> is carried out, but not otherwise.

Form 2 carries out a test of the <condition> supplied for _every_ observation specified and then carries out <command> for the observations for which it is true.

For this problem, form 1 is better. As it happens,

. count if mi(mpg)
. su mpg if r(N) == 0

is not only legal, but produces the correct result. What happens is that Stata looks at the condition

if r(N) == 0

and says in turn: is this true for observation 1? for observation 2? for observation 3? and so forth. As it happens, r(N) == 0 is nothing to do with any of these observations in particular, but Stata has little notion of irrelevance, and irrelevance doesn't make a condition false. (You could even use a tautology like -if 2 == 2- which Stata would then test for every observation.) So it carries out the test again and again, which in a large dataset is naturally very inefficient.

In other problems, you would rarely get away with sloppiness over whether form 1 or form 2 should be used.
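
The two points above can be sketched in a short do-file. This is only an illustration, assuming Stata 8 or later and the auto dataset shipped with Stata; the program name -suifcomplete- is hypothetical, made up for this sketch, and is not an existing command. The first part wraps the count-then-test recipe (form 1) in a program. The second part shows what happens when the condition in form 1 refers to a variable: Stata evaluates it once, using the first observation only.

```stata
* Hypothetical wrapper (the name -suifcomplete- is invented for this sketch):
* summarize a variable only if none of its values is missing.
capture program drop suifcomplete
program define suifcomplete
    version 8
    syntax varname
    quietly count if missing(`varlist')
    * Form 1: a single test of r(N), made once, not once per observation
    if r(N) == 0 {
        summarize `varlist'
    }
    else {
        display as text "`varlist' has " r(N) " missing value(s); nothing calculated"
    }
end

sysuse auto, clear

* rep78 has missing values, so only the message is shown
suifcomplete rep78

* mpg has no missing values, so -summarize- runs
suifcomplete mpg

* Form 1 with a variable in the condition: only the first observation is
* consulted. The first car in auto.dta is domestic (foreign == 0), so the
* condition is false and nothing is summarized.
if foreign == 1 summarize mpg

* Form 2: the condition is tested for every observation, so the foreign
* cars are summarized.
summarize mpg if foreign == 1
```

The last two commands look alike but answer different questions, which is why mixing up the two forms usually goes wrong as soon as the condition depends on the data rather than on a saved result such as r(N).
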
The FAQ at http://www.stata.com/support/faqs/lang/ifqualifier.html explains how you could get bitten.

If the issue were that a set of observations should be non-missing on all variables specified, something like

. egen rmiss = rmiss(<varlist>)
. count if rmiss
. if r(N) == 0 <command>

is one way to do it.

P.S. In Stata 7,

. count if missing(mpg)
. if r(N) == 0 {
      su mpg
  }

Nick
n.j.cox@durham.ac.uk

de la Garza, Adrian

> Can anyone tell me if it's possible to tell Stata to ignore missing
> observations when computing averages, etc.? I think that if there is a
> missing value in the observations considered, the average computed would
> be then missing too... and I need it to intelligently choose how many
> observations to use depending on whether observations are available,
> etc.

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**:
- **Re: st: RE: Taking averages, etc.** *From:* Richard Williams <Richard.A.Williams.5@nd.edu>
- **st: Efficiency: was RE: Taking averages, etc.** *From:* Allan Reese <R.A.Reese@hull.ac.uk>

**References**:
- **st: Taking averages, etc.** *From:* "de la Garza, Adrian" <ADelagarza@imf.org>

