The Stata listserver

RE: st: RE: survey completion "flag"


From   David Kantor <dkantor@jhu.edu>
To   statalist@hsphsun2.harvard.edu
Subject   RE: st: RE: survey completion "flag"
Date   Tue, 22 Feb 2005 10:18:01 -0500

At 05:21 PM 2/18/2005 +0000, Nick Cox wrote:
No. And in fact that offers a much cleaner
solution to the

missing if any missing

problem.

gen max = <safe value>
foreach v of local varlist {
        replace max = cond(`v' > max, `v', max)
}

Nick
n.j.cox@durham.ac.uk

Nick Winter

> Ah yes, I'd forgotten that.
>
> So max(a, b) is NOT the same as cond(a>b,a,b).

> At 05:01 PM 2/18/2005 +0000, you wrote:
> >The second problem is well identified.
> >Thanks.
> >
> >The first problem is no problem.
> >
> >max(42, .) is 42, so initialising to
> >missing is safe.
> >
> >This perhaps surprising feature can
> >be rationalised as follows. Given
> >arguments of
> >
> >-1, 0, 1, 2.71828, 3.14159, 42, .
> >
> >would you really want a -max()- function
> >to return missing? It depends on the problem,
> >but it can be forced the other way:
> >
> >gen max = <safe value>
> >foreach v of local varlist {
> >         replace max = cond(`v' == ., ., max(`v', max)) if !mi(max)
> >}
> >
> >Nick
> >n.j.cox@durham.ac.uk

I'd like to add to this by offering some conceptual tools.

The max of
-1, 0, 1, 2.71828, 3.14159, 42, .
(in the general sense, not necessarily the value that Stata's max function returns) depends on what you mean by "missing". If the missing value represents some unknown value, then the max is unknown as well. On the other hand, the missing value in that list may be interpreted as "vacuous" -- there's really nothing there. That is the interpretation that Stata's max function takes.

Similarly, the difference between
gen t = x + y + z
and
egen u = rsum(x y z)

is that, in the former, missing values are taken as unknown values, and in the latter, they are vacuous. (Actually, they are taken as zeroes, but "vacuous" can be thought of as a universal identity element in operations.)
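To make the contrast concrete, here is a minimal sketch on a made-up three-row dataset (the values are illustrative assumptions, not from the thread). It uses rowtotal(), the name that egen's rsum() was given from Stata 9 onward; on an older Stata, substitute rsum().

```stata
clear
input x y z
1 2 3
1 . 3
. . .
end

* Expression arithmetic: any missing operand makes the result missing ("unknown")
generate t = x + y + z

* Row total: missing operands are skipped, i.e. counted as zero ("vacuous")
egen u = rowtotal(x y z)

list
```

In the second row t is missing while u is 4; in the all-missing third row t is missing while u is 0.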

Unfortunately, Stata is full of these kinds of inconsistencies, which we must be aware of. My thought is that it might have been useful to have distinct missing values to represent unknown and vacuous separately, and to have all functions and expression evaluations treat these values accordingly. Thus, for example, if we suppose that .v stands for vacuous, then I would want
max( -1, 0, 1, 2.71828, 3.14159, 42, .v)
to be 42, but
max( -1, 0, 1, 2.71828, 3.14159, 42, .)
to be sysmiss (.).
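Stata's max() only offers the "vacuous" reading, but the "unknown" reading can be approximated by hand. The following is a rough sketch, not a proposal for how Stata should work: the variable names a, b, m_vacuous and m_unknown and the data are made up for illustration, and the special meaning of .v above remains hypothetical.

```stata
* Stata's own max() ignores missing arguments ("vacuous")
display max(-1, 0, 1, 2.71828, 3.14159, 42, .)

clear
input a b
1 42
. 42
. .
end

* "Vacuous" reading: missing only when every argument is missing
generate m_vacuous = max(a, b)

* "Unknown" reading: missing as soon as any argument is missing
generate m_unknown = cond(missing(a, b), ., max(a, b))

list
```

The two variables differ only in the second row, where one argument is missing and the other is not: m_vacuous is 42 and m_unknown is missing.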

I hope some of you find this useful or interesting.
-- David

David Kantor
Institute for Policy Studies
Johns Hopkins University
dkantor@jhu.edu
410-516-5404