From |
David Kantor <dkantor@jhu.edu> |

To |
statalist@hsphsun2.harvard.edu |

Subject |
RE: st: RE: survey completion "flag" |

Date |
Tue, 22 Feb 2005 10:18:01 -0500 |

At 05:21 PM 2/18/2005 +0000, Nick Cox wrote:

No. And in fact that offers a much cleaner solution
to the missing if any missing problem.

gen max = <safe value>
foreach v of local varlist {
    replace max = cond(`v' > max, `v', max)
}

Nick
n.j.cox@durham.ac.uk

Nick Winter

> Ah yes, I'd forgotten that.
>
> So max(a, b) is NOT the same as cond(a>b,a,b).
> At 05:01 PM 2/18/2005 +0000, you wrote:
> >The second problem is well identified.
> >Thanks.
> >
> >The first problem is no problem.
> >
> >max(42, .) is 42, so initialising to
> >missing is safe.
> >
> >This perhaps surprising feature can
> >be rationalised as follows. Given
> >arguments of
> >
> >-1, 0, 1, 2.71828, 3.14159, 42, .
> >
> >would you really want a -max()- function
> >to return missing? It depends on the problem,
> >but it can be forced the other way:
> >
> >gen max = <safe value>
> >foreach v of local varlist {
> >  replace max = cond(`v' == ., ., max(`v', `max')) if !mi(max)
> >}
> >
> >Nick
> >n.j.cox@durham.ac.uk

I'd like to add to this by offering some conceptual tools.

The max of...

-1, 0, 1, 2.71828, 3.14159, 42, .

(that is, the general interpretation, not necessarily the value of Stata's max function) depends on what you mean by "missing". If the missing value represents some unknown value, then the max is unknown as well. On the other hand, the missing value in that list may also be interpreted as "vacuous" -- there's really nothing there. That is the interpretation that Stata's max function takes. Similarly, the difference between,

gen t = x + y + z

and

egen u = rsum(x y z)

is that, in the former, missing values are taken as unknown values, and in the latter, they are vacuous. (Actually, they are taken as zeroes, but "vacuous" can be thought of as a universal identity element in operations.)
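The contrast can be seen on a single observation. What follows is only a minimal sketch on made-up data; it uses rowtotal(), the name that egen's rsum() function was given from Stata 9 onward, so on an older Stata substitute rsum().

```stata
* Made-up data: one observation in which y is missing
clear
input x y z
1 . 3
end

* Arithmetic treats the missing value as unknown, so t is missing
gen t = x + y + z

* rowtotal() (formerly rsum()) treats the missing value as zero, so u is 4
egen u = rowtotal(x y z)

* max() ignores the missing argument, so m is 3
gen m = max(x, y, z)

list
```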

Unfortunately, Stata is full of these kinds of inconsistencies, which we must be aware of. My thought is that it might have been useful to have distinct missing values to represent unknown and vacuous separately, and to have all functions and expression evaluations treat these values accordingly. Thus, for example, if we suppose that .v stands for vacuous, then I would want

max( -1, 0, 1, 2.71828, 3.14159, 42, .v)

to be 42, but

max( -1, 0, 1, 2.71828, 3.14159, 42, .)

to be sysmiss (.).
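Stata's own max() does not behave this way, but the proposed semantics can be emulated by hand. The following is a hypothetical sketch, not an existing Stata feature: the variable names a, b, c and pmax are invented for illustration, .v is assumed to mean "vacuous", and every other missing value is assumed to mean "unknown".

```stata
* Hypothetical sketch: .v is "vacuous" (skipped); any other missing
* value is "unknown" and forces the result to sysmiss (.)
clear
input a b c
1 42 .v
1 42 .
.v .v .v
end

* Start from "nothing seen yet"
gen double pmax = .v

foreach v of varlist a b c {
    * An unknown value makes the whole result unknown
    replace pmax = . if missing(`v') & `v' != .v
    * A real value updates the running maximum unless the result is already unknown
    replace pmax = `v' if !missing(`v') & pmax != . & (pmax == .v | `v' > pmax)
}

list
```

Under these assumptions the first observation yields 42, the second yields sysmiss, and the third, where every argument is vacuous, stays .v. Returning .v in that last case is a design choice of this sketch, not something the proposal above specifies.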

I hope some of you find this useful or interesting.

-- David

David Kantor

Institute for Policy Studies

Johns Hopkins University

dkantor@jhu.edu

410-516-5404

