
From |
[email protected] (William Gould, StataCorp LP) |

To |
[email protected] |

Subject |
Re: st: My last word on strange world |

Date |
Fri, 11 Jan 2008 10:35:49 -0600 |

I apologize in advance for dragging on the conversation about missing values, but my last post was too philosophical, and I want to show a concrete example illustrating what I think the problem is with the other proposals, which is not to say that the other proposals lack merit.

In response to my reference to the law of the excluded middle -- P or not-P is true -- Jeph Herrin <[email protected]> replied,

> Not at all. You are presuming that
>
>     ![42 > (undefined)] == [42 !> (undefined)]

Yes, I am. Most programmers and computer users would assume the statements !(x>y) and (x<=y) are equivalent. If it is difficult to explain to users that (income>10000) includes missing values, try explaining that !(x>y) and (x<=y) are not the same statement when missing values are involved.

Svend Juul <[email protected]> wrote,

> The decision made is perfectly logical, but the following alternative is
> equally logical and much more in line with the expectations of ordinary
> users:
>
> [...]
> - let (x>100) evaluate to false if x is missing
> - let (x==100) evaluate to false if x is missing
> - let (x<100) evaluate to false if x is missing

It therefore follows that x>=100 evaluates to false if x is missing, and thus the statements (x>=100) and (x<100) are both false. As I asked about Jeph's comment, is that really in line with the expectations of ordinary users?

Let's look at an example. Consider

    . keep if income>=baseincome & age>=minage

versus

    . drop if income<baseincome | age<minage

Under the proposed change, different observations will appear in the resulting data. They differ because statements like income<baseincome and its opposite income>=baseincome simultaneously evaluate to FALSE in the presence of missing values, and -keep- and -drop- perform the opposite actions. To appreciate how difficult this logic is to understand, try explaining to yourself exactly how the two datasets will differ.
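The keep/drop comparison can also be checked mechanically. What follows is a minimal Python sketch (not Stata) of the proposed comparison rules, offered as an illustration only. It assumes that any comparison with exactly one missing operand is FALSE, and additionally that >= between two missing values is TRUE, on the reading that the proposal has x==. evaluate to TRUE when x is missing. The names MISSING, ge, lt, kept_by_keep, kept_by_drop, and enumerate_patterns are hypothetical, and the observed values are arbitrary ones chosen so that every fully observed comparison passes.

```python
from itertools import product

MISSING = None  # hypothetical stand-in for Stata's "." in this sketch


def ge(x, y):
    """x >= y under the proposed rules (an assumption, not Stata's behavior)."""
    if x is MISSING and y is MISSING:
        return True   # assumed: missing >= missing is TRUE, like x==. for missing x
    if x is MISSING or y is MISSING:
        return False  # proposal: a comparison with one missing operand is FALSE
    return x >= y


def lt(x, y):
    """x < y under the proposed rules: FALSE whenever either operand is missing."""
    if x is MISSING or y is MISSING:
        return False
    return x < y


def kept_by_keep(income, baseincome, age, minage):
    # . keep if income>=baseincome & age>=minage
    return ge(income, baseincome) and ge(age, minage)


def kept_by_drop(income, baseincome, age, minage):
    # . drop if income<baseincome | age<minage   (retained = not dropped)
    return not (lt(income, baseincome) or lt(age, minage))


# Arbitrary illustrative values: income, baseincome, age, minage.
# Both comparisons pass when nothing is missing.
OBSERVED = (50000, 30000, 40, 18)


def enumerate_patterns():
    """Return (row, kept_by_keep, kept_by_drop) for all 2^4 missingness patterns."""
    rows = []
    for mask in product([False, True], repeat=4):
        row = tuple(MISSING if m else v for m, v in zip(mask, OBSERVED))
        rows.append((row, kept_by_keep(*row), kept_by_drop(*row)))
    return rows


if __name__ == "__main__":
    for row, k, d in enumerate_patterns():
        flag = "  <-- differs" if k != d else ""
        print(row, "keep:", k, "drop:", d, flag)
```

With these particular values, -keep- retains 4 of the 16 patterns while -drop- retains all 16. The 12 patterns that differ are those in which at least one of the two pairs has exactly one missing value, because then both income>=baseincome and income<baseincome (or the age equivalents) are FALSE. Different observed values would change which fully observed rows survive, but not that asymmetry.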
Try answering the following question: Under what conditions will the second dataset have more observations than the first?

I predict you will need to pull out pencil and paper, write down the 2^4 = 16 missing-value patterns, and think carefully about how each of the expressions would evaluate. To do that, you need to know one more thing: Under Svend's proposal, x==. would evaluate to TRUE if x is missing, so assume x>=. would evaluate to TRUE as well.

I repeat what Joseph Coveney <[email protected]> already said: each implementation results in its own gotcha.

I admit that how Stata treats missing values -- as if they were infinity -- has a gotcha, and I know both Jeph and Svend understand that other proposals have a gotcha, too. They believe that the gotchas in other proposals will be easier for "ordinary" users to understand. The above is a counterexample.

The last time we at StataCorp thought carefully about this problem, we learned that it is not enough to write down the statements for which the proposal better meets expectations. You have to go searching for the statements that do the unexpected and then ask, "How easy will that be to explain to users?"

-- Bill
[email protected]
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

