
RE: st: My last word on strange world

From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: My last word on strange world
Date   Fri, 11 Jan 2008 18:53:22 -0000

I should have spelled out what I meant by impracticable. 

Sorting a variable that includes missings so that the missings go to the
end of the data is a standard Stata programming trick, at least among
StataCorp people and those who steal their ways of working. So there is
no way it will be outlawed. 
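
For readers who have not met the trick, here is a minimal sketch of it. The variable name x and the toy values are made up for illustration; the only behaviour relied on is that numeric missing sorts after every nonmissing value.

```stata
clear
input x
3
.
1
.
2
end

* after sorting, the nonmissing values come first and the missings last
sort x
list, noobs

* so the nonmissing block is observations 1 to r(N)
count if !missing(x)
local n = r(N)
summarize x in 1/`n'
```

After the sort the values of x are 1, 2, 3, ., . in that order, so anything restricted to in 1/`n' touches only the three nonmissing observations.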

But if you want to have this as your own rule, let it be so: 

program jephsort
	version 8
	syntax varlist [in] [, * ]

	* refuse to sort if any sort variable has missing values
	qui foreach v of local varlist {
		count if missing(`v')
		if r(N) {
			di as err "no sort; missings present on `v'"
			exit 498
		}
	}

	sort `varlist' `in', `options'
end

Of course, using this would not stop other programs, including official
Stata's, sorting how they please. 
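
A short usage sketch, assuming the auto dataset shipped with Stata (price has no missings, rep78 has some). The definition of jephsort is repeated only so that the snippet runs on its own in a fresh session.

```stata
capture program drop jephsort
program jephsort
	version 8
	syntax varlist [in] [, * ]
	qui foreach v of local varlist {
		count if missing(`v')
		if r(N) {
			di as err "no sort; missings present on `v'"
			exit 498
		}
	}
	sort `varlist' `in', `options'
end

sysuse auto, clear

* price is complete, so this sorts as usual
jephsort price

* rep78 has missing values, so this stops with return code 498
capture noisily jephsort rep78
display _rc
```

The capture noisily is there only so that a do-file carries on after the refusal; typed interactively, jephsort rep78 would simply stop with the error message.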

Similarly, you can decide for yourself that -99 is missing if you want,
but clearly you will have to ensure yourself that it is excluded from
anyplace it doesn't really belong, and you won't stop the rest of Stata
using . as it pleases. 
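
As a sketch of what that bookkeeping looks like, here is one way to move between a private code and Stata's own missing, using the official mvdecode and mvencode commands. The variable name income and the values are invented for illustration.

```stata
clear
input income
1200
-99
800
end

* left alone, -99 is just another number and drags the mean down
summarize income

* map the private code -99 to system missing
mvdecode income, mv(-99)
count if missing(income)
summarize income

* and back again, if -99 is wanted in the data
mvencode income, mv(-99)
```

Between the mvdecode and the mvencode, Stata's commands exclude the observation as missing; outside that window, excluding -99 is up to the user.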

SamL: The thread's not over until it's over! 

[email protected] 

Jeph Herrin

Nick Cox wrote:
> NJC>>> Are you asking that 42 > . and 42 < . _both_ return FALSE? 

Yep. Missing means empty set, and you can't compare a real
number to the empty set. 42 is neither larger nor smaller than
the elements of the empty set, so these return FALSE.

> That . == . returns FALSE? 

No, the empty set equals itself. This should return TRUE.
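
For comparison, a small sketch of how Stata evaluates these three expressions as things stand, given that numeric missing is treated as larger than any number:

```stata
* true (1): every number is less than missing
display 42 < .

* false (0)
display 42 > .

* true (1): missing equals itself
display . == .
```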

> If a user sorts on a variable which has missing values, then
> Stata could return an error message saying that the variable
> cannot be sorted because of missing values. The user can then
> repair or truncate their data so that sorting makes sense.
> NJC >>> Sorry, but I think that's the most impracticable suggestion 
> so far in this thread, with some stiff competition! 

I disagree. Replacing missings with -99 or 10E23 is a perfectly
practical, not to mention well-tested, approach. I like knowing
when data is missing, but if I want to treat missing as a number
it should be up to me how I choose to do so.
