The Stata listserver

Re: st: latest update of cut (Policy on handling missing values)

From   "Jens M. Lauritsen" <>
Subject   Re: st: latest update of cut (Policy on handling missing values)
Date   Thu, 08 Aug 2002 10:54:33 +0200

Changing the behaviour of well-known, widely used functions is a
"no-no" in my opinion.

So I am happy to see:

"I've been convinced that this is the better interpretation of missing for
-egen, cut-.  Missing values will be mapped to missing; non missing values
larger than that of the largest value in -at- will map to the largest value
in the -at- list"

The problem with cut is one aspect of this. I used the original
(M. Hills) version of cut and then decided that I wanted labels showing both
ends of each interval, e.g. (1-20)(21-40), rather than the original
(1-)(21-). So I made my "own" version, cutjl, based on M. Hills' cut.

My advice when teaching Stata (and other programs) is always: find some
core functions you know well and stick to them. Ignore automatic updates
to crucial parts until you are dissatisfied with the old version. It is
better to have an old and somewhat simpler version than a new one whose
behaviour is uncertain.

Apparently cut was incorporated into Stata as an internal command (I did
not know this), which is reasonable, but its handling of missing values was
then changed (and will soon revert to the original behaviour).

I would argue that if Stata's internal programs change at all, they should
change in the other direction, i.e. more functions should actually treat
missing as missing. The internal storage of missing values as a very large
number is irrelevant to users, in particular to new, inexperienced
colleagues. A technicality like this is one reason why some clinicians
consider Stata a very difficult program to work with. (The second is the
lack of a Quest (menu) system that handles all situations, and the third is
the difficulty of getting quickly from results to tables for publication.)

So Stata's policy should be: if missing values can be handled as such
without the user having to write strange clauses like (... if v1 != .),
then do it. Not the other way around, and in particular not for commands or
functions that behaved sensibly before.
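To illustrate why users end up writing qualifiers like "if v1 != .", here is a minimal sketch with a made-up variable age (hypothetical toy data, not from the original post). Because Stata stores missing as a value larger than any number, a condition such as age > 40 also selects the missing observation.

```stata
* Hypothetical toy data: three observations, one of them missing (.)
clear
input age
15
45
.
end

* Missing is larger than any number, so it passes "> 40": counts 45 and .
count if age > 40

* Excluding missing explicitly, in the style of "if v1 != .": counts only 45
count if age > 40 & age != .
```

The trap affects conditions that can be true for a very large value (>, >= and !=); conditions such as age < 40 or age == 15 already exclude missing.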

Other statistics programs such as SPSS (and SAS? I am not sure) allow
several values to be defined as missing. In some situations we wish to
distinguish "missing" from "not applicable", and this is currently not
possible with the Stata data format. So please add support for more than
one missing value to the development strategy.

Jens M. Lauritsen, Consultant, MD, PhD
Accident Analysis Group, Odense University Hospital, 
Odense, Denmark e-mail:
Initiator of EpiData, see for DataEntry in windows (EpiInfo
V6 principle)
For news on Epidata, sign on at:
Phone +45 6541 2293 Mobile +45 2173 2293 fax: +45 6613 6581 