Stata The Stata listserver

Re: st: update to cut - a reply

From   [email protected] (William Gould)
To   [email protected]
Subject   Re: st: update to cut - a reply
Date   Thu, 08 Aug 2002 14:48:30 -0500

Michael Hills <[email protected]> wrote, 

> 1. Is it acceptable to users that Stata corp should make major changes
>    in functionality to a widely used command, between versions,
>    without warning or discussion?  [...]

The answer is, "Of course it is not acceptable".

Our official policy is, 

    We do not change functionality or syntax except under version control.
    We do fix bugs and add new functionality.

That is the policy we are supposed to follow.  On 8may2002, we updated
-egen-'s -cut(), at()- function, and we violated that policy.  It was a
mistake and we will issue an update tomorrow to fix that.  We learned about
our mistake yesterday, 7aug2002.

The situation with -cut()- is indeed confusing.  Let me go over it.

    1.  01may1999.  David Clayton and Michael Hills develop the -cut()-
        function in "Recoding variables using grouped values", STB-49.

    2.  15dec2000.  The -cut()- function is made official in Stata 7.

    3.  08may2002.  StataCorp introduces an update to "fix" the cut 
        function after receiving a complaint from a user.

    4.  07aug2002.  Michael Hills points out that "fix" has changed 
        functionality and treats missing values oddly.

    5.  07aug2002.  Jean Marie Linhart of StataCorp defends fix.

    6.  07aug2002.  Others join in to agree with Michael Hills.

    7.  07aug2002.  Jean Marie Linhart surrenders.

There are actually two problems with the 08may2002 "fix", and I think both
are bothering Michael Hills.  The first is the treatment of missing values,
which has received emphasis on the list, and on which we all agree:  Missing
should mean missing.

The second is the treatment of nonmissing values larger than the top 
cut point.  In the original article, Clayton and Hills wrote inelegantly 
about what the function does, and in what they wrote, one would not suspect 
that nonmissing values above the final cutpoint would be mapped to missing.
Later in the article, however, in an example, they make it clear that turning
those nonmissing values into missing is exactly what they intend.
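
To make the behavior under discussion concrete, here is a minimal sketch in Python (not Stata, and not the actual -cut()- code) of the original -cut(), at()- semantics as described above. The function name `cut` and the use of `None` for a missing value are illustrative choices. The sketch assumes that each value is coded with the left endpoint of its interval and that values below the first cutpoint are also mapped to missing; the text above only states the rule for values above the final cutpoint.

```python
from bisect import bisect_right
from typing import Optional, Sequence


def cut(x: Optional[float], at: Sequence[float]) -> Optional[float]:
    """Code x by the left endpoint of the interval [at[i], at[i+1]) holding it.

    None stands in for a Stata missing value. `at` must be in ascending order.
    """
    if x is None:
        # Missing should mean missing.
        return None
    if x < at[0] or x >= at[-1]:
        # Nonmissing values at or above the final cutpoint become missing.
        # Treating values below the first cutpoint the same way is an assumption.
        return None
    # bisect_right counts the cutpoints <= x; the last of those is the left endpoint.
    return at[bisect_right(at, x) - 1]


values = [1, 5, 9.9, 10, 12, None]
print([cut(v, [0, 5, 10]) for v in values])
# prints [0, 5, 5, None, None, None]
```

With cutpoints 0, 5, and 10, the value 12 is nonmissing but lies above the top cutpoint, so it is coded as missing, which is the behavior a reader of the descriptions alone would not expect.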

In adopting -cut()-, StataCorp wrote its own inelegant description of the 
function, and in that description, one would not suspect that nonmissing 
values above the final cutpoint would be mapped to missing.

Then, much later, StataCorp received a complaint from a user who said, "Look
at what -cut()- is doing.  Read your description.  It's broken."  Technical
Services did exactly that and agreed.  It was turned over to Jean Marie to be
fixed.

Jean Marie fixed the problem and, in the process, added her own little bit to
the mix, having to do with the treatment of missing values, on which we 
have all focused (and on which we all now agree).

That, however, still leaves the big problem:  We did change the behavior of
cut() on 08may2002, even though we thought we were just fixing a bug.

Here is what we will do:


    1.  We will return -cut()- to its original behavior.

    2.  We will consider whether our retracted change is an improvement.
        If we determine that it is, we will either 

             a.  Add an option to -cut()- which, if not specified, 
                 maintains the pre-08may2002 behavior, or 

             b.  In the next release of Stata, change -cut()-'s behavior, 
                 but under version control, so that if you set the old
                 version, you get the old behavior.

I want to emphasize:  it was not our intention to change functionality.

-- Bill
[email protected]