st: update to cut - a reply
> Michael Hills <firstname.lastname@example.org> wrote:
> I was disturbed to find that in the latest ado update the cut
> function of egen has changed how it deals with missing values.
> In the version before the update, when you cut a variable with missing
> values, these were coded as missing in the new variable. Now they are
> coded with the upper limit specified in cut.
> I can see no logic in this and I presume it is an error which has been
> introduced during the latest update of cut. There is no change in the
> help file for egen.
and Jean Marie Linhart of StataCorp replied
> The change to -egen, cut- was intentional; it was pointed out by a
> user that its behavior did not match the documentation. I concurred
> and modified the function to better match the documentation.
> Previously, when -egen, cut- was used with the -at- option, the
> (ascending) list of numbers was intended as left-hand endpoints for
> subdividing the data. However, any value of the input data greater
> than or equal to the largest value in -at- was mapped to missing.
> This contradicted the notion that the list of numbers was intended
> as left-hand endpoints.
> I modified -egen, cut- so that the values of the input data that are
> greater than or equal to the largest value in -at- are mapped to the
> largest value in the -at- list.
> Since missing values are greater than numerical values, this resulted
> in the behavior that now missing values are mapped to the largest
> value specified in -at-. This makes sense to me given that the values
> in -at- are intended as left hand endpoints.
> If the user wishes to exclude missing values, it is easy to accomplish
> this using the -if- or -in- options. For example,
> . egen newx = cut(x) if !missing(x), at(2, 4, 6)
> The documentation for -egen, cut- says nothing about what happens
> to missing values, but perhaps it should. I'll get that done.
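The mechanism described above rests on how Stata orders missing values: the system missing value `.` (and the extended missing values `.a` to `.z`) compare greater than every number, so a test such as `x >= 6` is true when `x` is missing. A minimal sketch with made-up data, written with plain -generate- rather than -egen, cut- so that it does not depend on which version of cut is installed; the variable names `top`, `newx` and `newx2` are hypothetical:

```stata
clear
input x
1
3
5
7
.
end

// Missing compares greater than any number, so this flag is 1 when x is missing
generate byte top = (x >= 6)

// A rule "x >= largest cut point -> largest cut point" therefore also catches missing
generate newx = 6 if x >= 6

// Adding !missing(x) keeps missing values missing (it also covers .a to .z)
generate newx2 = 6 if x >= 6 & !missing(x)

list
```

In the listing, the row with `x` missing has `top` equal to 1 and `newx` equal to 6, while `newx2` stays missing.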
There are a few points to make about this update:
1. Is it acceptable to users that StataCorp should make major changes
in functionality to a widely used command, between versions,
without warning or discussion? I have even written a book on
Stata which is now wrong in several places.
2. In this case I believe the change to be wrong. To replace a missing
value with (say) 45 simply because . > 45 is absurd. After using
cut in this way, tabulating the new variable would show nonsense. One
should not have to use `if' to prevent a command from producing nonsense.
3. The original intention of cut was to exclude values outside the
number list, so
egen newvar = cut(var), at(25(5)45)
excluded values below 25 and values of 45 and above. These values were
excluded by coding them as missing in the new variable. This seems
to me to be quite acceptable usage.
4. The original documentation of cut made this quite clear. Since the
complaint was that the documentation did not match the command,
perhaps it would have been better to have updated the help file to
match the command, rather than the other way round.
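The coding described in point 3 can also be written out directly, whichever version of -egen, cut- is installed. This is a sketch, assuming equally spaced cut points 25(5)45 as in the example above, a hypothetical variable `var` and made-up data; values below 25, values of 45 and above, and missing values all end up missing in `newvar`:

```stata
clear
input var
20
25
32.5
44.9
45
60
.
end

// Left-hand endpoints 25, 30, 35, 40; the last cut point 45 closes the range
generate newvar = .
foreach c of numlist 25(5)40 {
    // The upper-bound test is false for missing values, so they are never assigned
    replace newvar = `c' if var >= `c' & var < `c' + 5
}

list
```

Because every assignment requires `var` to lie strictly below the next cut point, no missing value can satisfy the condition, which is the behaviour point 3 describes.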