RE: st: RE: RE: RE: RE: RE: statalist-digest V4 #2935 - strange world

From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: RE: RE: RE: RE: RE: statalist-digest V4 #2935 - strange world
Date   Wed, 9 Jan 2008 16:14:10 -0000

I agree that's a possible pitfall. I've done what I can 
under this heading by publishing a note on how it is a good 
idea to use -inrange(x, 42, .)-. 

I can't resist adding that users who split a measured control 
in that way are ignoring key information and producing rather 
indirect analyses in any case. But that's not to contradict 
your good point. 
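
A minimal sketch of the difference, on made-up data (the values of x 
here are purely illustrative, not from anyone's dataset). Stata treats 
numeric missing as larger than any number, so -if x > 42- is true for 
missings, while -inrange(x, 42, .)- is true only for nonmissing values. 
Note that the lower bound of -inrange()- is inclusive, so it corresponds 
to x >= 42 rather than x > 42. 

```stata
* Toy data: five observations, two of them missing (illustrative only)
clear
input x
10
42
50
.
.
end

count if x > 42                  // 3: the value 50 plus both missings
count if x > 42 & !missing(x)    // 1: only the value 50
count if inrange(x, 42, .)       // 2: 42 and 50; missings are excluded
```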

Austin Nichols

Once again, I agree mostly--but on the issue of how often a qualifier
such as
... if x > 42
can lead to an incorrect analysis, I think there are many dangers of
this sort, and many users fall prey to them.  Users of other software
fall under other wheels, and it seems that some casualties are

As an example, consider someone using data on individuals who wishes
to assess some effect of x on y that varies by age category, but exact
age is not asked of individuals under 15 and is missing (y and x are
nonmissing everywhere). The person runs
 reg y x if age > 45
 reg y x if age <= 45
and thinks they have the effect for older and younger individuals, but
in fact the first command includes those over 45 and those under 15.
This is a somewhat silly example, and of course users should check
their own variables, and not expect Stata to do it for them.  But it
illustrates a type of error that I have seen in many users' code, most
often in -generate-, -egen-, and -replace-, and in commands immediately
preceding those (e.g. -sum- or -_pctile-) but also in estimation
commands.  There is no guarantee that users would stop making this
type of mistake even given an informative warning message about
missings, but it does seem like a friendly reminder from Stata could
save a lot of analyses.
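
To make the age example concrete, here is a sketch on simulated data.
The variable names follow the example above, but the sample size, seed,
and data-generating process are assumptions made up for illustration.
It checks for missing ages first, then compares the unguarded qualifier
with one that excludes missings explicitly.

```stata
* Simulated data (hypothetical): age is not recorded for the under-15s
clear
set seed 2008
set obs 200
generate age = floor(80 * runiform())
generate x = rnormal()
generate y = 1 + 0.5 * x + rnormal()
replace age = . if age < 15

* Check the variable before conditioning on it
count if missing(age)

* Missing counts as larger than any number, so this includes the under-15s
regress y x if age > 45

* Over-45s only
regress y x if age > 45 & !missing(age)

* Missings are excluded here anyway, because . <= 45 is false
regress y x if age <= 45
```

Comparing the number of observations reported by the first two
regressions shows how many missings the unguarded qualifier let in.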

On Jan 9, 2008 10:27 AM, Nick Cox <[email protected]> wrote:
> We're brandishing impressions and prejudices at each other. But
> I can't see that the situation is anywhere near as bad as Tom
> seems to fear. If missings are present, then almost always they
> would not actually be included in modelling, summary statistics
> or graphs even if you accidentally request that they be included
> by virtue of a condition such as
> ... if x > 42
> What's most evident is a data management request in which you
> get missings shown when you didn't want them, but there's no
> tragedy there, just an irritation.
