I agree that's a possible pitfall. I've done what I can
on that front by publishing a note on why it is a good
idea to use -inrange(x, 42, .)-, which is never true when x is missing.
I can't resist adding that users who split a measured control
in that way are ignoring key information and producing rather
indirect analyses in any case. But that's not to contradict
your good point.
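
A minimal sketch of the difference, as a do-file fragment. The
four-observation dataset is made up for illustration and is not taken
from the published note. Note that -inrange(x, 42, .)- means
42 <= x < missing, so unlike x > 42 it also selects x == 42.

```stata
* Toy data (hypothetical): three nonmissing values and one missing
clear
input x
10
42
50
.
end

* Numeric missing sorts above every number, so it satisfies x > 42
count if x > 42                  // counts 2: the value 50 and the missing

* Explicit exclusion of missing values
count if x > 42 & !missing(x)    // counts 1: the value 50 only

* inrange() is false for missing x; a missing upper bound means no upper limit
count if inrange(x, 42, .)       // counts 2: the values 42 and 50
```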
Austin Nichols
Nick--
Once again, I agree mostly--but on the issue of how often a qualifier
such as
    ... if x > 42
can lead to an incorrect analysis, I think there are many dangers of
this sort, and many users fall prey to them. Users of other software
fall under other wheels, and it seems that some casualties are
unavoidable.
As an example, consider someone using data on individuals who wishes
to assess some effect of x on y that varies by age category, but exact
age is not asked of individuals under 15 and so is missing for them
(y and x are nonmissing everywhere). The person runs
    reg y x if age > 45
    reg y x if age <= 45
and thinks they have the effect for older and younger individuals, but
in fact the first command includes those over 45 and those under 15,
because Stata treats numeric missing as larger than any number.
This is a somewhat silly example, and of course users should check
their own variables, and not expect Stata to do it for them. But it
illustrates a type of error that I have seen in many users' code, most
often in -generate-, -egen-, and -replace- and commands immediately
preceding those (e.g. -sum- or -_pctile-) but also in estimation
commands. There is no guarantee that users would stop making this
type of mistake even given an informative warning message about
missings, but it does seem like a friendly reminder from Stata could
save a lot of analyses.
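
The age example can be reproduced with a rough sketch along these
lines. The data are simulated, and the sample size, seed, and
coefficients are arbitrary assumptions chosen only to make the
fragment self-contained. It shows one way to check for missings
before running the regressions and to exclude them explicitly.

```stata
* Simulated data (hypothetical): age is missing for everyone under 15
clear
set obs 300
set seed 12345
generate age = floor(80*runiform())
replace age = . if age < 15
generate x = rnormal()
generate y = 1 + 0.5*x + rnormal()

* Check how many observations each qualifier really selects
count if missing(age)
count if age > 45                   // silently includes the missing ages
count if age > 45 & !missing(age)   // those known to be over 45

* Estimation restricted to people known to be over 45
regress y x if age > 45 & !missing(age)

* Missing is never <= 45, so this sample is ages 15 to 45, not all under-45s
regress y x if age <= 45
```

Comparing the first two -count- results with the third is a quick way
to see whether a qualifier is picking up missings before trusting the
estimation sample.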
On Jan 9, 2008 10:27 AM, Nick Cox wrote:
> =======================================================================
> We're brandishing impressions and prejudices at each other. But
> I can't see that the situation is anywhere near as bad as Tom
> seems to fear. If missings are present, then almost always they
> would not actually be included in modelling, summary statistics
> or graphs even if you accidentally request that they be included
> by virtue of a condition such as
>
> ... if x > 42
>
> What's most evident is a data management request in which you
> get missings shown when you didn't want them, but there's no
> tragedy there, just an irritation.
> ======================================================================