Statalist



st: RE: New package -ifwins- "if wins!" on SSC to subset data by "if" exp first then "in" range


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: New package -ifwins- "if wins!" on SSC to subset data by "if" exp first then "in" range
Date   Wed, 18 Mar 2009 18:40:25 -0000

Dan was kind enough to let me see an earlier version of this privately.
I encouraged him to post to get feedback. 

I also expressed some scepticism about his project. I don't want to deny
that occasionally -if- and -in- don't work in the way some users want,
but I don't see the solution as trying to subvert the way they work. 

Dan's problem is that the behaviour of -if- and -in- is so deep down
that no user can do more than write a wrapper like this to change their
behaviour temporarily -- and a very good thing too. Just imagine the
extraordinary threads likely if the behaviour of -if- or -in- was
tuneable. 

The Catch-22 of -ifwins- is this. With some effort, Dan can get some
commands to behave the way he wants them to, and he is careful that this
never changes (let alone messes up) your dataset. But of course the rest
of Stata, including anything that might change your data, is unaffected.
(Otherwise put, any changes you make under -ifwins- are not permanent.)
Positively, this gives some flexibility to those who want it.
Negatively, having two ways for -if- and -in- to behave is one more than some
people will want, especially if they have to keep changing their view.
(I'd want to keep all learners under my wing firmly ignorant of
-ifwins-.) 

As has often been pointed out, intellectual skill grows according to
what you can do without thinking much about it. With some experience
-if- and -in- just become intuitive so that you are rarely really
surprised by what they do. The experienced Stata user just knows that 

. list if foreign == 1 in 1/10

does not necessarily mean "show me the first 10 foreign cars" -- or, at
worst, if they temporarily forget, the same experienced Stata user can
quickly think of several ways to get that output. In fact, although Dan
does not mention it 

. browse if foreign == 1 

is a pretty direct solution to his leading example problem. It does not
have exactly the same consequence, but you can look at what you want and
then close the window. In fact, to many users that way of working is
likely to seem much more intuitive than learning -ifwins-. 
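
One of those several ways, as a minimal sketch: it assumes auto.dta in
its shipped sort order, and the variable name fseq is invented here
purely for illustration. A running count of foreign cars picks out the
first 10 without any -keep- or -preserve-.

```stata
sysuse auto, clear

* running count of foreign cars, in the current sort order
* (fseq is a hypothetical name chosen for this example)
generate long fseq = sum(foreign == 1)

* the first 10 foreign cars: foreign, and among the first 10 so counted
list make turn weight if foreign == 1 & fseq <= 10

* tidy up so that the dataset is left as it was
drop fseq
```

This does add a variable for a moment, so inside a program a -tempvar-
would be the tidier choice.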

A price of any language is that even some simple things may take a few
lines -- the only way to avoid this is a language with thousands and
thousands of commands that would be unattractive and unlearnable. Even
StataCorp has learnt this the hard way. Long-time users will remember
the old -for- from a few versions back. In essence, -forvalues- and
-foreach-, although they typically imply longer code than did -for-,
give users the flexibility they really need without the extraordinary
bugs and misunderstandings that bit -for- users. (Note to those who
joined in Stata 9 or 10: this -for- was nothing to do with Mata's -for-,
and not like it.) 

By the way, I think it does no harm to think that in Stata -in- subsets
the dataset before -if-, but it's wrong in principle. -if- and -in- are
orthogonal. What you get with both is an intersection of sets, possibly
empty, and as with intersections there is no sense in which the
intersection of A and B is _in principle_ a matter of identifying one
set, say A, before another, say B. This is a fine distinction, but I
think it's the correct one.  
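
A small check of that intersection reading, as a sketch assuming
auto.dta: the -in- range can be rewritten as a condition on _n and
joined to the -if- condition with &, and the counts should agree
whichever way round you write it. The local macro name n_both is just
for illustration.

```stata
sysuse auto, clear

* both qualifiers together
count if foreign == 1 in 1/10
local n_both = r(N)

* the same subset written as a single condition: A & B
count if foreign == 1 & _n <= 10
assert r(N) == `n_both'

* and B & A, which selects exactly the same observations
count if _n <= 10 & foreign == 1
assert r(N) == `n_both'
```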

Nick 
n.j.cox@durham.ac.uk 

Dan Blanchette

Thanks to Kit Baum, a new package -ifwins-
is now available for download on SSC.

Description
-ifwins- is a prefix command that runs almost any Stata command that does
not modify the dataset in memory (so it is not for commands such as
generate or replace).  -ifwins- makes "if" subset the dataset before "in"
does.  This is the opposite of what happens when both "if" and "in" are
used in the same Stata command.  For example, the following code will
first subset the dataset to the first 10 observations and then subset
those to the observations that meet the specified condition:

  . sysuse auto
  . list if foreign == 1 in 1/10

Since the auto.dta dataset is sorted by the variable foreign, the above
code will not list any observations, because foreign == 0 in each of the
first 10 observations.  So "if" loses and "in" wins when "battling" over
which one subsets the dataset before the other one does.

If you want to run a Stata command on a certain number of observations
that satisfy a certain condition, you would have to:

  . preserve
  . keep if foreign == 1
  . list make turn weight in 1/10
  . restore

or use -ifwins- as a prefix to the desired Stata command:

  . ifwins if foreign == 1 in 1/10 :  list make turn weight

The above will list the first 10 observations for which the variable
foreign is equal to 1 (one).  So now "if" wins! ... but "in" is still
helpful.

To install -ifwins-:
  . ssc install ifwins

Let me know if you have any questions.

Thanks to Roy Wada for helping me better document -ifwins-.

