Statalist



st: RE: What's the added value of having -in- subset the data before -if- does?


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: What's the added value of having -in- subset the data before -if- does?
Date   Wed, 4 Feb 2009 18:02:33 -0000

I've never used SAS and in any case would bow to your understanding of
it. 

But I think your opening is the wrong way to think about how Stata
works. In Stata -if- and -in- are orthogonal; there is no logical sense
in which one has priority over the other. 

(Whatever happens precisely at the implementation level, it doesn't
bite. Conversely, if it did bite, the priority would need to be
documented.) 

It's just as if you ask for the intersection of two sets A and B; the
answer does not depend on which set you look at first. 

Thus -in 1/10- refers to absolute observation numbers regardless of what
values are in them, and -if foreign == 1- refers to values, regardless
of what observations they are in. (Clearly, either set could be empty,
even the first if you had no observations, and so could their
intersection.) 
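
A small sketch of the intersection idea, assuming the auto data shipped
with Stata. The indicator names in_range and is_foreign are made up for
illustration. Each qualifier is spelled out as its own set, and the two
counts should agree:

```stata
sysuse auto, clear

* Set A: observations 50 to 60, defined purely by position
generate byte in_range = inrange(_n, 50, 60)

* Set B: foreign cars, defined purely by value
generate byte is_foreign = foreign == 1

* The intersection, written out by hand ...
count if in_range & is_foreign

* ... and the same selection using both qualifiers
count if foreign == 1 in 50/60
```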

But that said, it's clear what you want.

Flippantly put, you want -in 1/10- to mean the first 10 observations you
care about (as otherwise specified) that occur in the data. Fair enough. 

As your examples show, one way to achieve that is to sort them to the
front; then you can pick them off. 

A better way is to keep track of where the observations are, as is
achieved by your use of -sum()-. Although the trickery with -sum()- is
clever, I think I'd always find it faster to use -edit if foreign == 1-
and look at the first whatever (or the last whatever). 
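
For the record, here is a minimal sketch of keeping track explicitly,
assuming the auto data; seq is a hypothetical variable name. Storing the
running count in a variable makes it easy to pick off the first or the
last few matching observations:

```stata
sysuse auto, clear

* Running count of observations that satisfy the condition
generate long seq = sum(foreign == 1)

* First 10 foreign cars, wherever they sit in the data
list make price foreign if foreign == 1 & seq <= 10

* Last 10 foreign cars: seq[_N] holds the total number of matches
list make price foreign if foreign == 1 & seq > seq[_N] - 10
```

The extra -foreign == 1- matters: without it, any non-matching
observation would also be listed while the running count sits inside
the requested range.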

I once tried mimicking -in- in a program without using it directly, and
found it trickier than I wanted, because of the need to support f, F, l,
L and negative observation numbers. I forget the details but they were
not as easy as I hoped. 
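
To show what has to be supported, here are some of the forms that -in-
accepts, sketched on the auto data (an illustration of the syntax only,
not of the program I wrote):

```stata
sysuse auto, clear

* f (or F) stands for the first observation
list make in f/5

* l (or L) stands for the last observation; negative numbers
* count back from the end of the data
list make in -5/l

* a single number selects just that observation
list make in 10
```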

Nick 
n.j.cox@durham.ac.uk 

Dan Blanchette

Have you ever wanted to list observations that satisfy a condition, 
but only, say, the first 10 of them? 
If so, perhaps you've been frustrated with the fact that:
. sysuse auto
. list  if foreign == 1 in 1/10
lists no observations because in the first 52 observations foreign == 0.
The -in- subsets the data before the -if- condition subsets the data. 
This is the opposite in SAS:

/* WHERE subsets the data before OBS subsets the data */

PROC PRINT DATA= SASHELP.SHOES(WHERE=(STORES < 10)  OBS = 10);
RUN;

So, the above code lists the first 10 observations where (STORES < 10).

I can't think of any situation where I would want to know how many
times a certain condition exists in the first X observations.  Do others
ever need to know that?

I figured out a solution where Stata will subset the data to the 
condition and then only list the range of observations I'm interested
in:
. list  if foreign == 1 & sum((foreign == 1)) <= 10

The "(foreign == 1)" inside the sum() creates a value equal to 1 when
the condition is true and then sum() creates a running sum of that.  You
can use the sum() function to subset your data for other Stata commands.

You could get a range of observations as well:

. list  if foreign == 1 & inrange(sum((foreign == 1)),2,11)

I may decide always to use this since:

. list  if foreign == 1 & sum((foreign == 1)) <= 100

will also work despite the fact there aren't 100 observations in the
data.  I'll never again get the error message:
  Obs. nos. out of range
  r(198);

My previous solution was to:

preserve
keep  if foreign == 1
local nobs = 10
if  _N  < `nobs'  local nobs = _N
list in 1/`nobs'
restore




