Statalist



st: Re: Bug in -use- or -if- ?


From   "Joseph Coveney" <jcoveney@bigplanet.com>
To   "Statalist" <statalist@hsphsun2.harvard.edu>
Subject   st: Re: Bug in -use- or -if- ?
Date   Thu, 5 Feb 2009 10:59:12 +0900

Sergiy, I believe that your colleague is correct about how Stata interprets
the underscore variable, _n.  The help file states, "_n contains the number of
the current observation."  Stata also appears to qualify -if- according to
the same criterion while -use- reads data in from a dataset file.  If you're
loading a dataset from a disc file, _n is incremented as each observation's
record is read into memory.  So, -if _n <= 37- will work, because _n will
increase from zero to 37 as further records are loaded and _n == 1, 2, 3, etc.
each tests True as being less than or equal to 37.  But, starting from -clear-
(with _n equal to zero), -if _n > 37- will never be True, because, as each
candidate observation record is read into memory for testing of the condition,
_n would only ever be equal to one, which is never greater than 37.  And
because the condition tests as False at each test of -if _n > 37-, each
successive candidate record in the file on disc will be rejected; no
observation records will ever be read into memory.

The same holds for -if inrange(_n, 2, 20)-; starting with _n equal to zero
(empty in-memory dataset), _n will only be at most one as each successive
record is read and tested for the truth of -inrange(_n, 2, 20)-.   _n will
never be between 2 and 20 and so each successive candidate record will be
rejected, leaving a dataset in memory of zero observations at the end.
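
The reasoning above can be checked with a small model.  What follows is only
a sketch in Python of the behavior as I have described it, not Stata's actual
implementation: it assumes that the _n seen by -if- for each candidate record
is the number of observations already accepted into memory plus one.  The
names simulate_use_if and inrange are made up for this illustration.

```python
from typing import Callable


def inrange(x: int, lo: int, hi: int) -> bool:
    """Mimic Stata's inrange(): True when lo <= x <= hi."""
    return lo <= x <= hi


def simulate_use_if(n_on_disk: int, condition: Callable[[int], bool]) -> int:
    """Count the records kept under the assumed loading model.

    Assumption (not confirmed Stata internals): each candidate record is
    tested with _n equal to (observations already in memory) + 1.
    """
    n_in_memory = 0  # state after -clear-: nothing loaded yet
    for _ in range(n_on_disk):
        candidate_n = n_in_memory + 1  # _n as seen by the -if- test
        if condition(candidate_n):
            n_in_memory += 1  # record accepted, so the next _n is one higher
        # a rejected record leaves n_in_memory, and hence _n, unchanged
    return n_in_memory


# auto.dta has 74 observations
results = {
    "_n <= 37": simulate_use_if(74, lambda n: n <= 37),
    "_n > 37": simulate_use_if(74, lambda n: n > 37),
    "inrange(_n, 1, 20)": simulate_use_if(74, lambda n: inrange(n, 1, 20)),
    "inrange(_n, 2, 20)": simulate_use_if(74, lambda n: inrange(n, 2, 20)),
}

for cond, kept in results.items():
    print(f"if {cond}: {kept} observations loaded")
```

Under this assumption the model gives 37, 0, 20 and 0 observations for the
four conditions, which matches the counts reported in the original post.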

Joseph Coveney

Sergiy Radyakin wrote:

In a different thread Dan Blanchette asked about the interaction of -in-
and -if-. I have asked myself a slightly different question: whether
specifying if-conditions can always substitute for in-conditions, e.g.
instead of "in #A/#B" one can type "if inrange(_n,#A,#B)".

There seems to be a bug in -use-, which gets confused by such a
condition. My colleague has suggested that this might happen because
Stata will qualify _n according to the current dataset in memory, but
qualify -if- for the dataset during the load. I was able to come up
with an example where it gets confused regardless of the dataset
currently in memory. It seems that the condition "larger" is not
evaluated properly in this case.

*** bug with use ... if F(_n)
*** N(auto.dta)=74

sysuse auto, clear
local fullauto `r(fn)'

use `"`fullauto'"' in 1/37, clear
count
assert (_N==37)

use `"`fullauto'"' in 38/74, clear
count
assert (_N==37)

use `"`fullauto'"' if _n<=37, clear
count
assert (_N==37)

use `"`fullauto'"' if _n>37, clear
count
assert (_N==37)

It is hard to understand what Stata will think of _n while loading
data, but it is definitely not the observation number.
Strangely the condition inrange(_n,1,20) loads 20 (twenty)
observations, but inrange(_n,2,20) loads 0 (zero).

So if you ever try to work with large datasets in smaller portions,
slice them with an in-condition, not an if-condition!

Stata MP for Windows, v10.1.551 born 02 Feb 2009 (currently the latest.
This recent update brings some very welcome changes: thank you!)

Best regards, Sergiy Radyakin
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/





