
From: "Joseph Coveney" <jcoveney@bigplanet.com>
To: "Statalist" <statalist@hsphsun2.harvard.edu>
Subject: st: Re: Bug in -use- or -if- ?
Date: Thu, 5 Feb 2009 10:59:12 +0900

Sergiy,

I believe that your colleague is correct about how Stata interprets the underscore variable _n. The help file states, "_n contains the number of the current observation." Stata also appears to evaluate _n in an -if- qualifier by that same criterion while -use- reads data in from a dataset file.

If you're loading a dataset from a file on disk, _n is incremented as each observation's record is read into memory. So -if _n <= 37- works: _n increases from zero to 37 as further records are loaded, and _n == 1, 2, 3, etc. each tests True as being less than or equal to 37.

But starting from -clear- (with _n equal to zero), -if _n > 37- will never be True. As each candidate observation record is read into memory for testing of the condition, _n would only ever be equal to one, which is never greater than 37. Because the condition tests False at every test of -if _n > 37-, each successive candidate record in the file on disk is rejected, and no observation records are ever read into memory.

The same holds for -if inrange(_n, 2, 20)-. Starting with _n equal to zero (an empty in-memory dataset), _n will be at most one as each successive record is read and tested for the truth of -inrange(_n, 2, 20)-. _n will never be between 2 and 20, so each successive candidate record is rejected, leaving a dataset of zero observations in memory at the end. By the same reasoning, -inrange(_n, 1, 20)- loads 20 observations: each accepted record advances _n until the next candidate would be number 21, which fails the test.

Joseph Coveney

Sergiy Radyakin wrote:

In a different thread, Dan Blanchette asked about the cooperation of -in- and -if-. I have asked myself a slightly different question: can specifying if-conditions always substitute for in-conditions? For example, instead of "in #A/#B" one can type "if inrange(_n,#A,#B)".

There seems to be a bug in -use-, which gets confused by such a condition. My colleague has suggested that this might happen because Stata will qualify _n according to the current dataset in memory, but qualify -if- for the dataset during the load. I was able to come up with an example where it gets confused regardless of the current dataset. It seems that the condition "larger" is not evaluated properly in this case.

*** bug with use ... if F(_n)
*** N(auto.dta)=74
sysuse auto, clear
local fullauto `r(fn)'

use `"`fullauto'"' in 1/37, clear
count
assert (_N==37)

use `"`fullauto'"' in 38/74, clear
count
assert (_N==37)

use `"`fullauto'"' if _n<=37, clear
count
assert (_N==37)

use `"`fullauto'"' if _n>37, clear
count
assert (_N==37)

It is hard to understand what Stata thinks _n is while loading data, but it is definitely not the observation number. Strangely, the condition inrange(_n,1,20) loads 20 (twenty) observations, but inrange(_n,2,20) loads 0 (zero).

So if you ever try to work with large datasets in smaller portions, slice them with an in-condition, not an if-condition!

Stata MP for Windows, v10.1.551, born 02 Feb 2009 (currently the latest; this recent update brings some very welcome changes: thank you!)

Best regards,
Sergiy Radyakin

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
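The advice in the quoted message (slice with -in-, not -if-) can be sketched as follows. This is a minimal illustration rather than anything posted in the thread: it reuses the auto.dta setup from the quoted example and shows two ways to get observations 38 through 74 that do not depend on how -use ... if- evaluates _n during loading. The second option is an assumption about what is acceptable for your data, since it reads the whole file into memory before filtering.

```stata
* Sketch: load observations 38-74 of auto.dta (74 observations in total)
* without relying on -use ... if _n ...-.

* Locate the shipped example dataset, as in the quoted post
sysuse auto, clear
local fullauto `r(fn)'

* Option 1: slice by position with an in-range while reading the file
use `"`fullauto'"' in 38/74, clear
assert _N == 37

* Option 2: load everything, then filter on _n once the data are in memory,
* where _n is the observation number in the loaded dataset
use `"`fullauto'"', clear
keep if _n > 37
assert _N == 37
```

Option 1 is the better fit when the point of slicing is to avoid holding the full dataset in memory; Option 2 is only practical when the full dataset fits.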



