From: Sergiy Radyakin <serjradyakin@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Re: Bug in -use- or -if- ?
Date: Thu, 5 Feb 2009 10:18:56 -0500

Dear Joseph,

I agree that this might be a plausible explanation of what's going on, but this is hardly what one would expect. E.g. when I type a variable name in {use filename.dta if age>18}, I mean the variable age in the dataset filename.dta, not in the current dataset in memory. Following the same logic, _n must also refer to the observation number in the file. Otherwise, what would be a valid example of using _n in an if-condition after -use-? (If I understand you correctly, it will always evaluate to either zero or one for the whole -use- statement, not for each observation.)

The definition of _n that you quote never refers to the location of observations. "Number of current observation" where? In memory or in the file? For all other commands it is perfectly clear: in memory. But for -use- it just doesn't make sense to refer to memory, since memory might be empty.

Say I am reading a file with

    use filename.dta if (age>18) & (int(_n/2)==_n/2)

taking all even observations of persons older than 18. Then, when Stata reads the data, it maintains two counters: one, call it FC, is the record number in the file, and the other, MC, is the observation number in memory (and FC>=MC always). Evaluating _n to MC makes little sense, since I may not know a priori how many results the if-condition will fetch. But evaluating _n to FC makes perfect sense, because it defines which part of the file is eligible for load.

Best regards,
Sergiy Radyakin

On Wed, Feb 4, 2009 at 8:59 PM, Joseph Coveney <jcoveney@bigplanet.com> wrote:
> Sergiy, I believe that your colleague is correct in how Stata interprets
> the underscore variable, _n. The help file states, "_n contains the number
> of the current observation." And it also appears to qualify -if- according
> to the same criterion while -use- reads data in from a dataset file. If
> you're loading a dataset from a disc file, _n is incremented as each
> observation's record is read into memory. So, -if _n <= 37- will work,
> because _n will increase from zero to 37 as further records are loaded and
> _n == 1, 2, 3, etc. tests True as being less than 37. But, starting from
> -clear- (with _n equal to zero), -if _n > 37- will never be True, because,
> as each candidate observation record is read into memory for testing of
> the condition, _n would only ever be equal to one, which is never greater
> than 37. And because the condition tests as False at each test of
> -if _n > 37-, each successive candidate record in the file on disc will be
> rejected--no observation records will ever be read into memory.
>
> The same holds for -if inrange(_n, 2, 20)-; starting with _n equal to zero
> (empty in-memory dataset), _n will only be at most one as each successive
> record is read and tested for the truth of -inrange(_n, 2, 20)-. _n will
> never be between 2 and 20 and so each successive candidate record will be
> rejected, leaving a dataset in memory of zero observations at the end.
>
> Joseph Coveney
>
> Sergiy Radyakin wrote:
>
>> in a different thread Dan Blanchette asked about cooperation of -in-
>> and -if-. I have asked myself a slightly different question whether
>> specifying if-conditions can always substitute for in-conditions: e.g.
>> instead of "in #A/#B" one can type "if inrange(_n,#A,#B)".
>>
>> There seems to be a bug in -use- that gets confused by such a
>> condition. My colleague has suggested that this might happen because
>> Stata will qualify _n according to the current dataset in memory, but
>> qualify if- for the dataset during the load. I was able to come up
>> with an example where it gets confused unconditionally on the current
>> dataset. It seems that the condition "larger" is not evaluated properly
>> in this case.
>>
>> *** bug with use ... if F(_n)
>> *** N(auto.dta)=74
>>
>> sysuse auto, clear
>> local fullauto `r(fn)'
>>
>> use `"`fullauto'"' in 1/37, clear
>> count
>> assert (_N==37)
>>
>> use `"`fullauto'"' in 38/74, clear
>> count
>> assert (_N==37)
>>
>> use `"`fullauto'"' if _n<=37, clear
>> count
>> assert (_N==37)
>>
>> use `"`fullauto'"' if _n>37, clear
>> count
>> assert (_N==37)
>>
>> It is hard to understand what Stata will think of _n while loading
>> data, but it is definitely not the observation number.
>> Strangely the condition inrange(_n,1,20) loads 20 (twenty)
>> observations, but inrange(_n,2,20) loads 0 (zero).
>>
>> So if you ever try to work with large datasets in smaller portions,
>> slice them with an in-condition, not an if-condition!
>>
>> Stata MP for Windows, v10.1.551 born 02 Feb 2009, (currently latest.
>> This recent update brings some very welcome changes: thank you!)
>>
>> Best regards, Sergiy Radyakin
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/statalist/faq
>> * http://www.ats.ucla.edu/stat/stata/
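
For readers who want to check the two readings of _n against the counts reported in this thread, here is a rough sketch in Python. It is a toy model only and makes no claim about how Stata is actually implemented. The function `load`, the labels "FC" and "MC", and the helper `inrange` are hypothetical names chosen for illustration. The "MC" branch assumes, following Joseph's description, that a candidate record is tested with _n equal to the number of observations already in memory plus one.

```python
# Toy model only: NOT Stata's implementation, just the two readings of _n
# discussed in this thread, applied to a file of 74 records (as in auto.dta).

def load(n_records, cond, counter):
    """Return how many records are kept when cond(_n) is tested per record.

    counter="FC": _n is the record's position in the file (1..n_records).
    counter="MC": _n is the slot the record would occupy in memory,
                  i.e. (observations kept so far) + 1.
    """
    kept = 0
    for fc in range(1, n_records + 1):
        n = fc if counter == "FC" else kept + 1
        if cond(n):
            kept += 1
    return kept


def inrange(x, lo, hi):
    # Same closed-interval test as the inrange() used in the thread.
    return lo <= x <= hi


conditions = {
    "_n<=37": lambda n: n <= 37,
    "_n>37": lambda n: n > 37,
    "inrange(_n,1,20)": lambda n: inrange(n, 1, 20),
    "inrange(_n,2,20)": lambda n: inrange(n, 2, 20),
}

# For each condition: (count under the FC reading, count under the MC reading)
results = {
    name: (load(74, cond, "FC"), load(74, cond, "MC"))
    for name, cond in conditions.items()
}

for name, (fc, mc) in results.items():
    print(f"{name:18s} FC reading: {fc:2d}   MC reading: {mc:2d}")
```

Under the MC reading the model gives 37, 0, 20 and 0 observations for the four conditions, which matches the 20 and 0 reported for the two inrange() conditions and Joseph's account of the two _n comparisons. Under the FC reading it gives 37, 37, 20 and 19, which is what the in-ranges 1/37 and 38/74 would suggest.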

**Follow-Ups**:
- **Re: st: Re: Bug in -use- or -if- ?** *From:* "Joseph Coveney" <jcoveney@bigplanet.com>

**References**:
- **st: Bug in -use- or -if- ?** *From:* Sergiy Radyakin <serjradyakin@gmail.com>
- **st: Re: Bug in -use- or -if- ?** *From:* "Joseph Coveney" <jcoveney@bigplanet.com>


© Copyright 1996–2017 StataCorp LLC