From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: describe using (problem with abbrev )
Date: Wed, 4 Dec 2013 21:53:19 +0000

Quite so. I didn't explain my point well enough, or indeed accurately enough. -describe using- as it now is doesn't include expansion of varlists; that is not to say that StataCorp could not extend it to do so. As a programmer you will be familiar with projects of your own that only got so far.

I doubt very much that

1. commands like -describe using- actually read the data from the other dataset into memory, even temporarily. I suspect that purpose-built C code reads the file from afar;

2. Stata's commands based on C code need pay any attention to syntax in the sense of -syntax-, just as Mata pays no attention to that.

But we are trading guesses.

Nick
njcoxstata@gmail.com

On 4 December 2013 20:42, Sergiy Radyakin <serjradyakin@gmail.com> wrote:
> Nick,
>
> the inner workings of Stata are not known, but what Alan is asking
> about should be possible. We have a similar situation with the -use-
> command, which supports loading a subset of the data:
>
> use mpg using auto.dta, clear
>
> Here the varlist (mpg) is the 'future' varlist, not the one in
> memory now, right?
>
> You may argue that what happens behind the scenes is that Stata:
> 1) clears the memory
> 2) loads the full dataset
> 3) unabbreviates the variable list
> 4) drops the variables that were not mentioned.
>
> However, Stata seems to unabbreviate the list of variables without
> loading the whole dataset into memory:
>
> version 9.0
> clear
> set mem 10m
>
> set obs 200000
> forval i=1/99 {
>     capture generate byte x`i'=`i'
> }
>
> describe
> tempfile t
> save `t'
> clear
> set mem 3m
> use x3-x5 using `t'
> describe
>
> Obviously the test above only makes sense in Stata versions before 12.0,
> which introduced an automatic memory manager.
> The idea is that Stata can load x3, x4 and x5 even though the full
> dataset does not fit into memory, hence we should conclude that the
> header information is processed separately, which is exactly what Alan
> is asking about in his question.
>
> It seems that for the -use- command specifically the variable list is
> treated specially, but although the same code would be applicable to
> -describe-, it is simply not reused there. I have yet to see a command
> that is not built in and that accepts a varlist (in the expected place
> after the command name) referring to the future state of the data (I am
> dying to see one). I would imagine that such a command could be
> implemented with a few tricks using -anything- in the syntax.
>
> I would go with a two-step solution: first get a full description of
> the dataset, then filter it for the variables of interest. Ideally
> StataCorp could have provided a way to delay expansion of the varlist
> after parsing, plus an unab(s1,s2) string function, where s1 is a
> string to be treated as an abbreviated varlist and s2 is a string
> holding the universe of variables. The result would be a string of the
> full variable names from s2 that satisfy s1. Of course this is possible
> to do yourself even now, but imho only if one dares to rewrite the
> -syntax- command.
>
> Best,
> Sergiy Radyakin
>
>
> On Tue, Dec 3, 2013 at 7:50 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>> Good catch by Daniel here.
>>
>> The reason that varlists with dashes are not allowed is presumably
>> that Stata can't expand what it doesn't know about. That is, the
>> dataset would have to be read in before Stata could expand a variable
>> name range, and that's the point: the dataset is being accessed
>> remotely.
>>
>> Nick
>> njcoxstata@gmail.com
>>
>>
>> On 3 December 2013 12:40, daniel klein <klein.daniel.81@gmail.com> wrote:
>>> Alan,
>>>
>>> this behavior is documented in -help describe-.
>>>
>>> "The varlist in the describe using syntax differs from standard Stata
>>> varlists in two ways. First, you cannot abbreviate variable names;
>>> that is, you have to type displacement rather than displ. However, you
>>> can use the wildcard character (~) to indicate abbreviations, for
>>> example, displ~.
>>> Second, you may not refer to a range of variables;
>>> specifying age-income is considered an error."
>>>
>>> Here is a sketch of how you could allow the dash character:
>>>
>>> *! version 1.0.0 03dec2013 Daniel Klein
>>>
>>> pr descdash
>>>     vers 11.2
>>>
>>>     syntax anything using [, * ]
>>>
>>>     m : st_local("uservars", stritrim(st_local("anything")))
>>>     loc uservars : subinstr loc uservars "- " "-" ,all
>>>     loc uservars : subinstr loc uservars " -" "-" ,all
>>>
>>>     qui d `using' ,varl
>>>     loc allvars `r(varlist)'
>>>
>>>     token `uservars'
>>>     forv j = 1/`: word count `uservars'' {
>>>         loc var : subinstr loc `j' "-" " " ,c(loc dsh)
>>>         if (`dsh') {
>>>             loc f : list posof "`: word 1 of `var''" in allvars
>>>             loc t : list posof "`: word 2 of `var''" in allvars
>>>             if (`t' < `f') {
>>>                 di as err "variables out of order"
>>>                 e 111
>>>             }
>>>             m : st_local("var", ///
>>>                 invtokens(tokens(st_local("allvars"))[(`f'..`t')]))
>>>         }
>>>         loc varlist `varlist' `var'
>>>     }
>>>
>>>     d `varlist' `using' ,`options'
>>> end
>>> e
>>>
>>> descdash y1-y2 using ajit_112213
>>>
>>> Best
>>> Daniel
>>>
>>> --
>>> Hi _ In Stata 13 (and also in Stata 12), it appears that the
>>> abbreviation with a dash "-" does not work with -describe using
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>> * http://www.ats.ucla.edu/stat/stata/
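Sergiy's unab(s1,s2) is a proposal, not an existing Stata function. Below is a minimal Mata sketch of the idea. It assumes the universe s2 is a space-separated list of names in dataset order, such as the r(varlist) left by -describe using ..., varlist- (which Daniel's program also relies on). The names unab_in() and unab_pos() are hypothetical. The sketch expands exact names, * and ? wildcards (through Mata's strmatch()) and first-last ranges. It does not expand bare abbreviations such as displ or the ~ wildcard.

```stata
* Hypothetical helpers, not Stata built-ins: a sketch of unab(s1,s2)
capture mata: mata drop unab_in() unab_pos()
mata:
// position of name in univ, or 0 if absent
real scalar unab_pos(string scalar name, string rowvector univ)
{
    real scalar k
    for (k = 1; k <= cols(univ); k++) {
        if (univ[k] == name) return(k)
    }
    return(0)
}

// expand the varlist s1 against the universe s2 (names in dataset order)
string scalar unab_in(string scalar s1, string scalar s2)
{
    string rowvector toks, univ, out
    real scalar i, j, dash, f, t, hits

    toks = tokens(s1)
    univ = tokens(s2)
    out  = J(1, 0, "")
    for (i = 1; i <= cols(toks); i++) {
        dash = strpos(toks[i], "-")
        if (dash > 1) {
            // range first-last: both ends must exist and be in order
            f = unab_pos(substr(toks[i], 1, dash - 1), univ)
            t = unab_pos(substr(toks[i], dash + 1, .), univ)
            if (f == 0 | t == 0 | t < f) {
                errprintf("%s: invalid variable range\n", toks[i])
                exit(111)
            }
            out = out, univ[(f..t)]
        }
        else {
            // exact name or * / ? wildcard pattern
            hits = 0
            for (j = 1; j <= cols(univ); j++) {
                if (strmatch(univ[j], toks[i])) {
                    out = out, univ[j]
                    hits++
                }
            }
            if (hits == 0) {
                errprintf("variable %s not found\n", toks[i])
                exit(111)
            }
        }
    }
    return(invtokens(out))
}
end

* Usage sketch: read the names first, then expand the user's list
sysuse auto, clear
tempfile t
save "`t'"
quietly describe using "`t'", varlist
mata: st_local("wanted", unab_in("mpg-weight", st_global("r(varlist)")))
describe `wanted' using "`t'"
```

Doing the expansion in Mata against a name list read beforehand is one way to follow the two-step approach Sergiy describes without rewriting -syntax-.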

