Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: describe using (problem with abbrev )

From	Sergiy Radyakin <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: st: describe using (problem with abbrev )
Date	Wed, 4 Dec 2013 15:42:22 -0500

Nick,

the inner workings of Stata are not documented, but what Alan is asking
about should be possible. We have a similar situation with the -use-
command, which supports loading a subset of the data:

use mpg using auto.dta, clear

Here the varlist (mpg) is the 'future' varlist, not the one in memory
now, right?

You may argue that what happens behind the scenes is that Stata:
1) clears the memory
2) loads full dataset
3) unabbreviates the variable list
4) drops the variables that were not mentioned.
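
Spelled out as commands, that hypothetical sequence would look roughly
like the sketch below. It uses -sysuse- so that it runs without a path
to auto.dta, and the local macro name keepvars is mine, chosen only for
illustration. It shows the argument, not what Stata actually does:

```stata
* hypothetical emulation of -use mpg using auto.dta, clear-
* 1) clear the memory
clear
* 2) load the full dataset
sysuse auto
* 3) unabbreviate the list against the data now in memory
unab keepvars : mpg
* 4) drop the variables that were not mentioned
keep `keepvars'
describe
```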

However Stata seems to unabbreviate the list of variables without
loading the whole dataset into memory:

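version 9.0
clear
set mem 10m

* create as many byte variables as fit into 10m; -capture- ignores
* the failures once memory runs out
set obs 200000
forval i=1/99 {
  capture generate byte x`i'=`i'
}

describe
tempfile t
save `t'
clear
* shrink memory so that the full dataset can no longer be loaded
set mem 3m
use x3-x5 using `t'
describe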

Obviously the test above only makes sense in versions before Stata 12,
which introduced automatic memory management.
The idea is that Stata can load x3, x4 and x5 even though the full
dataset does not fit into memory. Hence we should conclude that the
header information is processed separately, which is exactly what Alan
is asking about in his question.

It seems that the variable list is treated specially for the -use-
command, and although the same code would be applicable to -describe-,
it is simply not reused there. I have yet to see a command that is not
built in and that accepts a varlist (in the expected place after the
command name) referring to the future state of the data (I am dying to
see one). I would imagine that this could also be implemented with a
few tricks using -anything- in -syntax-.

I would go with a two-step solution: first get a full description of
the dataset, then filter it for the variables of interest. Ideally
StataCorp would provide a way to delay expansion of the varlist until
after parsing, together with an unab(s1,s2) string function, where s1
is a string to be treated as an abbreviated varlist and s2 is a string
holding the universe of variables. The result would be a string of the
full variable names from s2 that satisfy s1. It is of course already
possible to do this yourself, but imho only if one dares to rewrite the
-syntax- command.
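
One way to approximate the two steps without touching -syntax- is
sketched below. This is only a sketch under assumptions: the program
name unabusing and all local macro names are hypothetical, it relies on
-describe using ..., varlist- leaving the names in r(varlist) (as
Daniel's code does), and it temporarily replaces the data in memory
with an empty scaffold dataset so that the ordinary -unab- can do the
expansion. Only the names matter in the scaffold; storage types are not
preserved.

```stata
* sketch only: -unabusing- is a hypothetical name, not an existing command
capture program drop unabusing
program define unabusing, rclass
    version 11.2
    syntax anything(name=pattern) using/

    * step 1: read the full variable list from the file header
    quietly describe using `"`using'"', varlist
    local universe `r(varlist)'

    * step 2: build an empty dataset with the same names and let
    * -unab- expand abbreviations, wildcards and ranges against it
    preserve
    drop _all
    foreach v of local universe {
        quietly generate byte `v' = .
    }
    unab expanded : `pattern'
    restore

    return local varlist `expanded'
end

* small example file with variables x1-x9
clear
set obs 5
forvalues i = 1/9 {
    generate byte x`i' = `i'
}
tempfile t
save "`t'"
clear

unabusing x3-x5 using "`t'"
local found `r(varlist)'
display "`found'"
describe `found' using "`t'"
```

The expanded list is passed to -describe using- as full names, which is
the form that command accepts.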

Best,
  Sergiy Radyakin

On Tue, Dec 3, 2013 at 7:50 AM, Nick Cox <[email protected]> wrote:
> Good catch by Daniel here.
>
> The reason that varlists with dashes are not allowed is presumably
> that Stata can't expand what it doesn't know about. That is, the
> dataset would have to be read in before Stata could expand a variable
> name range, and that's the point: the dataset is being accessed
> remotely.
>
> Nick
> [email protected]
>
>
> On 3 December 2013 12:40, daniel klein <[email protected]> wrote:
>> Alan,
>>
>> this behavior is documented in -help describe-.
>>
>> "The varlist in the describe using syntax differs from standard Stata
>> varlists in two ways. First, you cannot abbreviate variable names;
>> that is, you have to type displacement rather than displ. However, you
>> can use the wildcard character (~) to indicate abbreviations, for
>> example, displ~. Second, you may not refer to a range of variables;
>> specifying age-income is considered an error."
>>
>> Here is a sketch how you could allow the dash character
>>
>> *! version 1.0.0 03dec2013 Daniel Klein
>>
>> pr descdash
>>  vers 11.2
>>
>>  syntax anything using [, * ]
>>
>>  m : st_local("uservars", stritrim(st_local("anything")))
>>  loc uservars : subinstr loc uservars "- " "-" ,all
>>  loc uservars : subinstr loc uservars " -" "-" ,all
>>
>>  qui d `using' ,varl
>>  loc allvars `r(varlist)'
>>
>>  token `uservars'
>>  forv j = 1/`: word count `uservars'' {
>>   loc var : subinstr loc `j' "-" " " ,c(loc dsh)
>>   if (`dsh') {
>>    loc f : list posof "`: word 1 of `var''" in allvars
>>    loc t : list posof "`: word 2 of `var''" in allvars
>>    if (`t' < `f') {
>>     di as err "variables out of order"
>>     e 111
>>    }
>>    m : st_local("var", ///
>>    invtokens(tokens(st_local("allvars"))[(`f'..`t')]))
>>   }
>>   loc varlist `varlist' `var'
>>  }
>>
>>  d `varlist' `using' ,`options'
>> end
>> e
>>
>> descdash y1-y2 using ajit_112213
>>
>> Best
>> Daniel
>>
>> --
>> Hi _ In Stata 13 (and also in Stata 12), it appears that the
>> abbreviation with a dash "-" does not work with -describe using
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/

