
RE: st: RE: sub-dataset by variable numbers

From   "Jian'an Luan" <>
To   <>, "n j cox" <>
Subject   RE: st: RE: sub-dataset by variable numbers
Date   Wed, 3 Oct 2007 12:14:36 +0100

Thanks, Nick, for your valuable comments.  I work on a dataset with over
20,000 SNPs (variables) plus a few phenotype variables.  Working directly
on the big dataset, SNP by SNP, is computationally expensive (very slow).
Within a program loop, if I save a fixed number of SNPs, 100 for
example, to a tempfile and work on that tempfile each time, it is about
10-20 times faster.  To have each tempfile automatically contain a fixed
number of SNPs, one block after another, selecting by variable number is
an effective way.  With the -unab- command, I do not need to know the
SNP names (variable names) at any point in the analysis, while the
results (including the SNP name) for all SNPs can be saved in one file
using -postfile-.
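
The workflow described above might be sketched roughly as follows. This is
only an illustration under stated assumptions, not the actual program: the
file names bigfile.dta and results.dta and the variables id and pheno are
hypothetical, -regress- stands in for whatever per-SNP analysis is needed,
and each block is loaded with -use varlist using- rather than being saved
to a tempfile first.

```stata
* Sketch only: bigfile.dta, results.dta, id and pheno are hypothetical names.
local block 100                          // number of SNPs per block

use bigfile, clear
unab all : _all                          // all variable names, in dataset order
local fixed id pheno                     // variables needed in every block
local snps : list all - fixed            // everything else is treated as a SNP
local nsnp : word count `snps'

tempname post
postfile `post' str32 snp double(b se) using results, replace

forvalues start = 1(`block')`nsnp' {
    local stop = min(`start' + `block' - 1, `nsnp')

    * collect the names of SNPs number `start' to `stop'
    local chunk
    forvalues j = `start'/`stop' {
        local chunk `chunk' `: word `j' of `snps''
    }

    * load only this block plus the fixed variables
    use `fixed' `chunk' using bigfile, clear

    foreach v of local chunk {
        capture regress pheno `v'        // placeholder for the real analysis
        if _rc == 0 {
            post `post' ("`v'") (_b[`v']) (_se[`v'])
        }
    }
}
postclose `post'
```

Because the SNP names are taken from the macro built by -unab-, they never
have to be typed, and results.dta ends up with one row per SNP that was
analysed successfully.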

Thank you all for your help


> -----Original Message-----
> From: n j cox
> Sent: 02 October 2007 19:22
> Subject: Re: st: RE: sub-dataset by variable numbers
> Sebastian showed a good way to do this, but I agree with Seyi.
> It is to me easier (and perhaps less error-prone) to select 
> by names, than to map variable names to numbers and work with 
> those. I'd be interested to hear the context in which working 
> with numbers appears more natural (or more convenient) than 
> working with names.
> Thanks to Seyi for the plug for -renames-. -renames- remains 
> on SSC because it might be useful to any people still on 
> Stata 6 [sic]. Those using Stata 7 will be better off using 
> -renvars- from STB-60 and those using Stata 8 up will be 
> better off using -renvars- from SJ 5(4). I'll ask Kit Baum to 
> amend the package description so that this is clearer.
> Nick
>
> Jian'an Luan asked
> I wonder if I can select a sub-dataset by variables without 
> giving variable names but using variable numbers, something 
> like the command for selecting a sub-dataset by observations, 
> ".keep if _n>1001 & _n<2000".
>
> Sebastian Buechte replied
> I do not think there is a direct command that will achieve what 
> you would like. But you could try the following:
>
> unab vlist : _all
> // keep 3rd to 6th variable from dataset in memory
> keep `: word 3 of `vlist'' - `: word 6 of `vlist''
>
> This reads all variable names into a local macro (vlist) and then 
> uses extended macro functions to extract numbered elements from 
> that macro. For more information, see -help unab- and 
> -help extended_fcn-.
>
> Seyi Soremekun replied
> I don't think Stata will allow variable names starting with numbers.
> But if you want your list of variables in some kind of 
> number-related format, maybe you could rename all your 
> variables as a series such as A1, A2...etc...
> You can do this en masse using Nick Cox's -renames- program, e.g.
> renames price-age \ a1-a20
> (if you have 20 different variables, with the first being 'price' 
> and the last being 'age'). Then you can use -drop- or -keep- to 
> remove part of the dataset:
> drop a1-a12
> But I'm sure you can do this anyway without having to rename 
> your variables (e.g. drop price-year), so you only need to 
> use the -renames- command if you really want to convert your 
> variable names into some kind of numerical-like list.
