
From: "Jian'an Luan" <jal42@medschl.cam.ac.uk>
To: <statalist@hsphsun2.harvard.edu>, "n j cox" <n.j.cox@durham.ac.uk>
Subject: RE: st: RE: sub-dataset by variable numbers
Date: Wed, 3 Oct 2007 12:14:36 +0100

Thanks, Nick, for your valuable comments.

I work on a dataset with over 20,000 SNPs (variables) plus a few phenotype variables. It is computationally expensive to work directly on the big dataset, SNP by SNP (very slow). Within a program, in a loop, if I save a certain number of SNPs, 100 for example, into a tempfile in memory and work on that tempfile each time, it is about 10-20 times faster. To build a tempfile automatically that contains a fixed number of SNPs on each pass of the loop, one block after another, selecting by variable number is the effective way. With the -unab- command, I do not need to know the SNP names (variable names) at any point in the analysis, while the results (including the SNP name) for all SNPs can be saved in one file using -postfile-.

Thank you all for your help.

Jianan

> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of n j cox
> Sent: 02 October 2007 19:22
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: RE: sub-dataset by variable numbers
>
> Sebastian showed a good way to do this, but I agree with Seyi.
> It is to me easier (and perhaps less error-prone) to select
> by names, than to map variable names to numbers and work with
> those. I'd be interested to hear the context in which working
> with numbers appears more natural (or more convenient) than
> working with names.
>
> Thanks to Seyi for the plug for -renames-. -renames- remains
> on SSC because it might be useful to any people still on
> Stata 6 [sic]. Those using Stata 7 will be better off using
> -renvars- from STB-60 and those using Stata 8 up will be
> better off using -renvars- from SJ 5(4). I'll ask Kit Baum to
> amend the package description so that this is clearer.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Jian'an Luan asked
>
> I wonder if I can select a sub-dataset by variables without
> given variable names but using variable numbers, something
> like the command when select a sub-dataset by observations
> ".keep if _n>1001 & _n<2000".
>
> Sebastian Buechte replied
>
> I do not think there is a direct command which will help you
> to achieve what you would like. But, you could try the following:
>
> unab vlist : _all
> //keep 3rd to 6th variable from dataset in memory
> keep `: word 3 of `vlist'' - `: word 6 of `vlist''
>
> This will read all variable names into a local macro (vlist)
> and then uses local extended functions to extract numbered
> elements from this macro. For more information look at:
> -help unab- and -help extended_fcn-
>
> Seyi Soremekun replied
>
> I don't think Stata will allow variable names starting with numbers.
>
> But if you want your list of variables in some kind of
> number-related format, maybe you could rename all your
> variables as a series such as A1, A2...etc...
>
> You can do this en masse using Nick Cox's 'renames' programme code:
> http://ideas.repec.org/c/boc/bocode/s388102.html
> e.g renames price-age \ a1-a20 (if you have 20 different
> variables with the first being 'price' and the last being
> 'age') then you can use 'drop' or 'keep' to remove part of the dataset
>
> drop a1-a12
>
> But I'm sure you can do this anyway without having to rename
> your variables (e.g. drop price-year), so you only need to
> use the -renames- command if you really want to convert your
> variable names into some kind of numerical-like list.
>
> *
> * For searches and help try:
> * http://www.stata.com/support/faqs/res/findit.html
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
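
The block-by-block workflow described in the reply at the top of this message (select SNPs by position with -unab-, work on one small block at a time, collect results with -postfile-) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the code used in the original analysis: auto.dta stands in for the SNP file, price plays the phenotype, the remaining numeric variables play the SNPs, the block size is 3 rather than 100, and -regress- is a placeholder for whatever per-SNP model is actually fitted. The macro and variable names (snps, chunk, handle, snp, b, se) are made up for the example.

```stata
* Stand-in data (assumption): auto.dta plays the role of the big SNP file
sysuse auto, clear
drop make                          // string variable, cannot serve as a "SNP"

local pheno price                  // hypothetical phenotype variable
unab all : _all                    // every variable name, in dataset order
local snps : list all - pheno      // everything except the phenotype
local nsnp : word count `snps'
local block 3                      // would be 100 in the application described

* Keep a copy of the full data on disk so that blocks can be read from it
tempfile full
save `full'

* One results file for all SNPs
tempname handle
tempfile results
postfile `handle' str32 snp double b double se using `results'

forvalues start = 1(`block')`nsnp' {
    local stop = min(`start' + `block' - 1, `nsnp')

    * Translate positions start..stop into variable names
    local chunk
    forvalues j = `start'/`stop' {
        local chunk `chunk' `: word `j' of `snps''
    }

    * Load only the phenotype and the current block of SNPs
    use `pheno' `chunk' using `full', clear

    foreach v of local chunk {
        quietly regress `pheno' `v'      // placeholder per-SNP analysis
        post `handle' ("`v'") (_b[`v']) (_se[`v'])
    }
}

postclose `handle'
use `results', clear
list, clean
```

Because the positions are translated into names once per block and each block is read with -use varlist using-, only the phenotype and the current block are in memory inside the loop, and the SNP names never have to be typed by hand.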

