
st: Partitioning a main file into several files with a specified number of variables


From   Nuno Soares <[email protected]>
To   [email protected]
Subject   st: Partitioning a main file into several files with a specified number of variables
Date   Sun, 7 Apr 2013 21:32:16 +0100

Hi everyone,

I'm having a problem with some files that are taking a long time for
Stata to process. These are created by importing csv files and have
about 5000 variables (labelled v1 - v5000). Don't worry, these are
not actual values, but only the way a given database provides the
data, which I then have to treat in Stata.

The treatment procedure is working fine, but Stata has some problems
dealing with such a large number of variables. I noticed that, if I
only have about 1000 variables, it takes Stata about one hour to
process each file. However, if all 5000 variables are used, it either
hangs or takes almost 12 hours to do the same work.

So, to speed up the process, the solution is to break the main files
into files with 1000 variables (or fewer). The problem is that I don't
know how to write code in Stata that does this. If the files always
had 5000 variables, I would just drop/keep the variables as:

preserve
keep v1 v2-v1000
save part1, replace
restore

preserve
keep v1 v1001-v2000
save part2, replace
restore

and so on (v1 must always be kept; part1, part2, ... here are just
placeholder file names).

The problem is with those files that have more or fewer than 5000
variables, and I cannot know how many variables a file has without
opening it.
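
An aside: if the files have already been saved as .dta files,
-describe using- reports the number of variables without loading a
file into memory, storing the count in r(k). A minimal sketch, with
"mainfile" as a placeholder file name:

* count the variables in mainfile.dta without loading it
describe using mainfile, short
local K = r(k)
display "mainfile has `K' variables"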

Does anyone know a way to automate this?
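
For what it's worth, here is a minimal sketch of one way the split
could be automated for whatever dataset is in memory. It assumes the
dataset contains only the variables v1, v2, ..., in that order, so
that c(k) counts them, and it saves the pieces under the placeholder
names part1.dta, part2.dta, and so on:

* split the data in memory into parts of at most 1,000 variables,
* always keeping v1; part1, part2, ... are placeholder file names
local K = c(k)            // number of variables in memory
local chunk 1000
local part 1
forvalues start = 1(`chunk')`K' {
    local stop = min(`start' + `chunk' - 1, `K')
    preserve
    if `start' == 1 {
        keep v1-v`stop'
    }
    else {
        keep v1 v`start'-v`stop'
    }
    save part`part', replace
    restore
    local ++part
}

The preserve/restore pair keeps the full dataset in memory between
saves, matching the pattern above; the loop itself could in turn be
wrapped in a -foreach- over the imported files.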

Best wishes,

Nuno
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

