
st: Partitioning a main file into several files with a specified number of variables


From   Nuno Soares <[email protected]>
To   [email protected]
Subject   st: Partitioning a main file into several files with a specified number of variables
Date   Sun, 7 Apr 2013 21:32:16 +0100

Hi everyone,

I'm having a problem with some files that are taking a long time for
Stata to process. These are created by importing csv files and have
about 5000 variables (labelled v1 - v5000). Don't worry, these are
not actual values, but only the way a given database provides the
data, which I then have to treat in Stata.

The treatment procedure is working fine, but Stata has some problems
dealing with such a large number of variables. I noticed that, if I
only have about 1000 variables, it takes Stata about one hour to
process each file. However, if all 5000 variables are used, it either
hangs or takes almost 12 hours to do the same work.

So, to speed up the process, the solution is to break the main files
into files with 1000 variables (or fewer). The problem is that I don't
know how to write code in Stata that does this. If the files always
had 5000 variables, I would just drop/keep the variables as:

preserve
keep v1 v2-v1000
save part1, replace
restore

preserve
keep v1 v1001-v2000
save part2, replace
restore

and so on (v1 must always be kept; part1, part2, ... here are just
placeholder file names).

The problem is with those files that have more or fewer than 5000
variables, and I cannot know how many variables a file has without
opening it.
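
An aside: if the files have already been saved as .dta files,
-describe using- reports the number of variables without loading a
file into memory, storing the count in r(k). A minimal sketch, with
"mainfile" as a placeholder file name:

* count the variables in mainfile.dta without loading it
describe using mainfile, short
local K = r(k)
display "mainfile has `K' variables"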

Does anyone know a way to automate this?
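
For what it's worth, here is a minimal sketch of one way the split
could be automated for whatever dataset is in memory. It assumes the
dataset contains only the variables v1, v2, ..., in that order, so
that c(k) counts them, and it saves the pieces under the placeholder
names part1.dta, part2.dta, and so on:

* split the data in memory into parts of at most 1,000 variables,
* always keeping v1; part1, part2, ... are placeholder file names
local K = c(k)            // number of variables in memory
local chunk 1000
local part 1
forvalues start = 1(`chunk')`K' {
    local stop = min(`start' + `chunk' - 1, `K')
    preserve
    if `start' == 1 {
        keep v1-v`stop'
    }
    else {
        keep v1 v`start'-v`stop'
    }
    save part`part', replace
    restore
    local ++part
}

The preserve/restore pair keeps the full dataset in memory between
saves, matching the pattern above; the loop itself could in turn be
wrapped in a -foreach- over the imported files.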

Best wishes,

Nuno
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

