Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Stata - efficiently appending 200+ files (my method takes hours)
From: Robert Picard <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: Stata - efficiently appending 200+ files (my method takes hours)
Date: Sat, 21 Dec 2013 14:45:50 -0500
Take a look at -filelist- from SSC. It can create a Stata dataset of
files (with full path). The help file has an example that does what
you want efficiently. Here's a copy:
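    * csv_datasets.dta holds one observation per file (dirname, filename)
    use "csv_datasets.dta", clear
    local obs = _N

    * read each file once and store it in its own temporary dataset
    forvalues i=1/`obs' {
        use "csv_datasets.dta" in `i', clear
        local f = dirname + "/" + filename
        insheet using "`f'", clear
        tempfile save`i'
        save "`save`i''"
    }

    * stack the temporary datasets in memory
    use "`save1'", clear
    forvalues i=2/`obs' {
        append using "`save`i''"
    }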
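
The example above assumes that csv_datasets.dta already exists. As a minimal sketch, assuming -filelist- has been installed from SSC and the CSV files sit in or below the current directory, the whole job might look like the following. The option names reflect my reading of the -filelist- help file and should be checked there; the comma option and optionconditioncode are taken from the original post; options_all.dta is only a placeholder name for the output.

```stata
* Assumption: -ssc install filelist- has already been run
* Assumption: the CSV files are in or below the current directory
filelist, dir(".") pattern("*.csv") save("csv_datasets.dta") replace

use "csv_datasets.dta", clear
local obs = _N

* read each CSV once, harmonize the variable type, save to a tempfile
forvalues i = 1/`obs' {
    use "csv_datasets.dta" in `i', clear
    local f = dirname + "/" + filename
    insheet using "`f'", comma clear
    * variable name from the original post; string type avoids append conflicts
    tostring optionconditioncode, replace
    tempfile save`i'
    save "`save`i''"
}

* build the combined dataset in memory and save it once at the end
use "`save1'", clear
forvalues i = 2/`obs' {
    append using "`save`i''"
}
save "options_all.dta", replace   // placeholder output name
```

Each CSV file is read and saved exactly once here, and the combined data are written to disk only at the end. The loop in the original post, by contrast, re-reads and re-saves the growing combined file on every pass, which is the likely source of the long run time.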
On Sat, Dec 21, 2013 at 1:55 PM, Sunita Surana <[email protected]> wrote:
> I am trying to append approx. 200 files using Stata. Below I have
> provided the code I am using to append. The issue is that it is taking
> too long -- over 5 hours to do. The ultimate appended file has over 28
> million observations and is about 2GB in size. I think the issue might
> be that it is saving every time and hence takes too long. I also tried
> using the tempfile mode -- but that also takes long. My colleague, on
> the other hand, did the same append in minutes using SAS. I have
> provided his code below as well. I would very much appreciate if
> someone could show me how to do it efficiently in Stata -- so that it
> would not take hours. Thanks much!
>
> My Stata code:
>
> file close _all
> file open myfile using "$OP\filelist_test.txt", read
> file read myfile line
>
> cd "$OP"
> insheet using "`line'", comma clear
> tostring optionconditioncode, replace
>
> save "$data\options_all", replace
>
> file read myfile line
>
> while r(eof)==0{
> insheet using "`line'", comma clear
> tostring optionconditioncode, replace
> append using "$data\options_all"
> save "$data\options_all", replace
>
> file read myfile line
> }
>
> file close myfile
>
> *******
>
> My colleague's SAS code:
>
> data all_text (drop=fname);
>     length myfilename $100;
>     set dirlist;
>     filepath = "&dirname\"||fname;
>     infile dummy filevar = filepath length=reclen end=done missover
>         dlm=',' firstobs=2 dsd;
>     do while(not done);
>         myfilename = filepath;
>         input var1
>             var2
>             var3
>             var4;
>         output;
>     end;
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/