Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Stata - efficiently appending 200+ files (my method takes hours)
From: Sunita Surana <[email protected]>
To: [email protected]
Subject: Re: st: Stata - efficiently appending 200+ files (my method takes hours)
Date: Sat, 21 Dec 2013 16:07:46 -0500
Thanks a lot! This worked beautifully. Done in under 10 mins.
On Sat, Dec 21, 2013 at 2:45 PM, Robert Picard <[email protected]> wrote:
> Take a look at -filelist- from SSC. It can create a Stata dataset of
> files (with full path). The help file has an example that does what
> you want efficiently. Here's a copy:
>
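> * Load the file list once to count the files
> use "csv_datasets.dta", clear
> local obs = _N
> * Import each CSV and save it to its own temporary file
> forvalues i=1/`obs' {
>     use "csv_datasets.dta" in `i', clear
>     local f = dirname + "/" + filename
>     insheet using "`f'", clear
>     tempfile save`i'
>     save "`save`i''"
> }
>
> * Combine the temporary files in memory, without re-saving after each append
> use "`save1'", clear
> forvalues i=2/`obs' {
>     append using "`save`i''"
> }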
>
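The listing above assumes that "csv_datasets.dta" already exists. As a minimal sketch, it could be built with -filelist- and the import loop adapted to the original poster's files. The assumptions here are that -filelist- is installed from SSC, that the global $OP from the original post holds the folder with the CSV files, that the files match "*.csv", and that the poster's -tostring- step is wanted so the variable has the same type in every file before appending.

```stata
* Assumes -filelist- is installed (ssc install filelist)
* Assumes $OP is the folder holding the CSV files, as in the original post
filelist, dir("$OP") pattern("*.csv")
save "csv_datasets.dta", replace

local obs = _N
forvalues i = 1/`obs' {
    * Load only observation i of the file list to get one path
    use "csv_datasets.dta" in `i', clear
    local f = dirname + "/" + filename
    insheet using "`f'", comma clear
    * Step from the original post: same storage type in every file
    tostring optionconditioncode, replace
    tempfile save`i'
    save "`save`i''"
}

* Append the pieces in memory; nothing is re-saved inside this loop
use "`save1'", clear
forvalues i = 2/`obs' {
    append using "`save`i''"
}

* Single save at the end, to the location used in the original post
save "$data\options_all", replace
```

The combined dataset is written to disk once, after the last append, rather than after every file.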
>
> On Sat, Dec 21, 2013 at 1:55 PM, Sunita Surana <[email protected]> wrote:
>> I am trying to append approx. 200 files using Stata. Below I have
>> provided the code I am using to append. The issue is that it is taking
>> too long: over 5 hours. The final appended file has over 28
>> million observations and is about 2GB in size. I think the issue might
>> be that it saves on every iteration and hence takes too long. I also tried
>> using tempfiles, but that also takes a long time. My colleague, on
>> the other hand, did the same append in minutes using SAS. I have
>> provided his code below as well. I would very much appreciate it if
>> someone could show me how to do it efficiently in Stata, so that it
>> would not take hours. Thanks much!
>>
>> My Stata code:
>>
>> file close _all
>> file open myfile using "$OP\filelist_test.txt", read
>> file read myfile line
>>
>> cd "$OP"
>> insheet using "`line'", comma clear
>> tostring optionconditioncode, replace
>>
>> save "$data\options_all", replace
>>
>> file read myfile line
>>
>> while r(eof)==0{
>> insheet using "`line'", comma clear
>> tostring optionconditioncode, replace
>> append using "$data\options_all"
>> save "$data\options_all", replace
>>
>> file read myfile line
>> }
>>
>> file close myfile
>>
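A rough back-of-the-envelope sketch of why the loop above slows down as it runs: every pass appends the whole accumulated file and then saves it again, so save number k rewrites the contents of all k files read so far. The figures below use the totals quoted in the post (about 200 files, about 28 million observations). Equal-sized files are an assumption made only for this illustration.

```stata
* Totals quoted in the post
local nfiles    = 200
local total_obs = 28000000

* Assumption: every file holds the same number of observations
local per_file = `total_obs' / `nfiles'

* Save k writes k files' worth of data: per_file * (1 + 2 + ... + nfiles)
local written = `per_file' * `nfiles' * (`nfiles' + 1) / 2

display "observations written across all saves: " %15.0fc `written'
display "multiple of the final dataset:         " %6.1f `written' / `total_obs'
```

Under these assumptions the loop writes about 2.8 billion observations in total, roughly 100 times the size of the final dataset, and rereads a similar amount through -append-. Saving each file once and appending in memory avoids that repeated work.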
>> *******
>>
>> My colleague's SAS code:
>>
>> data all_text (drop=fname);
>> length myfilename $100;
>> set dirlist;
>> filepath = "&dirname\"||fname;
>> infile dummy filevar = filepath length=reclen end=done missover
>> dlm=',' firstobs=2 dsd;
>> do while(not done);
>> myfilename = filepath;
>> input var1
>> var2
>> var3
>> var4;
>> output;
>> end;
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/