
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Stata - efficiently appending 200+ files (my method takes hours)


From   Sunita Surana <surana@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Stata - efficiently appending 200+ files (my method takes hours)
Date   Sat, 21 Dec 2013 16:07:46 -0500

Thanks a lot!  This worked beautifully.  Done in under 10 mins.
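
For anyone comparing the two approaches quoted below, here is a rough back-of-the-envelope estimate of why the original loop was so slow. It assumes, purely for illustration, that all n = 200 files are the same size s and that the 2GB total splits evenly among them (so s is about 0.01 GB); the real files will differ, and CSV and .dta sizes are not the same, so treat the numbers as orders of magnitude only. In the original loop, iteration k appends the existing master file and then saves the whole thing again:

```latex
% Assumptions (not from the thread): equal file sizes, s = 2\,\text{GB}/n, n = 200.
\begin{align*}
W_{\text{loop}} &= \sum_{k=1}^{n} k\,s = s\,\frac{n(n+1)}{2}
                 = 0.01\,\text{GB} \times 20100 \approx 201\,\text{GB written},\\
R_{\text{loop}} &= \sum_{k=1}^{n} (k-1)\,s = s\,\frac{n(n-1)}{2}
                 = 0.01\,\text{GB} \times 19900 \approx 199\,\text{GB re-read},\\
W_{\text{tempfiles}} &= \sum_{k=1}^{n} s = n\,s \approx 2\,\text{GB written}.
\end{align*}
```

Under these assumptions the save-inside-the-loop version moves roughly 100 times more data than the version that saves each file once to a tempfile and appends them all at the end, which is in line with hours versus minutes.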

On Sat, Dec 21, 2013 at 2:45 PM, Robert Picard <picard@netbox.com> wrote:
> Take a look at -filelist- from SSC. It can create a Stata dataset of
> files (with full path). The help file has an example that does what
> you want efficiently. Here's a copy:
>
> use "csv_datasets.dta", clear
> local obs = _N
> forvalues i=1/`obs' {
>   use "csv_datasets.dta" in `i', clear
>   local f = dirname + "/" + filename
>   insheet using "`f'", clear
>   tempfile save`i'
>   save "`save`i''"
> }
>
> use "`save1'", clear
> forvalues i=2/`obs' {
>   append using "`save`i''"
> }
>
>
> On Sat, Dec 21, 2013 at 1:55 PM, Sunita Surana <surana@gmail.com> wrote:
>> I am trying to append approx. 200 files using Stata. Below I have
>> provided the code I am using to append. The issue is that it is taking
>> too long -- over 5 hours to do. The ultimate appended file has over 28
>> million observations and is about 2GB in size. I think the issue might
>> be that it re-saves the whole file on every iteration and hence takes
>> too long. I also tried using tempfiles, but that also takes a long
>> time. My colleague, on the other hand, did the same append in minutes
>> using SAS. I have provided his code below as well. I would very much
>> appreciate it if someone could show me how to do it efficiently in
>> Stata, so that it would not take hours. Thanks much!
>>
>> My Stata code:
>>
>> file close _all
>>     file open myfile using "$OP\filelist_test.txt", read
>>     file read myfile line
>>
>>     cd "$OP"
>>     insheet using "`line'", comma clear
>>     tostring optionconditioncode, replace
>>
>>     save "$data\options_all", replace
>>
>>     file read myfile line
>>
>>     while r(eof)==0{
>>         insheet using "`line'", comma clear
>>         tostring optionconditioncode, replace
>>         append using "$data\options_all"
>>         save "$data\options_all", replace
>>
>>         file read myfile line
>>         }
>>
>>     file close myfile
>>
>> *******
>>
>> My colleague's SAS code:
>>
>> data all_text (drop=fname);
>>       length myfilename $100;
>>       set dirlist;
>>       filepath = "&dirname\"||fname;
>>       infile dummy filevar = filepath length=reclen end=done missover
>> dlm=',' firstobs=2 dsd;
>>       do while(not done);
>>         myfilename = filepath;
>>         input var1
>>                     var2
>>                     var3
>>                     var4;
>>           output;
>>       end;
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC