
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Stata - efficiently appending 200+ files (my method takes hours)


From   Robert Picard <picard@netbox.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: Stata - efficiently appending 200+ files (my method takes hours)
Date   Sat, 21 Dec 2013 14:45:50 -0500

Take a look at -filelist- from SSC. It creates a Stata dataset that
lists files along with their directory paths. The help file has an
example that does what you want efficiently. Here's a copy:

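* csv_datasets.dta holds one observation per file (variables dirname, filename)
use "csv_datasets.dta", clear
local obs = _N
forvalues i=1/`obs' {
  * load only the i-th observation to build the path of the i-th file
  use "csv_datasets.dta" in `i', clear
  local f = dirname + "/" + filename
  * read the CSV file and save it as its own temporary Stata dataset
  insheet using "`f'", clear
  tempfile save`i'
  save "`save`i''"
}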

use "`save1'", clear
forvalues i=2/`obs' {
  append using "`save`i''"
}
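
The loops above assume that "csv_datasets.dta" already exists. As a
minimal sketch (not part of the copied help-file example), it could be
built with -filelist- along these lines, assuming the CSV files sit in
or below the current directory. The directory and the file pattern are
placeholders to adjust:

```stata
* one-time install from SSC
ssc install filelist

* list every *.csv file in the current directory and its subdirectories;
* the saved dataset has one observation per file, with the variables
* dirname and filename that the loop above relies on
filelist, dir(".") pattern("*.csv") save("csv_datasets.dta") replace
```

With this approach each file is read once and saved to its own
tempfile, and the combined dataset is held in memory during the
appends rather than being rewritten to disk on every iteration.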


On Sat, Dec 21, 2013 at 1:55 PM, Sunita Surana <surana@gmail.com> wrote:
> I am trying to append approx. 200 files using Stata. Below I have
> provided the code I am using to append. The issue is that it is taking
> too long -- over 5 hours to do. The ultimate appended file has over 28
> million observations and is about 2GB in size. I think the issue might
> be that it is saving every time and hence takes too long. I also tried
> using the tempfile mode -- but that also takes long. My colleague, on
> the other hand, did the same append in minutes using SAS. I have
> provided his code below as well. I would very much appreciate if
> someone could show me how to do it efficiently in Stata -- so that it
> would not take hours. Thanks much!
>
> My Stata code:
>
> file close _all
>     file open myfile using "$OP\filelist_test.txt", read
>     file read myfile line
>
>     cd "$OP"
>     insheet using "`line'", comma clear
>     tostring optionconditioncode, replace
>
>     save "$data\options_all", replace
>
>     file read myfile line
>
>     while r(eof)==0{
>         insheet using "`line'", comma clear
>         tostring optionconditioncode, replace
>         append using "$data\options_all"
>         save "$data\options_all", replace
>
>         file read myfile line
>         }
>
>     file close myfile
>
> *******
>
> My colleague's SAS code:
>
> data all_text (drop=fname);
>       length myfilename $100;
>       set dirlist;
>       filepath = "&dirname\"||fname;
>       infile dummy filevar = filepath length=reclen end=done missover
> dlm=',' firstobs=2 dsd;
>       do while(not done);
>         myfilename = filepath;
>         input var1
>                     var2
>                     var3
>                     var4;
>           output;
>       end;
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

