Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: RE: recursively search folder sub directories and store filenames in a text file


From   Robert Picard <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: RE: recursively search folder sub directories and store filenames in a text file
Date   Thu, 31 Oct 2013 10:30:32 -0400

I noticed yesterday that there is already a user-written program
called -dirlist- available from SSC (it serves a different purpose) so
I apologize to the author for having used the same name for the
program I wrote. I cleaned my version and renamed it -filelist-. I'll
submit it to SSC later today.

In the meantime, here's how to do what Tim wanted, that is, start from
a Stata dataset of file names (with path) and input a bunch of csv
files:

dirlist, fromdir(".") save("csvfiles.dta") ///
  pattern("*.csv") replace

use "csvfiles.dta", clear
local obs = _N
forvalues i=1/`obs' {
  use "csvfiles.dta" in `i', clear
  local f = fname
  insheet using "`f'", clear
  tempfile save`i'
  save "`save`i''"
}

clear
forvalues i=1/`obs' {
  append using "`save`i''"
}
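
For comparison, here is a minimal sketch of the other route Nick mentions in the message quoted below: copy the file names into local macros first, then clear memory and read the files. It assumes "csvfiles.dta" was created as above with the paths in the string variable fname, and that the csv files have compatible variable types (otherwise -append- may stop with a type mismatch). The macro names csv1, csv2, ... and the tempfile name combined are only illustrative. In older Stata versions, a string expression such as fname[`i'] is limited in length, so very long paths could be truncated.

```stata
* Sketch only: assumes "csvfiles.dta" holds one path per observation
* in the string variable fname, as saved by the call above.
use "csvfiles.dta", clear
local nfiles = _N

* copy each path into its own local macro (csv1, csv2, ...)
forvalues i = 1/`nfiles' {
  local csv`i' = fname[`i']
}

* the list now lives in macros, so memory can be cleared
* before the csv files are read in one at a time
clear
tempfile combined
forvalues i = 1/`nfiles' {
  insheet using "`csv`i''", comma names clear
  if `i' > 1 append using "`combined'"
  save "`combined'", replace
}
```

When the loop ends, the combined data are in memory and in the temporary file, so only one tempfile is needed rather than one per csv file.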

On Thu, Oct 31, 2013 at 6:06 AM, Nick Cox <[email protected]> wrote:
> If I understand you correctly, you can use -file read- to read from a
> file regardless of what is in memory. Or use Mata (e.g. -cat()-) to
> read into a string matrix. Or read in the filenames first, put them
> somewhere else, e.g. a set of local macros, and then read in your
> dataset.
> Nick
> [email protected]
>
>
> On 31 October 2013 10:00, Tim Evans <[email protected]> wrote:
>> What I meant to finish off below was that I couldn't resolve a conflict: I needed to load the data file created in the first part of the routine to access the filenames I wanted to combine into one dataset, while simultaneously needing an empty dataset in order to use -insheet- to read in each file of interest and append them into one large dataset. I couldn't work this out, so I stopped.
>>
>>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of Tim Evans
>> Sent: 31 October 2013 09:46
>> To: [email protected]
>> Subject: RE: st: RE: recursively search folder sub directories and store filenames in a text file
>>
>> Robert,
>>
>> Thanks for all of your help. I eventually went down the route of saving the results in a log file and then reading in the files; the code I used is below. I did try to take advantage of the data file you helped create in your second suggestion, but I couldn't get around the fact that I had to load the file to access the values (filenames) while at the same time needing an empty dataset.
>>
>> --BEGIN CODE--
>>
>> clear all
>>
>> cd "T:\Final"
>>
>> cap program drop dirlist
>> program define dirlist
>>
>>    syntax, fromdir(string)
>>
>>    // list of all files in "`fromdir'"
>>    local flist: dir "`fromdir'" files "*.csv"
>>    foreach f of local flist {
>>       dis "`fromdir'/`f'"
>>    }
>>
>>    // recursively list directories in "`fromdir'"
>>    local dlist: dir "`fromdir'" dirs "*"
>>    foreach d of local dlist {
>>       dirlist , fromdir("`fromdir'/`d'")
>>    }
>>
>> end
>>
>> log using filenames.log, replace
>>
>> local cdir = "`c(pwd)'"
>> dirlist, fromdir("`cdir'")
>>
>> log close
>>
>> insheet using filenames.log
>> keep if regexm(v1, "^T") == 1  // Clean log file of any rows not associated with a filename and path
>> rename v1 filename
>>
>> outsheet using "T:\Final\final_txt.txt", nonames replace
>>
>> clear all
>>
>> file open myfile using "T:\Final\final_txt.txt", read
>> file read myfile line
>>
>> insheet using `line', comma names
>> di as text `line'
>> save master_data, replace
>> clear
>> file read myfile line
>> while r(eof)==0 {
>>         insheet using `line'
>>         di as text `line'
>>         save temp, replace
>>         append using master_data, force
>>         save master_data, replace
>>         **save temp, replace
>>         clear
>>                 file read myfile line
>> }
>> append using master_data
>>
>> outsheet using "T:\Final\combined_data.csv", comma names replace
>>
>> --END CODE--
>>
>> Best wishes
>>
>> Tim
>>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of Robert Picard
>> Sent: 30 October 2013 15:18
>> To: [email protected]
>> Subject: Re: st: RE: recursively search folder sub directories and store filenames in a text file
>>
>> If this is a one-shot deal, I would have simply copied the output from the results window to a text file and processed the list from there.
>> Using a log file to capture the list is also simple. It does make sense, however, for a program that recursively lists files to save the list to a dataset, so here's a modified version that adds that capability. While I was at it, I added a -pattern()- option in case you want to restrict the search.
>>
>> Robert
>>
>> * ----- begin example ---------------------------
>> cap program drop dirlist
>> program define dirlist
>>
>>   syntax , fromdir(string) save(string) ///
>>     [pattern(string) replace append]
>>
>>   // get files in "`fromdir'" using pattern
>>   if "`pattern'" == "" local pattern "*"
>>   local flist: dir "`fromdir'" files "`pattern'"
>>
>>   qui {
>>
>>     // initialize dataset to use
>>     if "`append'" != "" use "`save'", clear
>>     else {
>>       clear
>>       gen fname = ""
>>     }
>>
>>     // add files to the dataset
>>     local i = _N
>>     foreach f of local flist {
>>       set obs `++i'
>>       replace fname = "`fromdir'/`f'" in `i'
>>     }
>>     save "`save'", `replace'
>>
>>   }
>>
>>   // recursively list directories in "`fromdir'"
>>   local dlist: dir "`fromdir'" dirs "*"
>>   foreach d of local dlist {
>>     dirlist , fromdir("`fromdir'/`d'") save("`save'") ///
>>     pattern("`pattern'") append replace
>>   }
>>
>> end
>>
>> * start from the current directory
>> local cdir = "`c(pwd)'"
>>
>> * list all files
>> dirlist, fromdir("`cdir'") save("allfiles.dta") replace
>>
>> * list all Excel files
>> dirlist, fromdir("`cdir'") save("dofiles.dta") ///
>>   pattern("*.xls") replace
>>
>> * ----- end example -----------------------------
>>
>> On Wed, Oct 30, 2013 at 6:16 AM, Tim Evans <[email protected]> wrote:
>>> Robert,
>>>
>>> Thank you very much, this does indeed seem to do the trick - I am impressed! What I would like to do is save the files I list into either a .dta file, or to a text file which I can then read into Stata. The aim then will be to run through each record and open the file.
>>>
>>> The only suggestion I have at the moment would be to open a log file and save this, although this might not be the best way of doing things. Do you have any advice?
>>>
>>> Best wishes
>>>
>>> Tim
>>>
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of Robert
>>> Picard
>>> Sent: 29 October 2013 13:45
>>> To: [email protected]
>>> Subject: Re: st: RE: recursively search folder sub directories and
>>> store filenames in a text file
>>>
>>> Here is a way, from an initial directory, to recursively list all files in Stata.
>>>
>>> Robert
>>>
>>> * ----- begin example ---------------------------
>>> cap program drop dirlist
>>> program define dirlist
>>>
>>>    syntax , fromdir(string)
>>>
>>>    // list of all files in "`fromdir'"
>>>    local flist: dir "`fromdir'" files "*"
>>>    foreach f of local flist {
>>>       dis "`fromdir'/`f'"
>>>    }
>>>
>>>    // recursively list directories in "`fromdir'"
>>>    local dlist: dir "`fromdir'" dirs "*"
>>>    foreach d of local dlist {
>>>       dirlist , fromdir("`fromdir'/`d'")
>>>    }
>>>
>>> end
>>>
>>> local cdir = "`c(pwd)'"
>>> dirlist, fromdir("`cdir'")
>>>
>>> * ----- end example -----------------------------
>>>
>>> On Tue, Oct 29, 2013 at 8:04 AM, Tim Evans <[email protected]> wrote:
>>>> Hi all,
>>>>
>>>> I am using Stata 11.2 and have a working directory called "T:\Projects\Final". In this folder I have a number of subfolders i.e. GEH_2013, SWB_2013 and within these I have for example GEH_COL and GEH_OGD. Within these folders I have a csv file.
>>>>
>>>> So folder structure looks like :
>>>>
>>>> T:\Projects\Final
>>>> T:\Projects\Final\GEH_2013
>>>> T:\Projects\Final\GEH_2013\GEH_COL
>>>> T:\Projects\Final\GEH_2013\GEH_COL\GEH_COL_combined.csv
>>>> T:\Projects\Final\GEH_2013\GEH_OGD
>>>> T:\Projects\Final\GEH_2013\GEH_OGD\GEH_OGD_combined.csv
>>>> T:\Projects\Final\SWB_2013
>>>> T:\Projects\Final\SWB_2013\SWB_COL
>>>> T:\Projects\Final\SWB_2013\SWB_COL\SWB_COL_combined.csv
>>>> T:\Projects\Final\SWB_2013\SWB_OGD
>>>> T:\Projects\Final\SWB_2013\SWB_OGD\SWB_OGD_combined.csv
>>>>
>>>>
>>>> What I am trying to do is ultimately identify the names of each csv file contained at the third level of sub-directory and append the csv files into one large file.
>>>>
>>>> I have taken a look at using the following:
>>>>
>>>> rcd, :! dir *.csv /a-d /b >filelist.txt
>>>>
>>>> but all this does is create a text file in each sub-directory with the name of the csv file in that directory - so for T:\Projects\Final I have an empty text file as no csv files here, but what I need is a single text file that contains the filename and path for each csv file contained within T:\Projects\Final.
>>>>
>>>> Once I have this, my aim is to use the filenames and paths stored in the text file and to combine each csv file into one file.
>>>>
>>>> If anyone has a more elegant method of appending all csv files that are stored within sub-directories of a folder then I'd be grateful to hear!
>>>>
>>>> Best wishes
>>>>
>>>> Tim
>>>>
>>>> **************************************************************************
>>>> The information contained in the EMail and any attachments is
>>>> confidential and intended solely and for the attention and use of the
>>>> named addressee(s). It may not be disclosed to any other person
>>>> without the express authority of Public Health England, or the
>>>> intended recipient, or both. If you are not the intended recipient,
>>>> you must not disclose, copy, distribute or retain this message or any
>>>> part of it. This footnote also confirms that this EMail has been
>>>> swept for computer viruses by Symantec.Cloud, but please re-sweep any
>>>> attachments before opening or saving. http://www.gov.uk/PHE
>>>> **************************************************************************
>>>>
>>>> *
>>>> *   For searches and help try:
>>>> *   http://www.stata.com/help.cgi?search
>>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>>> *   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

