Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: Pulling in files and data stored in a folder tree
From: "Ben Hoen" <[email protected]>
To: <[email protected]>
Subject: RE: st: Pulling in files and data stored in a folder tree
Date: Mon, 30 Jul 2012 15:26:35 -0400
I meant Dr. Lacy.  Sorry about that, Mike.
Ben Hoen
LBNL
Office: 845-758-1896
Cell: 718-812-7589
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Lacy,Michael
Sent: Saturday, July 28, 2012 9:45 AM
To: [email protected]
Subject: Re: st: Pulling in files and data stored in a folder tree
"Ben Hoen" <[email protected]> wrote:
>Date: Fri, 27 Jul 2012 11:26:31 -0400
>Subject: st: Pulling in files and data stored in a folder tree
>
>Hi Statalisters,
>
>I have a set of ~ 200,000 records stored in one dataset ("master file") each
>of which has a year and a county to which it applies, and a unique record
>id.  Separately I have a large set of files that are stored by county (of
>which there are 20, so there are 20 county folders) and year (for each
>county there are 10 year folders - 2002 through 2011).  In each year folder,
>there are 4 files that I want to pull data from (via 1:1 merge with the
>"master file" using the record id).  There are roughly 10 variables I want
>to add to the master file from these 4 files, or approximately 2 to 3 from
>each file.
>
>So, the question is how I might write code that will go through each record
>in the master file, determine the year and the county, go through the folder
>tree to find the appropriate year in the appropriate county, and then merge
>with the four files "keeping" the data from the 10 variables?
>
>A few things to note:  1) the files I want to pull data from are column
>separated text files (i.e., I have not gone through the trouble of
>converting them to Stata files yet - but could.); and, 2) all of the files
>from which I want to pull data are named by county and year (e.g.,
><countyname>_<year>_<filename>) and these names match exactly with the
>county names and years stored in the master file. 
>
Yes, you need to convert them to Stata files first.
I'd think about applying -levelsof- to your master file to get
the names of each of your county/year combinations, and use
those to get into the folder containing each file that you need
to -insheet- into a Stata file.  I'd put each of these into
a numbered list of tempfiles, and then merge each
one onto your master.
Something like this is what I was thinking of:
use master
levelsof county, local(counties)
levelsof year, local(years)
clear
cd "directory holding all the county-year files"
local basedirectory = "whatever"
local filecount = 0
// Put all the using files into Stata format,
// and save them in numbered temp files
foreach c of local counties {
   foreach y of local years {
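     // Forward slashes work on all platforms; a backslash placed directly
     // before a macro would stop Stata from expanding it.
     cd "`basedirectory'/`c'/`y'"  // whatever fits your file system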
     insheet using "first file of 4", clear .....  // -clear- replaces the data already in memory
     local filecount = `filecount' + 1
     keep ...list of the variables of interest from file 1
     tempfile temp`filecount'
     save `temp`filecount''
     ....
     ....
     insheet using "last file of 4", clear .....
     local filecount = `filecount' + 1
     keep ...list of the variables of interest from file 4
     tempfile temp`filecount'
     save `temp`filecount''
   }
}     
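// Reload the master file, then merge each temp file onto it.
// recordid is a placeholder for your unique record id; it must also be
// in every -keep- list above.  County and year are not unique in the master.
use master, clear
forval i = 1/`filecount' {
   // -update- fills in values left missing by earlier county/year files
   merge 1:1 recordid using "`temp`i''", update
   tab1 _merge
   keep if (_merge != 2)
   drop _merge
}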
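
Since the files are named <countyname>_<year>_<filename>, the full path to each file can be built from the loop macros rather than by changing directory. What follows is only a rough sketch of that idea for one of the four files, not tested against your data. The root folder "C:/data", the file name "sales.txt", and the variable names recordid, price and saledate are all placeholders I made up. It also skips any county/year combination for which no file exists.

```stata
* Sketch only: "C:/data", "sales.txt", recordid, price and saledate are
* hypothetical placeholders, not names from the original post.
local basedirectory "C:/data"
local fname "sales.txt"

use master, clear
levelsof county, local(counties)
levelsof year, local(years)

local filecount = 0
foreach c of local counties {
    foreach y of local years {
        * Path follows <county>/<year>/<countyname>_<year>_<filename>
        local f "`basedirectory'/`c'/`y'/`c'_`y'_`fname'"
        capture confirm file "`f'"
        if _rc {
            display as text "skipping missing file: `f'"
            continue
        }
        insheet using "`f'", clear
        keep recordid price saledate
        local ++filecount
        tempfile temp`filecount'
        save "`temp`filecount''"
    }
}
```

Because the counter only advances when a file is actually saved, the temp files stay numbered 1 through `filecount' with no gaps, which is what the merge loop relies on.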
     
Regards,
Mike Lacy
Dept. of Sociology
Colorado State University
Fort Collins CO  U.S.
Mike Lacy
Assoc. Prof./Dir. Grad. Studies
Dept. of Sociology
Colorado State University
Fort Collins CO 80523-1784
970.491.6721 (voice)
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
*