
Re: st: repeat same commands over hundreds of files


From   Eric Booth <[email protected]>
To   "<[email protected]>" <[email protected]>
Subject   Re: st: repeat same commands over hundreds of files
Date   Tue, 2 Nov 2010 20:31:30 +0000

One other note: if your files are sequentially numbered but there are gaps (as there are in my example filenames), you might want to add a -confirm- statement to check whether each file exists and skip it if it doesn't. So, modifying my previous example, you'd want something like this:

*********!
forval n = 1972/1981 {

    cap confirm file "/Users/tbrunell/MPG/CT/mpg_09_CTC`n'_`n'_EDCD11_10_JH22.csv"
    if !_rc {

        clear
        insheet using "/Users/tbrunell/MPG/CT/mpg_09_CTC`n'_`n'_EDCD11_10_JH22.csv"
        drop in L /*this drops file notation at the bottom*/
        compress
        gen demper=dem/(dem+rep)
        gen demwin=.
        replace demwin=1 if demper>.5 & demper~=.
        replace demwin=0 if demper<.5
        sort rkey
        gen overalldemper=overalldem/(overalldem+overallrep)
        collapse (count) numberofseats=demper (sum) demwin (mean) year demper overalldemper (p50) median=demper, by(rkey)
        gen percentdemdist=demwin/numberofseats

        **create a macro for the decade**
        local save
        if inrange(`n', 1970, 1979) local save 1970
        if inrange(`n', 1980, 1989) local save 1980

        **note: every file in a decade is saved under the same name, so later years overwrite earlier ones**
        save "/Users/tbrunell/MPG/CT/CTC`save's", replace

    }
    else {
        di "file for `n' doesn't exist!"
    }
}
************!
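
If your years run over more than two decades, the pair of -inrange()- lines could be replaced by a single arithmetic line. This is only a minimal sketch, assuming the year is always a four-digit integer; the macro name "decade" is my own choice for illustration and isn't used in the code above:

```stata
*********!
forval n = 1972/1981 {
    * round the year down to the start of its decade, e.g. 1972 becomes 1970
    local decade = 10*floor(`n'/10)
    di "year `n' -> CTC`decade's"
}
************!
```

You would then use `decade' in place of `save' when building the -save- filename.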

- Eric
__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
[email protected]

On Nov 2, 2010, at 3:22 PM, Eric Booth wrote:

> Hi Tom:
> 
> The best approach probably depends on how your file names are sequenced and how your folders/files are organized, but programs like -fs- (from SSC) and others are useful for this type of work.  Here are two approaches:
> 
> 
> Suppose you've got files named sequentially like this:
> 
> mpg_09_CTC1972_1972_EDCD11_10_JH22
> mpg_09_CTC1973_1973_EDCD11_10_JH22
> mpg_09_CTC1974_1974_EDCD11_10_JH22
> mpg_09_CTC1975_1975_EDCD11_10_JH22
> mpg_09_CTC1981_1981_EDCD11_10_JH22
> mpg_09_CTC1982_1982_EDCD11_10_JH22
> 
> 
> 
> You could use a -forvalues- loop like:
> 
> *********!
> forval n = 1972/1981 {
>     clear
>     insheet using "/Users/tbrunell/MPG/CT/mpg_09_CTC`n'_`n'_EDCD11_10_JH22.csv"
>     drop in L /*this drops file notation at the bottom*/
>     compress
>     gen demper=dem/(dem+rep)
>     gen demwin=.
>     replace demwin=1 if demper>.5 & demper~=.
>     replace demwin=0 if demper<.5
>     sort rkey
>     gen overalldemper=overalldem/(overalldem+overallrep)
>     collapse (count) numberofseats=demper (sum) demwin (mean) year demper overalldemper (p50) median=demper, by(rkey)
>     gen percentdemdist=demwin/numberofseats
> 
>     **create a macro for the decade**
>     local save
>     if inrange(`n', 1970, 1979) local save 1970
>     if inrange(`n', 1980, 1989) local save 1980
> 
>     save "/Users/tbrunell/MPG/CT/CTC`save's", replace
> }
> ************!
> 
> Note the use of the local macros to create the decade for the -save- filename.
> 
> 
> 
> Another approach is to find all the .csv files in your folder using the macro extended functions (see -help extended_fcn-) and run the code on each of them. (The same idea extends to finding all the folders of interest and all the .csv files within them.) For example:
> 
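> *************!
> global files : dir "<folder path>" files "*.csv", respectcase
> token `"$files"'
> di in yellow `"$files"'
> 
> while "`1'" != "" {
> 	clear
> 	**each name returned by -dir- already includes the .csv extension**
> 	insheet using "/Users/tbrunell/MPG/CT/`1'"
> 	<snip>
> 	**strip the extension before building the .dta name**
> 	local stub = subinstr("`1'", ".csv", "", .)
> 	save "/Users/tbrunell/MPG/CT/`stub'.dta", replace
> 
> 	macro shift
> }
> ***************!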
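
The same directory listing can also be walked with -foreach- instead of -tokenize- and -macro shift-, which some people find easier to read. A minimal sketch, assuming the folder is the one used in the -insheet- lines of this thread; the macro names "folder", "f" and "stub" are my own, and the -di- line just stands in for the real commands:

```stata
*************!
local folder "/Users/tbrunell/MPG/CT"
local files : dir "`folder'" files "*.csv", respectcase

foreach f of local files {
    * each name in the list already ends in .csv, so strip it for the output name
    local stub = subinstr("`f'", ".csv", "", .)
    di "would read `folder'/`f' and save `folder'/`stub'.dta"
}
***************!
```

A local macro is used here rather than a global only so that the list doesn't linger after the do-file finishes; either should work.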
> 
> 
> 
> - Eric
> __
> Eric A. Booth
> Public Policy Research Institute
> Texas A&M University
> [email protected]
> 
> 
> P.S.  Say "Hi" to Dave Smith for me if he's still around there.
> 
> 
> 
> 
> On Nov 2, 2010, at 2:57 PM, tbrunell wrote:
> 
>> I am doing some simple analysis on election data that spans all the states and several decades.
>> So I have hundreds of files that I want to do the same relatively simple analysis on (I have an example below).
>> At first I started writing .do files for each state/year and the only things I changed were the 
>> 1) file name for the insheet command
>> 2) the name and location of the collapsed file at the end.
>> 
>> However, when I wanted to add an additional command this meant opening hundreds of separate .do files, making a change, and resaving each file.  It is not the end of the world, but I would prefer to set up the commands and then, somehow, tell Stata to run the commands separately for each specified file and then save the resulting file with some new name.
>> 
>> The techs at Stata recommended using macros for file names and the foreach command.  But that doesn't solve my filename and output file problem.
>> 
>> Any recommendations would be much appreciated.
>> 
>> Tom Brunell
>> Professor of Political Science
>> University of Texas at Dallas
>> 
>> _____________________________
>> clear
>> insheet using "/Users/tbrunell/MPG/CT/mpg_09_CTC1972_1972_EDCD11_10_JH22.csv"
>> drop in L /*this drops file notation at the bottom*/
>> compress
>> 
>> gen demper=dem/(dem+rep)
>> gen demwin=.
>> replace demwin=1 if demper>.5 & demper~=.
>> replace demwin=0 if demper<.5
>> sort rkey
>> gen overalldemper=overalldem/(overalldem+overallrep)
>> 
>> *here overalldemper will be total votes percentage, demper is "normalized" vote - averaged across districts
>> collapse (count) numberofseats=demper (sum) demwin (mean) year demper overalldemper (p50) median=demper,by(rkey)
>> gen percentdemdist=demwin/numberofseats
>> 
>> save "/Users/tbrunell//MPG/CT/CTC1970s", replace





