
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: repeat same commands over hundreds of files


From   Eric Booth <[email protected]>
To   "<[email protected]>" <[email protected]>
Subject   Re: st: repeat same commands over hundreds of files
Date   Tue, 2 Nov 2010 20:22:47 +0000

<>

Hi Tom:

The best approach probably depends on how your files are named and how your folders are organized, but programs like -fs- (from SSC) are useful for this type of work. Here are two approaches:


Assuming your .csv files are named sequentially, like this:

mpg_09_CTC1972_1972_EDCD11_10_JH22
mpg_09_CTC1973_1973_EDCD11_10_JH22
mpg_09_CTC1974_1974_EDCD11_10_JH22
mpg_09_CTC1975_1975_EDCD11_10_JH22
mpg_09_CTC1981_1981_EDCD11_10_JH22
mpg_09_CTC1982_1982_EDCD11_10_JH22



You could use a -forvalues- loop like this:

*********!
local dir "/Users/tbrunell/MPG/CT"
forval n = 1972/1982 {
	local csv "`dir'/mpg_09_CTC`n'_`n'_EDCD11_10_JH22.csv"
	capture confirm file "`csv'"
	if _rc continue   /*skip years with no file, e.g., 1976-1980 above*/
	clear
	insheet using "`csv'"
	drop in L /*this drops file notation at the bottom*/
	compress
	gen demper=dem/(dem+rep)
	gen demwin=.
	replace demwin=1 if demper>.5 & demper~=.
	replace demwin=0 if demper<.5
	sort rkey
	gen overalldemper=overalldem/(overalldem+overallrep)
	collapse (count) numberofseats=demper (sum) demwin (mean) year demper overalldemper (p50) median=demper, by(rkey)
	gen percentdemdist=demwin/numberofseats

	**create a macro for the decade**
	local save
	if inrange(`n', 1970, 1979) local save 1970
	if inrange(`n', 1980, 1989) local save 1980

	/*NB: all years in a decade share this filename, so -replace- keeps only the last one*/
	save "`dir'/CTC`save's", replace
}
************!

Note the use of the local macros to create the decade for the -save- filename.
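Because every year in a decade is saved under the same filename with -replace-, the loop above ends up keeping only the last year of each decade. If the goal is one file per decade holding all of its years, a minimal sketch is to -append- each year's collapsed results onto that decade's file. This assumes the same folder and file-name pattern as above; the decade is computed arithmetically rather than with -inrange()-, and old decade files are erased first so that rerunning the do-file does not append the same years twice.

```stata
* Sketch: build one file per decade by appending each year's collapsed results.
* Assumes the folder and file-name pattern used in the loop above.
local dir "/Users/tbrunell/MPG/CT"

* remove decade files left over from earlier runs
foreach d in 1970 1980 {
    capture erase "`dir'/CTC`d's.dta"
}

forvalues n = 1972/1982 {
    local csv "`dir'/mpg_09_CTC`n'_`n'_EDCD11_10_JH22.csv"
    capture confirm file "`csv'"
    if _rc continue                  // no file for this year

    clear
    insheet using "`csv'"
    drop in L                        // drops file notation at the bottom
    gen demper = dem/(dem+rep)
    gen demwin = .
    replace demwin = 1 if demper > .5 & demper < .
    replace demwin = 0 if demper < .5
    gen overalldemper = overalldem/(overalldem+overallrep)
    collapse (count) numberofseats=demper (sum) demwin ///
        (mean) year demper overalldemper (p50) median=demper, by(rkey)
    gen percentdemdist = demwin/numberofseats

    * decade of this election, e.g. 1972 -> 1970
    local decade = 10*floor(`n'/10)
    local out "`dir'/CTC`decade's.dta"

    * add this year's rows to whatever the decade file already holds
    capture confirm file "`out'"
    if _rc == 0 append using "`out'"
    save "`out'", replace
}
```

The `//` comments and `///` line continuation only work inside a do-file, which is where a loop like this would normally live anyway.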



Another approach is to find all the .csv files in your folder with the -dir- macro extended function (see -help extended_fcn-) and run the code on each of them. (The same idea works for finding all the folders of interest first and then all the .csv files in each folder.) For example:

*************!
local dir "<folder path>"
local files : dir "`dir'" files "*.csv", respectcase
tokenize `"`files'"'
di `"`files'"'

while `"`1'"' != "" {
	clear
	insheet using `"`dir'/`1'"'
	<snip>
	* -dir- returns names that already end in .csv, so strip it for the output name
	local stub = subinstr(`"`1'"', ".csv", "", 1)
	save `"`dir'/`stub'.dta"', replace
	macro shift
}
***************!
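To extend this to several folders, a minimal sketch is to list the sub-folders with the -dirs- form of the same -dir- extended function and then list the .csv files inside each one. The layout here (one sub-folder per state under a common root such as /Users/tbrunell/MPG) is a hypothetical assumption, and the analysis commands are left as a placeholder comment:

```stata
* Sketch: loop over every sub-folder of a root folder, then every .csv in each.
* Hypothetical layout: one sub-folder per state under `root'.
local root "/Users/tbrunell/MPG"
local states : dir "`root'" dirs "*"

foreach s of local states {
    local files : dir "`root'/`s'" files "*.csv", respectcase
    foreach f of local files {
        clear
        insheet using "`root'/`s'/`f'"
        drop in L                    // drops file notation at the bottom
        * ... the gen/collapse commands from the first loop go here ...
        local stub = subinstr("`f'", ".csv", "", 1)
        save "`root'/`s'/`stub'.dta", replace
    }
}
```

Using -foreach ... of local- instead of -tokenize-/-macro shift- lets Stata handle the quotes that -dir- puts around each name.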

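For the output-name problem, one option is to pull the pieces you need out of the input file name itself. The sketch below assumes every name contains a block like `_CTC1972_` (uppercase letters followed by a four-digit year); it uses -regexm()- and -regexs()- to extract the prefix and year and rebuilds a name like CTC1970s, matching the naming in the original do-file. The literal file name stands in for `1' inside the loop:

```stata
* Sketch: derive an output name such as CTC1970s from an input file name.
* Assumes names contain "_<UPPERCASE PREFIX><4-digit year>_".
local f "mpg_09_CTC1972_1972_EDCD11_10_JH22.csv"

if regexm(`"`f'"', "_([A-Z]+)([0-9][0-9][0-9][0-9])_") {
    local prefix = regexs(1)         // e.g. CTC
    local year   = regexs(2)         // e.g. 1972
    local decade = 10*floor(`year'/10)
    local outname "`prefix'`decade's"
}
display "`outname'"
```

Because the prefix comes from the file name, the same line works for every state without hard-coding "CTC".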

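The -fs- command mentioned at the top can replace the -dir- step. A sketch, assuming -fs- has been installed with -ssc install fs- and leaves the matching file names in r(files), which is how it is commonly used (check -help fs- after installing):

```stata
* Sketch using -fs- from SSC (ssc install fs).
* Assumes -fs- stores the matching file names in r(files).
cd "/Users/tbrunell/MPG/CT"
fs *.csv
foreach f in `r(files)' {
    clear
    insheet using "`f'"
    drop in L                        // drops file notation at the bottom
    * ... the gen/collapse commands from the first loop go here ...
    local stub = subinstr("`f'", ".csv", "", 1)
    save "`stub'.dta", replace
}
```

The list in r(files) is expanded once when the loop starts, so commands inside the loop that overwrite r() do not affect which files are processed.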

- Eric
__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
[email protected]


P.S.  Say "Hi" to Dave Smith for me if he's still around there.




On Nov 2, 2010, at 2:57 PM, tbrunell wrote:

> I am doing some simple analysis on election data that spans all the states and several decades.
> So I have hundreds of files that I want to do the same relatively simple analysis on (I have an example below).
> At first I started writing .do files for each state/year and the only things I changed were the 
> 1) file name for the insheet command
> 2) the name and location of the collapsed file at the end.
> 
> However, when I wanted to add an additional command this meant opening hundreds of separate .do files, making a change, resaving the file.  It is not the end of the world, but I would prefer to set up the commands and then, somehow, tell stata to run the commands separately for each specified file and then save the resulting file with some new name.
> 
> The techs at Stata recommended using macros for file names and the foreach command.  But that doesn't solve my filename and output file problem.
> 
> Any recommendations would be much appreciated.
> 
> Tom Brunell
> Professor of Political Science
> University of Texas at Dallas
> 
> _____________________________
> clear
> insheet using "/Users/tbrunell/MPG/CT/mpg_09_CTC1972_1972_EDCD11_10_JH22.csv"
> drop in L /*this drops file notation at the bottom*/
> compress
> 
> gen demper=dem/(dem+rep)
> gen demwin=.
> replace demwin=1 if demper>.5 & demper~=.
> replace demwin=0 if demper<.5
> sort rkey
> gen overalldemper=overalldem/(overalldem+overallrep)
> 
> *here overalldemper will be total votes percentage, demper is "normalized" vote - averaged across districts
> collapse (count) numberofseats=demper (sum) demwin (mean) year demper overalldemper (p50) median=demper,by(rkey)
> gen percentdemdist=demwin/numberofseats
> 
> save "/Users/tbrunell//MPG/CT/CTC1970s", replace
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/






© Copyright 1996–2018 StataCorp LLC