Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org is already up and running.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: Concatenate files


From   Eric Booth <eric.a.booth@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Concatenate files
Date   Sat, 31 Mar 2012 14:40:33 -0500

<>



<>

1.  -append- will stack/combine together variables from 2 datasets with the same name (variables with different names won't stack together).  You could write a loop to open all the .csv files and rename the variables depending on the # of variables (or columns) in the file.  Then when you append them, they will stack/append properly.  (If the variable format is an issue, you can read the file into memory (-insheet-) and then -save- a Stata version with the variables all stored as string to ensure that variables with the same names append properly -- or use the 'force' option in your -append- command).

2. A second comment about your example code is that you use -dir- to build a filelist and then read the .csv file names from it -- you can do this much more directly using the macro extended function (-help extended_fcn-) 'dir'.  

Below is some (untested) code to append depending on the # of variables in the file (since you say that is the decision rule for how these variables are stacked) by using the returned value from -describe- and the macro extended functions to build your file list.
This code assumes that in your files with two vars you've got the variables 'var1' and 'var2' and in the files with one var you want to rename the single variable 'var1' since you say that you want that variable to append to the first column in the 2 variable files.  (You can change this decision rule below if that's not what you meant)  Note that I -tostring- all the vars for the -append-; you may not need this step.


**********************!

clear
sa "master.dta", replace emptyok  //here is your 'master' file

**build the filelist ==>
local files: dir `"`c(pwd)'"' files "*.csv", respectcase nofail
di `"`files'"' //this is your file list
local files: subinstr local files ".csv" "", all
di `"`files'"' //scrubs the extension so that we can use it below

****loop over file list & append==>
foreach f in `files' {
  *--this part will clean the csv file and make a stata version:
	preserve
		insheet using "`f'.csv", comma clear names
		qui desc
		di `"`r(k)'"' //contains # of vars
		if `r(k)' == 1 {
			ds
			rename `r(varlist)' var1
			}
		tostring *, replace
		sa "`f'.dta", replace // you'll use this for the append
	restore
  *--now append the cleaned file to the master:
	append using "`f'.dta"
	save "master.dta", replace
	} //go back to the top and clean, save, append next file..


**here's your appended master file==>	
u "master.dta", clear
desc
**********************!



- Eric

__
Eric A. Booth
Public Policy Research Institute 
Texas A&M University
ebooth@ppri.tamu.edu
+979.845.6754

On Mar 31, 2012, at 8:59 AM, Huaandrew wrote:

> Hello everybody,
> 
> I am using Stata 12 to create a large master file which is created by appending 3,000 csv files. The problem is that the 3,000 files have two kinds of structure (mixed together and no way to separate them by file names): one has only one column and the other two columns. For the two-column files, the first column is character variable, while the second column is numeral(character numeric), such as "12,356". For the one-column files, the column is numeral as in the two-column file. What Stata commands I should use to distinguish the two kinds of files when appending them into the master file. What I'd like to do is that if it's one-column file, let Stata create a new column which is just like the first column in the two-column file.The original code without distinguishing the two kinds of files is as follows. When I ran it, Stata clashed when reading the first one-column file after reading several two-column files. 
> 
> cd d:\Stata_Data\found! 
> dir *.csv /a-d /b>d:\Stata_Data\filelist.txtfile 
> open myfile using d:\Stata_Data\filelist.txt, read
> file read myfile line
> insheet using `line', comma
> gen hpl_id = substr("`line'",1,22)
> save `line'.dta, replace
> save master_data.dta, replace
> drop _all 
> file read myfile line
> while r(eof)==0{
> insheet using `line', comma
> gen hpl_id = substr("`line'",1,22)
> save `line'.dta, replace
> append using master_data.dta
> save master_data.dta, replace
> drop _all
> file read myfile line}
> 
> Thanks!
> 
> Andrew  		 	   		  
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index