Statalist



Re: st: Problems in load large data or read several fields from CSV data


From   "David Elliott" <[email protected]>
To   [email protected]
Subject   Re: st: Problems in load large data or read several fields from CSV data
Date   Tue, 20 Jan 2009 14:44:11 -0400

You may wish to use a CSV file editor.  I have used the following:

http://csved.sjfrancke.nl/index.html  (slow site, be patient)

It allows specific columns or record ranges to be saved to a new file.

Alternatively, I have an ado program that will break a file into
chunks by rows with a syntax like:

-chunky using filename1, index(1) chunk(1) saving(varnames.txt) list-
would show the list of variable names if they are stored in the first row of the file.

-chunky using filename1, index(1) chunk(100) saving(filename2)-
would save the first 100 rows of a file.
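
The do-file below also relies on what -chunky- leaves behind in r() (assuming, as the do-file does, that r(index) is the next row to read and r(eof) flags the end of the file). A minimal sketch of checking those results after a call, using the placeholder names from the examples above:

*------------begin example-----------*
// run the header call, then inspect the returned results
chunky using filename1, index(1) chunk(1) saving(varnames.txt) list
return list                              // should show r(index) and r(eof)
display "next row: " r(index) "   eof: " r(eof)
*------------end example-----------*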

(An early version can be picked up from
http://www.stata.com/statalist/archive/2008-10/msg00048.html
but I have made revisions, and the do-file below relies on the modified version.
Contact me offlist if you want the most recent version.)

One can then write a do-file that loops through the file, creating the
pieces and then appending them together:

*------------begin dofile-----------*
// Do file using chunky.ado to piece together parts of a very large file
// Pay particular attention to the edit points for infile, chunksize, and keep

// edit VeryLargeFile.csv on the following line to point to your file
local infile VeryLargeFile.csv

// edit to size of chunk you want
local chunksize 10000

// Get just the first line if it has variable names
chunky using `"`infile'"', index(1) chunk(1) saving("varnames.csv", replace) list

// store chunky's returned results in locals right away so that later
// commands in the loop cannot overwrite r()
local chunk 1
local nextrow = r(index)             // next row to read, as returned by -chunky-
local eof     = r(eof)
while !`eof' {  // stop the loop once the end of the file is reached
	tempfile chunkfile chunkappend
	chunky using `"`infile'"', ///
	  index(`nextrow') chunk(`chunksize') saving("`chunkfile'")
	local eof     = r(eof)
	local nextrow = r(index)
	if `eof' {
		continue, break
	}
	// use a shell command (Windows -copy-) to prepend the varnames row to the chunk;
	// quote the tempfile paths in case they contain spaces
	!copy varnames.csv+"`chunkfile'" "`chunkappend'"

	// edit the delimiter option (comma or tab) to match your file
	insheet using "`chunkappend'", clear comma names

	// edit the following to keep specific variables
	keep *

	// save part of file and increment chunk count
	save part`chunk++', replace
	}
// Append parts together
local nparts `=`chunk'-1'
use part1, clear
forvalues i=2/`nparts' {
	append using part`i'
	// uncomment the following line to erase each part after appending it
	// erase part`i'.dta
	}
describe
*------------end dofile-----------*

If you are satisfied with the import, save the combined dataset under a
different name and then delete the part##.dta files that are left behind
(or uncomment the erase part`i'.dta line in the loop above).
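
For example, the final save-and-cleanup step might look like this if added
at the end of the do-file, where `nparts' is still defined (bigfile.dta is
a placeholder name):

*------------begin example-----------*
// save the assembled dataset under its own name ...
save bigfile, replace
// ... then remove the intermediate parts (-erase- needs the .dta extension)
forvalues i=1/`nparts' {
	capture erase part`i'.dta
	}
*------------end example-----------*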

I've used this program to good effect when importing very large CSV data dumps.

DC Elliot
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


