
From: Nick Winter <nw53@cornell.edu>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: avoiding StatTransfer: huge / large / big dataset from SAS / csv
Date: Tue, 26 Oct 2004 12:23:41 -0400

One approach is to use Stata to split up the giant CSV file into chunks, using the -file- command. The program pasted below should do it:

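program splitmyfiles
    * Usage: splitmyfiles infilename outputstub chunk_size
    version 8.2
    args input outstub size
    tempname in out

    * Quoting the file names lets paths containing spaces work
    file open `in' using `"`input'"', read text
    local fnum 1
    * i counts the lines written to the current piece
    local i 0
    qui file open `out' using `"`outstub'_`fnum'.csv"', write text replace

    file read `in' line
    while !r(eof) {
        * Current piece is full: close it and start the next one.
        * Checking before each write means no empty trailing file is
        * created when the line count is an exact multiple of size.
        if `i' == `size' {
            file close `out'
            local ++fnum
            local i 0
            qui file open `out' using `"`outstub'_`fnum'.csv"', write text replace
            * Progress indicator: one dot per new piece
            di "." _c
        }
        file write `out' `"`line'"' _n
        local ++i
        file read `in' line
    }

    file close `in'
    file close `out'
end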

The syntax would be something like:

. splitmyfiles rawdata.csv piece 1000000

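This would take "rawdata.csv" and split it into piece_1.csv, piece_2.csv, etc., each with 1 million lines (the last piece gets whatever is left over). If the original file has a header row of variable names, only piece_1.csv will contain it.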
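The chunking logic itself is simple enough to prototype outside Stata. What follows is a rough Python analogue, offered as a sketch under stated assumptions rather than as part of the original reply: the function name `split_lines` is hypothetical, the input is assumed to be plain text in the platform's default encoding, and, like the Stata program, it splits purely on line boundaries without parsing CSV fields (so a quoted field containing an embedded newline would be cut apart).

```python
def split_lines(input_path, out_stub, size):
    """Hypothetical helper: copy input_path into out_stub_1.csv,
    out_stub_2.csv, ..., each holding at most `size` lines.

    Splits on raw text lines, not parsed CSV records.
    Returns the list of paths written, in order.
    """
    if size < 1:
        raise ValueError("size must be a positive integer")
    written = []
    out = None
    count = 0  # lines written to the current piece
    try:
        # newline="" keeps each line's original line ending intact
        with open(input_path, "r", newline="") as src:
            for line in src:
                # Start a new piece only when there is a line to write,
                # so no empty trailing file is ever created.
                if out is None or count == size:
                    if out is not None:
                        out.close()
                    path = f"{out_stub}_{len(written) + 1}.csv"
                    out = open(path, "w", newline="")
                    written.append(path)
                    count = 0
                out.write(line)
                count += 1
    finally:
        if out is not None:
            out.close()
    return written
```

For example, `split_lines("rawdata.csv", "piece", 1000000)` would write piece_1.csv, piece_2.csv, and so on, and return their names. Because each piece is opened lazily, an input whose line count is an exact multiple of `size` does not leave an empty file at the end.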

There may be better ways, of course.

--NW

At 11:23 AM 10/26/2004 -0400, you wrote:

--------------------------------------------------------

Hello,

I am trying to get a ~3 GB .csv dataset into Stata. I don't think it will be anywhere near 3 GB once in Stata, but there it is, on my computer, taunting me. It is too big to open even in a text editor. When I set my memory to 750M, I am able to read nearly 7 million observations into Stata, and then it's full.

The original dataset is actually in EBCDIC. I used a very simple SAS routine to read the zoned decimal data (that is key), and then exported the dataset to a .csv file. I have pretty much _no_ experience with SAS whatsoever. The only reason I got involved with it is that it can read zoned/packed decimal data.

I believe I have four options to get the data into Stata, all of which are missing a vital step that I am not sure how to do, or do not have available:

1) Export the data to csv files from SAS in segments, i.e. the first 1 million obs, the next 1 million obs, etc. Then import each of these into Stata and merge. However, I am not sure how to tell SAS to sort and then export based on a criterion.
2) Do the analogous thing in Stata, but using -infile-. The problem is that -infile- with [in] requires the data to be in a fixed format. As far as I know, SAS can only export delimited files. If I could export the data from SAS in a fixed format, that would work.
3) I have seen various workarounds in Statalist/FAQs for large datasets using ODBC. I do not know anything about ODBC, but if it's the only way to go, I will learn.
4) I know about StatTransfer, but I am not the one making decisions about buying new software/licenses, and don't particularly want to go through that if I don't have to.

Any guidance, suggestions, or clever responses are very much appreciated.

Regards,
Dan

*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/

Nicholas Winter
Assistant Professor
Department of Government
Cornell University
308 White Hall
Ithaca, NY 14853-4601
tel. 607.255.8819 / fax 607.255.4530
nw53@cornell.edu
falcon.arts.cornell.edu/nw53

**References**: **st: avoiding StatTransfer: huge / large / big dataset from SAS / csv** (From: Daniel Egan <dp.egan@gmail.com>)


© Copyright 1996–2014 StataCorp LP