The Stata listserver

Re: st: avoiding StatTransfer: huge / large / big dataset from SAS/ csv


From   jean ries <ries@ires.ucl.ac.be>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: avoiding StatTransfer: huge / large / big dataset from SAS/ csv
Date   Tue, 26 Oct 2004 18:48:08 +0200

Daniel Egan wrote:

Hello,
I am trying to get a ~3 GB .csv dataset into Stata. I don't think it
will be anywhere near 3 GB once in Stata, but there it is, on my
computer, taunting me. It is too big to open even using a text editor.
When I set my memory to 750M, I am able to read nearly 7 million
observations into Stata, and then it's full.

The original dataset is actually in EBCDIC. I used a very simple SAS
routine to read the zoned decimal data (that is key), and then
exported the dataset to a .csv file. I have pretty much _no_
experience with SAS whatsoever. The only reason I got involved with it
is because it can read zoned/packed decimal data.

I believe I have several options to get the data into Stata, all of which
are missing a vital step that I am not sure how to do, or do not have
available:

1) Export the data to csv files from SAS in segments, i.e. the first
1 million obs, the second million obs, etc. Then import each of these into
Stata and append them. I am not sure how to tell SAS to sort and then export
based on a criterion, however.

I think you can tackle the problem from within SAS. There might be better (and more elegant) ways, but this might work for you. It involves some copy-paste: I am not aware of a "foreach"- or "forval"-style loop that would handle this in SAS.

good luck!

jean

---------- BEGIN big2small.sas ----------
/* Replace "libname" and "big" with the names that apply in your case. */
/* You can sort the data in the following way, but I think you don't need to. */
proc sort data = libname.big;
by sortvar;
run;

/* create the small datasets one by one */
/* note: obs = is the number of the LAST observation read, not a count */
data small1;
set libname.big(firstobs = 1 obs = 1000000); /* observations 1 to 1000000 */
run;

data small2;
set libname.big(firstobs = 1000001 obs = 2000000); /* observations 1000001 to 2000000 */
run;

data small3;
set libname.big(firstobs = 2000001 obs = 3000000); /* observations 2000001 to 3000000 */
run;

/* and so on ... */

/* and now export the "small" datasets in .csv format */
proc export
data = small1
outfile = "c:\...\small1.csv"
dbms = csv
replace;
run;

/* and so on ... */

/* and now back to Stata ... :) */
---------- END big2small.sas ----------
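
The copy-paste in big2small.sas could probably be avoided with the SAS macro language, which has an iterative %do loop. What follows is only a rough sketch, not tested against the data in question. The macro name split_big and its parameters nchunks and chunksize are made up for illustration; "libname", "big" and the elided output path are placeholders exactly as in the script above.

```sas
/* Hypothetical macro: writes chunk i of libname.big to small<i>.csv.   */
/* nchunks and chunksize are illustrative parameter names.              */
%macro split_big(nchunks=, chunksize=);
  %local i first last;
  %do i = 1 %to &nchunks;
    /* first and last observation numbers of chunk i */
    %let first = %eval((&i - 1) * &chunksize + 1);
    %let last  = %eval(&i * &chunksize);

    /* obs = is the number of the last observation read, not a count */
    data small&i;
      set libname.big(firstobs = &first obs = &last);
    run;

    /* the double dot ends the macro variable name, then adds ".csv" */
    proc export
      data = small&i
      outfile = "c:\...\small&i..csv"
      dbms = csv
      replace;
    run;
  %end;
%mend split_big;

/* three chunks of one million observations each */
%split_big(nchunks = 3, chunksize = 1000000)
```

The number of chunks has to be chosen to match the size of the big dataset, so that the last chunk still starts inside it.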
