st: avoiding StatTransfer: huge / large / big dataset from SAS / csv
I am trying to get a ~3 GB .csv dataset into Stata. I don't think it
will be anywhere near 3 GB once in Stata, but there it is, on my
computer, taunting me. It is too big to open even in a text editor.
When I set my memory to 750M, I can read nearly 7 million
observations into Stata, and then it's full.
The original dataset is actually in EBCDIC. I used a very simple SAS
routine to read the zoned decimal data (that is key), and then
exported the dataset to a .csv file. I have pretty much _no_
experience with SAS whatsoever. The only reason I got involved with it
is because it can read zoned/packed decimal data.
I believe I have four options to get the data into Stata, each of
which is missing a vital step that I am not sure how to do:
1) Export the data to .csv files from SAS in segments, i.e. the 1st
million obs, the 2nd million obs, etc. Then import each of these into
Stata and combine them. I am not sure how to tell SAS to sort and then
export based on a criterion, however.
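(On the SAS side, I gather the firstobs= and obs= data set options can select an observation range without any sorting. On the Stata side, once the segment files exist, combining them is a loop over -insheet- and -append-. A minimal sketch, where the segment file names seg1.csv through seg4.csv are assumptions:)

```stata
* Sketch: combine segment .csv files exported from SAS
* (file names seg1.csv ... seg4.csv are hypothetical)
tempfile building
insheet using seg1.csv, comma clear
compress                       // shrink storage types before saving
save `building'
forvalues i = 2/4 {
    insheet using seg`i'.csv, comma clear
    compress
    append using `building'
    save `building', replace
}
```

Running -compress- on each segment before appending keeps the memory footprint down.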
2) Use the analogous method in Stata, but with -infile-. The problem
is that -infile- with [in] requires the data to be in a fixed format.
As far as I know, SAS can only export delimited data. If I could export
the data from SAS in a fixed format, that would work.
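(For what it's worth, -infile- in free format will also accept comma-separated values, so fixed format may not be strictly required. A chunked read might look like the sketch below, where the variable names and chunk bounds are assumptions; a header row and strings with embedded commas would need extra care:)

```stata
* Sketch: read a large .csv in chunks with -infile- and append
* (variable names v1-v3 and the chunk bounds are hypothetical)
tempfile building
infile v1 v2 str20 v3 using bigfile.csv in 1/1000000, clear
compress
save `building'
infile v1 v2 str20 v3 using bigfile.csv in 1000001/2000000, clear
compress
append using `building'
save `building', replace
```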
3) I have seen various work-arounds on Statalist and in the FAQs for
large datasets using ODBC. I do not know anything about ODBC, but if
it's the only way to go, I will learn.
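(From what I can tell, the ODBC route runs through Stata's -odbc- command: with a text-file ODBC driver pointed at the folder holding the .csv, loading is roughly the sketch below. The DSN name "bigdata" is an assumption:)

```stata
* Sketch: pull the .csv through ODBC (DSN "bigdata" is hypothetical)
odbc query "bigdata"                              // list tables the DSN exposes
odbc load, table("bigfile.csv") dsn("bigdata") clear
```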
4) I know about StatTransfer, but I am not the one making decisions
about buying new software/licenses, and don't particularly want to go
through that if I don't have to.
Any guidance, suggestions, or clever responses are very much appreciated.