Stata The Stata listserver

Re: st: avoiding StatTransfer: huge / large / big dataset from SAS / csv

From   Stas Kolenikov <[email protected]>
To   [email protected]
Subject   Re: st: avoiding StatTransfer: huge / large / big dataset from SAS / csv
Date   Tue, 26 Oct 2004 12:19:24 -0400

1. Have you tried

infile ... in 1000000/1999999

or something of that kind? You can keep a single .csv file,
read it in pieces, and then -append- (rather than -merge-) the pieces
together. You can even automate this with something like

infile ... in 1/999999
save piece0, replace
forvalues k=1/134 {
   infile ... in `k'000000/`k'999999, clear
   save piece`k', replace
}
use piece0, clear
forvalues k=1/134 {
   append using piece`k'
}

and then just let it run in your available memory for a couple of hours.
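
To make the loop concrete, here is a minimal sketch of the same read-in-pieces-then-append idea with the elided parts filled in by placeholders. The file name bigfile.csv, the variables id, x and y, and the count of 135 pieces are all hypothetical, and the sketch assumes the .csv has no header row and only numeric columns, so that free-format -infile- can read it directly.

```stata
* Sketch only: bigfile.csv, id, x, y and npieces are placeholders.
local chunk   1000000      // observations per piece
local npieces 135          // assumed number of pieces; adjust to your file

forvalues k = 1/`npieces' {
    local first = (`k' - 1) * `chunk' + 1
    local last  = `k' * `chunk'
    * free-format infile treats commas as separators
    infile id x y using bigfile.csv in `first'/`last', clear
    compress                         // shrink storage types before saving
    save piece`k', replace
}

* Stack the pieces back together
use piece1, clear
forvalues k = 2/`npieces' {
    append using piece`k'
}
save combined, replace
```

The -compress- before each -save- is an addition of mine, not part of the loop above; it may help the appended result fit in memory, but whether it is enough depends on your data.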

2. Check -usesas- by Dan Blanchette. It may (or may not) work, and if
it does not, you can ask him for your money back... I mean, for an update,
explaining what worked and what did not. He has experience in both SAS
and Stata data handling, so he should be able to solve it.

3. ODBC can do it; StatTransfer or DBMS/Copy (which, unfortunately, is
also a part of SAS these days) can do it too.
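
For the ODBC route, a rough sketch of what pulling one slice might look like is below. It assumes the data have already been exposed through an ODBC data source; the DSN name "bigdata", the table "claims", and the numeric key "rowid" are hypothetical placeholders, not anything from the original setup.

```stata
* Sketch only: "bigdata", "claims" and "rowid" are made-up names.
odbc list                  // show the data sources Stata can see
odbc query "bigdata"       // list the tables in that data source

* Load one slice at a time by filtering on the key in SQL
odbc load, exec("SELECT * FROM claims WHERE rowid <= 1000000") dsn("bigdata") clear
save odbc_piece1, replace
```

Further slices would follow the same pattern with a different WHERE clause, and the saved pieces can then be combined with -append- as in point 1.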


On Tue, 26 Oct 2004 11:23:09 -0400, Daniel Egan <[email protected]> wrote:
> Hello,
> I am trying to get a ~3 GB .csv dataset into Stata. I don't think it
> will be anywhere near 3 GB once in Stata, but there it is, on my
> computer, taunting me. It is too big to open even using a text editor.
> When I set my memory to 750M, I am able to read nearly 7 million
> observations into Stata, and then it's full.
> The original dataset is actually in EBCDIC. I used a very simple SAS
> routine to read the zoned decimal data (that is key), and then
> exported the dataset to a .csv file. I have pretty much _no_
> experience with SAS whatsoever. The only reason I got involved with it
> is because it can read zoned/packed decimal data.
> I believe I have four options to get the data into Stata, all of which
> are missing a vital step that I am not sure how to do, or have
> available:
> 1) Export the data to csv files from SAS in segments, i.e. 1st
> million obs, 2nd million obs, etc. Then import each of these into
> Stata and merge. I am not sure how to tell SAS to sort and then export
> based on a criterion, however.
> 2) Do the analogous method in Stata, but using -infile-. The problem
> is that -infile- with [in] requires the data to be in a fixed format.
> As far as I know, SAS can only export delimited. If I could export the
> data from SAS in a fixed format, that would work.
> 3) I have seen various work-arounds in Statalist/FAQs with large
> datasets using ODBC. I do not know anything about ODBC, but if it's the
> only way to go, I will learn.
> 4) I know about StatTransfer, but I am not the one making decisions
> about buying new software/licenses, and don't particularly want to go
> through that if I don't have to.
> Any guidance, suggestions, or clever responses are very much appreciated.
> Regards,
> Dan

Stas Kolenikov