
Re: st: avoiding StatTransfer: huge / large / big dataset from SAS / csv


From   Stas Kolenikov <skolenik@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: avoiding StatTransfer: huge / large / big dataset from SAS / csv
Date   Tue, 26 Oct 2004 12:19:24 -0400

1. Have you tried

infile ... in 1000000/1999999

or something of that kind? You can keep a single .csv file, read it
in pieces, and then -append- (rather than -merge-) the pieces
together. You can even automate this with something like

clear
infile ... in 1/999999
save piece0, replace
forvalues k=1/134 {
   clear
   infile ... in `k'000000/`k'999999
   save piece`k', replace
}
clear
use piece0
forvalues k=1/134 {
   append using piece`k'
}

and then just let it run in your available memory for a couple of hours.

2. Check -usesas- by Dan Blanchette. It may (or may not) work, and if
it does not, you can ask him for your money back... I mean, for an
update, explaining what worked and what did not. He has experience in
both SAS and Stata data handling, so he should be able to solve it.

3. ODBC can do it, and so can StatTransfer or DBMS/Copy (which,
unfortunately, is also a part of SAS these days).
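
For anyone who wants to try the piecewise read from point 1 on a toy
file first, here is a self-contained sketch. Everything named in it is
made up for illustration: the 25-row test file, the variables id and
value, and the chunk size of 10. It assumes the file has no header row,
that the number of rows is known in advance, and that free-format
-infile- is splitting the fields on the commas. The ranges are computed
explicitly instead of pasting zeros onto `k', so the chunk size can be
changed in one place.

```stata
* Sketch only: file contents, variable names, and chunk size are hypothetical.
clear
tempfile csv
tempname fh

* Build a small comma-separated file (25 rows, no header row)
* to stand in for the real, much larger one.
file open `fh' using "`csv'", write text replace
forvalues i = 1/25 {
    local v = 2*`i'
    file write `fh' "`i',`v'" _n
}
file close `fh'

* Observations per piece, total rows (assumed known), number of pieces.
local chunk = 10
local total = 25
local pieces = ceil(`total'/`chunk')

* Read each piece with free-format -infile- and an -in- range,
* then save it to its own temporary dataset.
forvalues k = 1/`pieces' {
    local first = (`k' - 1)*`chunk' + 1
    local last = min(`k'*`chunk', `total')
    clear
    infile id value using "`csv'" in `first'/`last'
    tempfile piece`k'
    save "`piece`k''", replace
}

* Stack the pieces back together in order.
use "`piece1'", clear
forvalues k = 2/`pieces' {
    append using "`piece`k''"
}
```

On the real file you would replace the toy file with your .csv, list
your own variables after -infile-, and save the pieces under permanent
names so that an interrupted run can be resumed.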

Stas

On Tue, 26 Oct 2004 11:23:09 -0400, Daniel Egan <dp.egan@gmail.com> wrote:
> Hello,
> 
> I am trying to get a ~3 GB .csv dataset into Stata. I don't think it
> will be anywhere near 3 GB once in Stata, but there it is, on my
> computer, taunting me. It is too big to open even using a text editor.
> When I set my memory to 750M, I am able to read in nearly 7 million
> observations into Stata, and then it's full.
> 
> The original dataset is actually in EBCDIC. I used a very simple SAS
> routine to read the zoned decimal data (that is key), and then
> exported the dataset to a .csv file. I have pretty much _no_
> experience with SAS whatsoever. The only reason I got involved with it
> is because it can read zoned/packed decimal data.
> 
> I believe I have X options to get the data into Stata, all of which
> are missing a vital step that I am not sure how to do, or have
> available:
> 
> 1) Export the data to csv files from SAS in segments, i.e. 1st
> million obs, 2nd million obs, etc. Then import each of these into
> Stata and merge. I am not sure how to tell SAS to sort and then export
> based on a criterion, however.
> 
> 2) Do the analogous method in Stata, but using -infile-. The problem
> is that -infile- with [in] requires the data to be in a fixed format.
> As far as I know, SAS can only export delimited. If I could export the
> data from SAS in a fixed format, that would work.
> 
> 3) I have seen various work-arounds in Statalist/FAQs with large
> datasets using ODBC. I do not know anything about ODBC, but if it's the
> only way to go, I will learn.
> 
> 4) I know about StatTransfer, but I am not the one making decisions
> about buying new software/licenses, and don't particularly want to go
> through that if I don't have to.
> 
> Any guidance, suggestions, or clever responses are very much appreciated.
> 
> Regards,
> Dan
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 


-- 
Stas Kolenikov
http://stas.kolenikov.name


