st: Support for Standard Scientific Formats; Approaches?


From   James Sams <sams.james@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: Support for Standard Scientific Formats; Approaches?
Date   Fri, 22 Jun 2012 11:31:06 -0500

I have a large dataset (~4 TiB, growing at ~600 GiB per year). I must use
this data with both Stata and other software, as Stata alone cannot handle
it. While using a database is not out of the question, there is a lot of
overhead involved, and the people in my group are not familiar with SQL
and are likely reluctant to learn it. Further, based on some preliminary
research, it seems that Stata's interaction with ODBC is not very fast
(rather expected) and has issues releasing memory (very problematic).
However, storing the data in Stata's own format is not a good option
either, due to minimal/poor support in most other environments. Of course,
storing a separate copy of the dataset for each of the environments I use
is not tenable due to its size.

Thus, I have begun looking for standard scientific formats that have a wide
range of support from various programming languages and statistical
packages. The formats that seem to have a relatively wide range of
support are HDF, NetCDF, and XDR. However, it seems that Stata
is an outlier in that it does not support any of these formats. If anyone has
done any work in this area, please point me to it. Otherwise, can someone
point me to the resources required to load external data into Stata? Ideally,
I could just write the parser (or use someone else's) to get the data into a
matrix or similar structure and then reuse the routines behind
use/merge/append/save. However, as far as I can tell, those are not well
documented externally. That said, I've not worked with Mata much, and I
suspect the answer lies there.
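
As a rough illustration of the "write the parser" route, here is a minimal
Python sketch, assuming a hypothetical XDR file of fixed-size records (one
4-byte integer followed by one 8-byte double, both big-endian as XDR
specifies). It converts the records to CSV in bounded chunks. The record
layout and the names xdr_records and xdr_to_csv are assumptions made for
illustration, not part of any existing tool.

```python
import csv
import io
import struct

# Hypothetical record layout: one XDR int (4 bytes) then one XDR double
# (8 bytes). XDR stores both big-endian, hence the ">" prefix (no padding).
RECORD = struct.Struct(">id")


def xdr_records(stream, chunk_records=65536):
    """Yield (id, value) tuples, reading a bounded chunk of records at a time.

    Assumes a regular file-like object whose read(n) returns fewer than n
    bytes only at end of file.
    """
    chunk_bytes = RECORD.size * chunk_records
    while True:
        buf = stream.read(chunk_bytes)
        if not buf:
            break
        if len(buf) % RECORD.size:
            raise ValueError("truncated record at end of stream")
        yield from RECORD.iter_unpack(buf)


def xdr_to_csv(src, dst, header=("id", "value")):
    """Write all records from binary stream src to text stream dst as CSV.

    Returns the number of data rows written.
    """
    writer = csv.writer(dst)
    writer.writerow(header)
    count = 0
    for row in xdr_records(src):
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    # Build two sample records in memory instead of reading a real file.
    sample = RECORD.pack(1, 1.5) + RECORD.pack(2, -0.25)
    out = io.StringIO()
    xdr_to_csv(io.BytesIO(sample), out)
    print(out.getvalue())
```

The resulting CSV could then be read in Stata with something like insheet.
This only sketches the decoding step; with a dataset of this size one would
presumably convert one subset at a time rather than the whole file.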

--
James Sams
773-315-0810
sams.james@gmail.com