The Stata listserver

Re: st: Reading very complex raw data files


From   austin nichols <[email protected]>
To   [email protected]
Subject   Re: st: Reading very complex raw data files
Date   Mon, 5 Dec 2005 17:06:37 -0500

Michael Mitchell's assertion is demonstrably untrue: any SAS input
statement can be rewritten in Stata syntax (possibly using -file-),
and there are some data management tasks that are much harder in SAS
(e.g. calculating the income of the person with line number defined by
a variable S_LINENO for each person).  What he is no doubt alluding to
(though I have not read the surrounding text) is a hierarchical file
format such as that used by the CPS. SAS and Stata code for reading
these files is available at http://www.nber.org/data/cps_progs.html
(and associated pages).  You can see for yourself which is
conceptually easier--I myself find the Stata code more intuitive.
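To make "hierarchical file" concrete, here is a minimal, language-neutral sketch in Python. It is not the NBER code and not the real CPS layout: the column positions, the sample records, and the helper names read_hierarchical and add_pointed_income are all invented for illustration. It assumes each household record ("H") is followed by its person records ("P"), and that S_LINENO holds the line number of another person in the same household (0 meaning none).

```python
import io

# Hypothetical fixed-width layout (an assumption, NOT the real CPS layout):
#   col 1       record type: "H" = household, "P" = person
#   H records:  cols 2-5 household id, cols 6-7 state code
#   P records:  cols 2-3 line number, cols 4-5 S_LINENO (00 = none),
#               cols 6-11 income
RAW = (
    "H000136\n"
    "P0102040000\n"
    "P0201025000\n"
    "H000206\n"
    "P0100012000\n"
)


def read_hierarchical(stream):
    """Flatten household/person records into one dict per person."""
    people = []
    household = None
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue
        if line[0] == "H":
            # Remember the household fields; they apply to the persons below.
            household = {"hhid": line[1:5], "state": line[5:7]}
        elif line[0] == "P":
            if household is None:
                raise ValueError("person record before any household record")
            people.append({
                **household,
                "lineno": int(line[1:3]),
                "s_lineno": int(line[3:5]),
                "income": int(line[5:11]),
            })
    return people


def add_pointed_income(people):
    """Attach the income of the person whose line number equals S_LINENO."""
    income_by_key = {(p["hhid"], p["lineno"]): p["income"] for p in people}
    for p in people:
        # No match (e.g. S_LINENO of 0) leaves the value as None.
        p["s_income"] = income_by_key.get((p["hhid"], p["s_lineno"]))
    return people


people = add_pointed_income(read_hierarchical(io.StringIO(RAW)))
for p in people:
    print(p["hhid"], p["lineno"], p["income"], p["s_income"])
```

The second function is the S_LINENO task mentioned above: a lookup keyed on household and line number, which needs every person in the household to be available at once.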

The main advantage, and disadvantage, of SAS is that it reads data
serially instead of keeping it all in memory.  Thus, reading an entire
SIPP panel (see
http://www.nber.org/data/survey-of-income-and-program-participation-sipp-data.html
for details) into one big file (about 8GB of data) might well be
impossible in Stata due to memory constraints.  That limit is a matter
of file size, though, not of file complexity.
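The serial versus in-memory distinction can be sketched as follows. This is only an illustration in Python of the two access patterns, not a description of how SAS or Stata are implemented. The record layout and the function names total_income_serial and total_income_in_memory are hypothetical.

```python
import io

# Hypothetical layout (an assumption): person records start with "P"
# and carry income in cols 6-11; other record types are skipped.
RAW = (
    "H000136\n"
    "P0102040000\n"
    "P0201025000\n"
    "H000206\n"
    "P0100012000\n"
)


def total_income_serial(stream):
    """Read one record at a time; only running totals stay in memory."""
    total = 0
    count = 0
    for line in stream:
        if line.startswith("P"):
            total += int(line[5:11])
            count += 1
    return total, count


def total_income_in_memory(stream):
    """Load every record first; memory use grows with the file size."""
    rows = stream.readlines()
    incomes = [int(r[5:11]) for r in rows if r.startswith("P")]
    return sum(incomes), len(incomes)


serial = total_income_serial(io.StringIO(RAW))
in_memory = total_income_in_memory(io.StringIO(RAW))
print(serial, in_memory)
```

Both give the same answer on a small file. The serial version is the one that still works when the file is larger than available memory, at the cost of seeing only one record at a time.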

On 12/2/05, Joseph Coveney <[email protected]> wrote:
> In Michael Mitchell's "Strategically using General Purpose Statistics
> Packages: A Look at Stata, SAS and SPSS," he alludes to the superior ability
> of SAS to read complex data files.  Excerpting from Page 20 of the technical
> report,
>
> "Complex raw data files
> Some raw data files are stored in a very complex format, perhaps having
> varying numbers of variables. Without a doubt, SAS is the most powerful tool
> for reading these kinds of complex data files and is the very best tool for
> reading very complex raw data files.
> Hierarchical data files
> . . . It is harder to read in such files in SAS, however you have additional
> power while reading the files in SAS. . Stata is the weakest program in this
> respect, being hard to use (probably equivalent to SAS in difficulty) but
> not offering the kind of additional power that you get in SAS."
>
> Does anyone on the List know of a publicly accessible (ideally, uncontrived)
> example of a complex data file that illustrates the advantages of SAS over
> SPSS, Stata, and other packages for reading these?  If so, could you please
> post the URL?  (Or the URL of a description of what such a data file would
> look like--perhaps something like an anecdote or case study illustrating
> the power of the DATA step with a particularly nasty example that some SAS
> user encountered.)
>
> I couldn't locate anything pertinent via the customary search engines.  I'm
> not referring to EBCDIC, XML (or even SAS 6.04 dataset files, apparently,
> for that matter), but rather a file with a data organizational complexity
> that illustrates what Michael is talking about.  Thank you.
>
> Joseph Coveney
>
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
