Statalist The Stata Listserver


Re: st: Data Management Tool

From   David Kantor
Subject   Re: st: Data Management Tool
Date   Tue, 06 Feb 2007 13:23:36 -0500

Nick Cox had some valuable responses to these questions.

I would add...
Stata is a RAM-only system; as far as I know, it does not pipeline large datasets. Thus, you need large amounts of memory to work with large datasets. If your datasets are really huge (and if I'm correct that Stata can't pipeline), then this may be a limitation. I believe that Stata's choice of an all-in-memory method is based on the idea that, while datasets can be large, memory keeps getting cheaper and more plentiful.
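To judge whether the all-in-memory approach is a limitation for a given dataset, a back-of-the-envelope estimate helps. The sketch below (in Python, only for illustration) multiplies the number of observations by the width of one observation, using the byte widths of Stata's numeric storage types. The dataset shape is a made-up example, and the figure ignores string variables and any per-dataset overhead, so treat it as a lower bound.

```python
# Rough estimate of the RAM needed to hold a dataset entirely in memory.
# The byte widths are those of Stata's numeric storage types; the example
# dataset shape below is hypothetical.

STORAGE_BYTES = {"byte": 1, "int": 2, "long": 4, "float": 4, "double": 8}

def estimate_bytes(n_obs, var_types):
    """Return the number of observations times the width of one observation."""
    row_width = sum(STORAGE_BYTES[t] for t in var_types)
    return n_obs * row_width

# Hypothetical dataset: 50 million observations, 20 doubles and 10 ints.
var_types = ["double"] * 20 + ["int"] * 10
total = estimate_bytes(50_000_000, var_types)
print(f"{total / 1024**3:.1f} GiB")
```

If the estimate approaches the physical memory of the machine, the dataset would need to be trimmed or split before loading.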

I'm not sure about querying datasets in foreign formats (Oracle, MySQL). You usually convert them to Stata format before using them. Thus, there is a static aspect to that operation. That is, you don't get to query a live system -- one that is subject to changes by other users while you are accessing it. You get to access a snapshot of a system, as of the moment the data were extracted.
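The snapshot point can be illustrated with a small sketch. This is Python with an in-memory SQLite database standing in for the live Oracle or MySQL system; the table and column names are invented for the example, and nothing here is Stata-specific.

```python
import sqlite3

# An in-memory SQLite database stands in for the live database system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (id INTEGER, cost REAL)")
conn.executemany("INSERT INTO visits VALUES (?, ?)", [(1, 10.0), (2, 25.5)])
conn.commit()

# Extraction step: copy the rows out, as a conversion to another format would.
snapshot = conn.execute("SELECT id, cost FROM visits ORDER BY id").fetchall()

# Another user changes the live table after the extract was taken.
conn.execute("INSERT INTO visits VALUES (3, 7.25)")
conn.commit()

# The extract still reflects the moment of extraction; the live table has moved on.
live_count = conn.execute("SELECT COUNT(*) FROM visits").fetchone()[0]
print(len(snapshot), live_count)
```

The extracted copy holds two rows while the live table holds three, which is the static aspect described above: the analysis sees the data as of the extract, not as of now.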

But then, there is the odbc set of facilities. I'm not familiar with them, but they do allow access to some foreign-format data. (Whether they allow access to live systems, I don't know.)

Finally, I do know of one large commercial product for data processing & reporting that has been built using Stata as its core. This is produced by Prof-soft Health. You may want to contact the person who put this together; that is Ed Bassin.

I hope this helps. Good luck.

At 10:42 AM 2/6/2007, Marc wrote:

Hi All,

I'm looking into Stata MP as a data management system on Linux for 3-5 users.

The main tasks of the data management system would be to query tables from Oracle and MySQL, create variables, run statistical analysis/modeling, and support fairly advanced reporting.

I've worked with SAS and SPSS. This time around, however, I'd be the one paying for the license, and I am being conscientious about shopping for the best solution. The corporate reflex of "just writing a PO" for SAS or SPSS server is not the chosen approach here.

- How does Stata compare as a data management tool, especially in how it interfaces with databases?

- Is it a RAM-only system, or does it pipeline large datasets?

- What are its limitations as a data management tool?

- At what point would you say SAS/SPSS is hands down the best solution (if at all)?

- Any opinion on its syntax/batch capabilities?

I know next to nothing about Stata's macro capabilities, and one thing I like about SPSS is its new Python extensibility. Is there anything comparable in Stata?

Many thanks for your time.
