Re: st: Data Management Tool
Nick Cox had some valuable responses to these questions.
I would add...
Stata is a RAM-only system; it does not pipeline large datasets. At
least, as far as I know, this is correct. Thus, you need large
amounts of memory to work with large datasets. If your datasets are
really huge (and if I'm correct that Stata can't pipeline), then this
may be a limitation. I believe that Stata's choice to use an
all-in-memory method is based on the idea that, while datasets can be
large, memory keeps getting cheaper and more plentiful.
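To judge whether a dataset will fit in memory, a back-of-the-envelope estimate is usually enough. The sketch below is in Python rather than Stata, and is only a rough guide: it multiplies the number of observations by the summed widths of Stata's numeric storage types (byte 1, int 2, long 4, float 4, double 8 bytes). The function name and the example variable mix are made up for illustration, and the estimate ignores string variables and any overhead Stata adds.

```python
# Bytes per value for Stata's numeric storage types.
STATA_BYTES = {"byte": 1, "int": 2, "long": 4, "float": 4, "double": 8}


def estimate_bytes(n_obs, var_types):
    """Rough in-memory size: observations times the summed variable widths.

    Hypothetical helper for illustration; it ignores string variables and
    any bookkeeping overhead.
    """
    width = sum(STATA_BYTES[t] for t in var_types)
    return n_obs * width


# Assumed example: 10 million observations, one long ID,
# 20 float measures and 5 double measures.
example_types = ["long"] + ["float"] * 20 + ["double"] * 5
size = estimate_bytes(10_000_000, example_types)
print(f"{size:,} bytes, about {size / 1024**3:.2f} GiB")
```

With that assumed mix each observation is 124 bytes wide, so the total comes to 1,240,000,000 bytes. Compare a figure like this with the RAM actually available on the machine before committing to an all-in-memory tool.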
I'm not sure about querying datasets in foreign formats (Oracle,
MySQL). You usually convert them to Stata before using them. Thus,
there is a static aspect to that operation. That is, you don't get
to query a live system -- one that is subject to changes by other
users while you are accessing it. You get to access a snapshot of a
system, as of the moment the data were extracted.
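The snapshot point can be shown with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for the live database (an assumption for illustration; the question was about Oracle and MySQL), and the table and column names are invented. The extract is copied once; a later change to the live table does not show up in it.

```python
import sqlite3

# In-memory SQLite database standing in for the "live" system.
# Table and column names are hypothetical.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE visits (id INTEGER PRIMARY KEY, cost REAL)")
live.executemany("INSERT INTO visits (cost) VALUES (?)", [(10.0,), (20.0,)])
live.commit()


def extract_rows(conn):
    """Copy the table's current rows into a plain Python list."""
    return conn.execute("SELECT id, cost FROM visits ORDER BY id").fetchall()


# Take the extract: this is the static copy an analysis would work on.
snapshot = extract_rows(live)

# Another user changes the live system after the extract was taken.
live.execute("INSERT INTO visits (cost) VALUES (?)", (30.0,))
live.commit()

# The snapshot still holds the old rows; only a fresh query sees the new one.
live_rows = extract_rows(live)
print(len(snapshot), len(live_rows))
```

Any analysis run on the extract reflects the moment of extraction, so results need to be dated accordingly or the extract refreshed.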
But then, there is the odbc set of facilities. I'm not familiar with
them, but they do allow access to some foreign-format data. (Whether
they allow access to live systems, I don't know.)
Finally, I do know of one large commercial product for data
processing & reporting that has been built using Stata as its
core. This is produced by Prof-soft Health. You may want to contact
the person who put this together; that is Ed Bassin,
I hope this helps. Good luck.
At 10:42 AM 2/6/2007, Marc wrote:
I'm looking into Stata MP as a data management system on Linux for 3-5 users.
The main tasks of the data management system would be to query
tables from Oracle and MySQL, create variables, run statistical
analysis/modeling and support fairly advanced reporting.
I've worked with SAS and SPSS; however, this time around I'd be the
one paying for the license, and am being conscientious about
shopping for the best solution. The corporate reflex of "just
writing a PO" for SAS or SPSS server is not the chosen approach here.
- How does Stata compare as a data management tool, especially as it
interfaces with databases?
- Is it a RAM-only system, or does it pipeline large datasets?
- What are its limitations as a data management tool?
- At what point would you say SAS/SPSS is hands down the best
solution (if at all)?
- Any opinion on its syntax/batch capabilities?
I know next to nothing about Stata's macro capabilities, and one
thing I like about SPSS is its new Python extensibility. Anything
comparable with Stata?
Many thanks for your time.