Re: st: Data Management Tool
Stata can query "live" transactional systems using ODBC, but the
advisability of that is doubtful since most large databases are
heavily normalized and the SQL and processing required to return the
required flatfile can be extensive. In many shops periodic snapshots
are generated for a data warehouse during periods of decreased server
load and the ODBC SQL is run against the snapshot. The snapshot can
be fairly concise if there are also periodic imports of value labels
for the unique keys that are linked to the main transaction table.
This approach provides a much smaller memory footprint than importing
strings in the flatfile. Stata can also load portions of a dataset,
subsetting either observations or variables. Because Stata holds the
working dataset in memory, the tradeoff for that RAM requirement is ease
and speed when dealing with anything less than huge datasets. I routinely
work on 1,000,000+ row x 30 col datasets on a machine with 1GB RAM.
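
A minimal sketch of the two ideas above: pulling a snapshot over ODBC
with labeled numeric keys, then loading only part of a saved dataset.
The DSN "warehouse", the table visits_snapshot, its columns, and the
label text are all hypothetical. The second half uses the auto dataset
shipped with Stata so that part can be run as is from a do-file.

```stata
* --- ODBC against a snapshot (hypothetical DSN, table, and columns) ---
odbc load, exec("SELECT id, visit_date, dept_code FROM visits_snapshot") ///
    dsn("warehouse") clear

* Keep dept_code numeric and attach labels instead of importing strings
label define dept 1 "Cardiology" 2 "Oncology"    // hypothetical codes
label values dept_code dept

* --- Loading part of a dataset (runs as is) ---
sysuse auto, clear
tempfile cars
save `cars'

* Only three variables, only the first 20 observations
use make price mpg in 1/20 using `cars', clear

* All variables, but only observations meeting a condition
use if foreign == 1 using `cars', clear
```

The first block will only run once the DSN and table names are replaced
with real ones; the subsetting commands do not depend on it.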
As Nick has pointed out, Mata is the future of Stata as it evolves
toward running compiled object-oriented code.
I find Stata's dialogs difficult to program (but I have been spoiled
by visual builders in other products) and the process of updating
objects based on user choices to be less than straightforward. If you
are planning on building custom interfaces, be prepared for a steep
learning curve.
As for syntax, I recently rewrote someone else's routine from SPSS to
Stata and it took about 80% less code (and worked, to boot).
As for batch processing, that is handled by Stata do-files, which are
command scripts that allow looping but offer very limited syntax handling.
Do-files can be run silently or with echoed commands and console
output.
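
As a small illustration of a do-file with a loop, here is a sketch that
uses the auto dataset shipped with Stata. The file name example.do is
made up for the example.

```stata
* example.do (hypothetical file name)
sysuse auto, clear

* Loop over a list of variables and report each mean
foreach v of varlist price mpg weight {
    quietly summarize `v'
    display "`v': mean = " %9.2f r(mean)
}
```

From within Stata, "do example.do" echoes each command along with its
output, while "run example.do" executes the same file silently.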
Automatic do-files, or ado-files in Stata parlance, are roughly
equivalent to SAS Macros and are used to implement the majority of
Stata commands. They are similar to do-files, but have robust syntax
checking, subroutines, and can implement Stata's new objects and
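
To show what the syntax checking looks like in practice, here is a
sketch of a tiny ado-file. The command name mymean is hypothetical and
exists only for illustration; it would be saved as mymean.ado somewhere
on the ado-path.

```stata
*! mymean.ado (hypothetical command, for illustration only)
program define mymean, rclass
    version 9
    * syntax parses and checks the command line: exactly one numeric
    * variable, plus optional if/in qualifiers
    syntax varname(numeric) [if] [in]
    marksample touse

    quietly summarize `varlist' if `touse'
    display as text "Mean of `varlist': " as result %9.2f r(mean)

    * Hand results back to the caller in r()
    return scalar mean = r(mean)
    return scalar N    = r(N)
end
```

With the auto dataset loaded, "mymean price if foreign == 1" displays
the mean and leaves r(mean) and r(N) behind, whereas "mymean make" is
rejected by syntax with an error because make is a string variable.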
I think, license-wise, the choice between Stata/SE and Stata/MP should
depend on both the size of dataset and type of analysis you would be doing.
Some operations are essentially serial and parallelism won't help
them. So, while some routines show significant performance gains with
MP, your mileage may vary, and you have to question whether the
incremental benefit per CPU is worth it. For more on MP, see:
David Elliott MD, MSc