
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: large data sets (was st: A faster way to gsort)

From	Jeph Herrin <[email protected]>
To	[email protected]
Subject	Re: large data sets (was st: A faster way to gsort)
Date	Thu, 13 Mar 2014 09:29:09 -0400


On 3/12/2014 11:54 PM, Joseph Coveney wrote:



As for #1, wouldn't additional RAM be cheaper than a SAS license?  And if
you're maxed-out on memory slots, wouldn't even a more powerful workstation be
cheaper than a SAS license?


My institutional SAS 9.4 license runs me $49, so no.

More pointedly, in this situation, I must work remotely (because the database is on the order of several TB, and for data security reasons), so I don't have a lot of control over the environment.

I don't quite follow #3.  Aren't Stata's data management operations
incremental?  I find a series of Stata's data management commands much easier to
walk through than a single SQL statement stretching for pages.

I wasn't very clear here. But when working with a >1TB database, it's not practical to do everything in either SAS or Stata. But to *avoid* writing pages of SQL one wants to submit a query that (say) pulls down a list of identifiers, then submit a second query that uses that list of identifiers to pull down related records. To do this second step in Stata, one would need to be able to write SQL that referenced a Stata file. The alternative to this incremental approach would be to write unreadable SQL queries.
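
For illustration only, the two-step pattern might look roughly like the sketch below. It is written in Python against an in-memory SQLite database standing in for the remote server; the tables (patients, claims), their columns, the age filter, and the temporary table id_list are all hypothetical and are not taken from the actual project. Staging the identifier list in a temporary table is just one possible way to let the second query reference it, assuming the server permits temporary tables.

```python
import sqlite3


def two_step_pull(conn):
    """Pull a list of identifiers, then pull related records using that list."""
    cur = conn.cursor()

    # Step 1: a small query that returns only the identifiers of interest.
    # (Table, column, and filter are hypothetical.)
    ids = [row[0] for row in cur.execute(
        "SELECT patient_id FROM patients WHERE age >= 65 ORDER BY patient_id"
    )]

    # Stage the identifier list where the second query can see it.
    # Assumption: temporary tables are allowed on the server.
    cur.execute("CREATE TEMP TABLE id_list (patient_id INTEGER PRIMARY KEY)")
    cur.executemany("INSERT INTO id_list VALUES (?)", [(i,) for i in ids])

    # Step 2: a second short query that joins against the staged list.
    records = cur.execute(
        "SELECT c.patient_id, c.amount "
        "FROM claims AS c "
        "JOIN id_list AS k ON k.patient_id = c.patient_id "
        "ORDER BY c.patient_id, c.amount"
    ).fetchall()

    cur.execute("DROP TABLE id_list")
    return ids, records


def demo():
    """Build a tiny stand-in database and run the two-step pull on it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, age INTEGER);
        CREATE TABLE claims (patient_id INTEGER, amount REAL);
        INSERT INTO patients VALUES (1, 70), (2, 40), (3, 82);
        INSERT INTO claims VALUES (1, 10.0), (2, 20.0), (3, 30.0), (3, 35.0);
    """)
    try:
        return two_step_pull(conn)
    finally:
        conn.close()
```

Each query stays short and readable on its own, which is the point of the incremental approach; the cost is the extra round trip to stage the list.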

Obviously, we all have different wants and expectations from Stata. For me, this is the first 'big data' application I've had for Stata, and it hasn't done well; I have other 'big data' proposals coming up, and unfortunately I'm going to have to hedge my endorsement of Stata for this kind of work.


cheers,
J

As for Stata's doing SQL natively, there is a comment to a post on the Stata
Blog similarly calling for Stata to adopt SQL standard syntax.  I know that
Jeph's comment goes beyond that, almost as if to have an ODBC driver or
OLE DB provider for Stata dataset files.

I like SQL and use it daily, but I wouldn't want StataCorp to expend its finite
development resources in that direction.  I say this for a number of reasons
(for a couple of examples:  the three-valued logic of NULLs and other
peculiarities of SQL; considerations of when ad hoc SQL queries should be
permitted and where upstream data management operations should be manifest for
reasons of efficiency, security and regulatory compliance).

So, if there's a wish-list poll somewhere for Stata 14, put me down as against
SQL in favor of, say, -strunicode-, -menl-, -mcmc- or something along those
lines.

Joseph Coveney

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


