re: RE: st: A bug in egen and gen?

From	Christopher Baum <[email protected]>
To	<[email protected]>
Subject	re: RE: st: A bug in egen and gen?
Date	Fri, 18 Feb 2011 09:12:23 -0500

I would just add one thing to this discussion. I somehow doubt that the original poster is working with very large (million-obs+) datasets with hundreds or thousands of variables. There are those of us who encounter and struggle with such datasets. He is correct in noting that advances in readily available technology remove a lot of constraints: no reason nowadays to work with a 32-bit operating system (especially since those that are are really lame), no reason to have less than 4 Gb of RAM or a 0.5-terabyte hard disk, etc. on even a relatively inexpensive new machine.

BUT 4 Gb of RAM is not enough to analyze a number of commonly-used social science and finance data sets, even in their most parsimonious form, on any operating system supported by Stata. And disk space, while plentiful, should not be consumed without concern for the fact that reading and writing a .dta file that is possibly twice as large is quite a bit slower. Computers' speed improvements have not so readily extended to input/output, and until solid-state hard disks are ubiquitous and cheap, that's not going to happen without paying quite a bit more for a machine. His suggestion to automatically -compress- every time you use -save- would make that operation very tedious in a context where thousands of variables have to be evaluated. So there are good reasons for having a program that allows you to read and save floating-point numbers in single precision, especially when the innate precision of any number in, e.g., the national income accounts can be readily represented by a single-precision ("float") value. StataCorp's choice for Mata was to represent all numeric variables as doubles, but then I do not usually move my whole data set into Mata matrices.
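
To put rough numbers on the storage argument, here is a minimal sketch in Python (not Stata code, and not anything from the original thread). It assumes a hypothetical dataset of one million observations by one thousand variables and a hypothetical value of 14526.3, and it uses the standard IEEE 754 sizes of 4 bytes for single precision and 8 bytes for double precision.

```python
import struct


def roundtrip_float32(x: float) -> float:
    """Store x in 4 bytes (IEEE 754 single precision) and read it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def storage_bytes(n_obs: int, n_vars: int, bytes_per_value: int) -> int:
    """Raw size of an all-numeric table, ignoring any file or memory overhead."""
    return n_obs * n_vars * bytes_per_value


# Hypothetical dimensions, chosen only for illustration.
n_obs, n_vars = 1_000_000, 1_000
as_float = storage_bytes(n_obs, n_vars, 4)   # single precision
as_double = storage_bytes(n_obs, n_vars, 8)  # double precision
print(f"float: {as_float / 1e9:.1f} GB, double: {as_double / 1e9:.1f} GB")

# A hypothetical figure with six significant digits: the stored single-precision
# value is not bit-identical to the double, but it rounds back to the same number.
x = 14526.3
stored = roundtrip_float32(x)
print(stored == x)            # False
print(round(stored, 1) == x)  # True

# Where single precision does lose information: integers above 2**24.
big = 16_777_217
print(roundtrip_float32(float(big)))  # 16777216.0
```

The doubling of raw storage is what makes the choice of default matter for very large datasets, while the last lines show the kind of value (long integer identifiers, for instance) for which single precision is not safe.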

Whether float or double should be the default data type is a matter of preference, and you are free to exercise that preference. If you work with relatively small data sets, you might well want to set precision to double as the default. For many of us who work with very large data sets, it would be a disastrous choice. What works very well for some users will not work well for others, and many Stata users face resource constraints: they cannot readily get a machine with 8 Gb or more of RAM, or a larger hard disk---or even an upgrade to Stata 11! Keep in mind that not all users have ready access to the latest and greatest that the computer industry has to offer, but they still want to take advantage of Stata.

Kit

PS> On the subject of egen and its alleged deficiencies: -egen- is pure ado-file code, of the sort that anyone can write. If the complainer wants to write his own improved version of a program that does what -egen- does, but does it to his liking, he is free to do so and share it via SSC with other users.

Kit Baum | Boston College Economics & DIW Berlin | http://ideas.repec.org/e/pba1.html
An Introduction to Stata Programming | http://www.stata-press.com/books/isp.html
An Introduction to Modern Econometrics Using Stata | http://www.stata-press.com/books/imeus.html
