From: "Schaffer, Mark E" <M.E.Schaffer@hw.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: Precision in outsheet and outfile
Date: Mon, 28 Feb 2011 19:03:24 -0000

Hi all.

A Statalist reader (but not subscriber) wrote to me about his problems
with -outsheet- and precision, and suggested a work-around, namely
-xmlsave-:

> Dear Mark,
>
> I ran into the same problem with Stata with precision and outsheet
> that you noted in the Statalist. I couldn't figure out how to reply
> to the list, so I am writing directly. The loss of precision and the
> lack of documentation is a major problem in my view.
>
> I was able to save with better precision using the xmlsave command.
>
> Let me know if there are other work-arounds that you hear about.
>
> Regards,
>
> Andrew Austin, Ph.D.
> Congressional Research Service

Andrew's main point is that the behaviour of xmluse/xmlsave is the
behaviour we both expected - but we now know isn't there - from
insheet/outsheet, and that these important differences could benefit
from explicit discussion in the Stata documentation for these commands
(and I agree).

As a follow-up, I am curious to know more about how xmluse/xmlsave
maintain precision. (I realize this is dangerously close to reopening
the double-debate!) It's not discussed in the manual. Does anyone know?

--Mark

> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of
> Schaffer, Mark E
> Sent: 11 January 2011 13:40
> To: statalist@hsphsun2.harvard.edu
> Subject: st: Precision in outsheet and outfile
>
> Hi all. I think I've just been bitten by an (almost)
> undocumented "feature" of outsheet (and shared by outfile):
> storage precision is determined by the display format.
>
> I'm using Stata 11.1 for Windows. Stata 10.1 for Windows
> behaves the same way.
>
> For example, in my original data the default display format is %9.0g.
> If I change the format of the relevant variable to %12.0f,
> and then outsheet and insheet, everything is fine:
>
> . format GDP %12.0f
>
> .
> . desc GDP
>
>               storage  display     value
> variable name   type   format      label      variable label
> ------------------------------------------------------------
> GDP             double %12.0f
>
> .
> . list
>
>      +-----------------+
>      | Year        GDP |
>      |-----------------|
>   1. | 1995    9963191 |
>   2. | 1996   10335489 |
>      +-----------------+
>
> .
> . outsheet using testoutfile.csv, replace comma
>
> .
> . insheet using testoutfile.csv, clear case
> (2 vars, 2 obs)
>
> .
> . list
>
>      +-----------------+
>      | Year        GDP |
>      |-----------------|
>   1. | 1995    9963191 |
>   2. | 1996   10335489 |
>      +-----------------+
>
> But if I don't change the display format, numbers >999,999
> lose all but 3 (!!) digits of precision:
>
> . desc GDP
>
>               storage  display     value
> variable name   type   format      label      variable label
> ------------------------------------------------------------
> GDP             double %9.0g
>
> .
> . list
>
>      +-----------------+
>      | Year        GDP |
>      |-----------------|
>   1. | 1995    9963191 |
>   2. | 1996   1.03e+07 |
>      +-----------------+
>
> .
> . outsheet using testoutfile.csv, replace comma
>
> .
> . insheet using testoutfile.csv, clear case
> (2 vars, 2 obs)
>
> .
> . list
>
>      +-----------------+
>      | Year        GDP |
>      |-----------------|
>   1. | 1995    9963191 |
>   2. | 1996   10300000 |
>      +-----------------+
>
>
> This behaviour seems to be shared by outfile, even though I'm
> using Stata's dictionary to specify the datatype:
>
> . desc GDP
>
>               storage  display     value
> variable name   type   format      label      variable label
> ------------------------------------------------------------
> GDP             double %9.0g
>
> .
> . list
>
>      +-----------------+
>      | Year        GDP |
>      |-----------------|
>   1. | 1995    9963191 |
>   2. | 1996   1.03e+07 |
>      +-----------------+
>
> .
> . outfile using testoutfile.csv, replace dict
>
> .
> . infile using testoutfile.csv, clear
>
> dictionary {
>     int Year `"Year"'
>     double GDP
> }
>
> (2 observations read)
>
> .
> . list
>
>      +-----------------+
>      | Year        GDP |
>      |-----------------|
>   1. | 1995    9963191 |
>   2. | 1996   10300000 |
>      +-----------------+
>
> So even though Stata's dictionary format notes that GDP is a
> double, all but 3 digits of precision are lost.
>
> What's happening is that with the default display width of 9
> digits, after 999,999 Stata switches to exponential notation,
> so it records the 1996 value above as 1.03e+07.
>
> There's no direct mention of this limitation in the
> documentation for outsheet. There is something about this in
> the manual documentation for outfile, but I had to read
> between the lines to work out the implications:
>
> "Numeric variables are output right-justified in the field
> width specified by their display format."
>
> The implications for precision follow from this, but I think
> I can be forgiven for missing it.
>
> I'm posting to the list because I think it's important enough
> to bring to people's attention. If others feel similarly,
> perhaps StataCorp can update the online documentation and
> manual to point this out, or even add options to outsheet and
> outfile to control precision independently of formatting.
>
> --Mark
>
>
> --
> Heriot-Watt University is a Scottish charity registered under
> charity number SC000278.
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
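
For anyone who wants to try the two work-arounds mentioned in this thread side by side, here is a minimal sketch rather than a definitive recipe. It rebuilds the two-observation example by hand. The file name testxml and the doctype(dta) option are illustrative choices that do not come from the thread, and the -xmlsave- route is what Andrew reports as working, not something verified here.

```stata
* Sketch only: rebuild the two-observation example from the thread.
clear
input int Year double GDP
1995  9963191
1996 10335489
end

* Reproduce the display format used in the thread.
format GDP %9.0g

* Work-around 1 (from the original post): widen the display format
* before exporting, so outsheet writes every digit, not 1.03e+07.
preserve
format GDP %12.0f
outsheet using testoutfile.csv, replace comma
insheet using testoutfile.csv, clear case
list
restore

* Work-around 2 (Andrew Austin's suggestion): round-trip via XML.
* GDP still carries %9.0g here. The file name testxml and the
* doctype(dta) option are assumptions made for this sketch.
xmlsave testxml, doctype(dta) replace
xmluse testxml, doctype(dta) clear

* Widen the format only for viewing, to check which digits survived.
format GDP %12.0f
list
```

In the first route the format change is what protects the data, so it has to be repeated for every variable whose values do not fit the default display width. The second route leaves the display format alone, which is the behaviour the thread says one would have expected from insheet/outsheet.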