Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org is already up and running.


st: RE: Re: Stata appears to be eating some string IDs when saving a file


From   "David Radwin" <dradwin@mprinc.com>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Re: Stata appears to be eating some string IDs when saving a file
Date   Tue, 2 Apr 2013 17:06:41 -0700 (PDT)

Just out of curiosity, approximately how large is the file? Gigabytes? 
Hundreds of gigabytes? (I realize that even a small file could be larger 
than the free space on a small server, but that seems unlikely these days.)

I'm glad you identified the problem, and thank you for reporting back to the 
list for posterity.

David
--
David Radwin
Senior Research Associate
MPR Associates, Inc.
2150 Shattuck Ave., Suite 800
Berkeley, CA 94704
Phone: 510-849-4942
Fax: 510-849-0794

www.mprinc.com


> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Dimitriy V. Masterov
> Sent: Tuesday, April 02, 2013 4:35 PM
> To: Statalist
> Subject: st: Re: Stata appears to be eating some string IDs when saving a file
>
> STS has confirmed that I am not a crazy person, at least not in this
> instance. This is a real bug.
>
> The problem is that Stata does not return an error when the file
> system fills up. The developers are now aware of this and they would
> like to have Stata detect this problem in the future and report the
> error correctly. They also plan to add some more error checking to the
> -use- command so that it catches files that have been corrupted.
>
> For now, the best way to detect these types of issues is to use the
> -datasignature- command to verify that the dataset was not
> modified/corrupted when saved.
>
> DVM
>
> On Sun, Mar 31, 2013 at 10:32 PM, Dimitriy V. Masterov
> <dvmaster@gmail.com> wrote:
> > I believe I diagnosed the issue. This seems to happen when I am
> > running low on space in my home directory on the server. When I freed
> > up some space, the problem went away. I wish there was some sort of
> > warning to alert users that this is happening. This has been a very
> > frustrating and terrifying experience.
> >
> > DVM
> >
> > On Sat, Mar 30, 2013 at 2:25 PM, Dimitriy V. Masterov
> > <dvmaster@gmail.com> wrote:
> >> I am having a strange problem with Stata deleting the values for about
> >> 80% of my data when I save a file. It only does it for string variables,
> >> and this only happens some of the time that I run this code.
> >>
> >> Here's the relevant part:
> >>
> >> . des ;
> >>
> >> Contains data
> >>   obs:    10,766,127
> >>  vars:             4
> >>  size:   387,580,572
> >> ------------------------------------------------------------------------
> >>               storage  display     value
> >> variable name   type   format      label      variable label
> >> ------------------------------------------------------------------------
> >> slr_id          str10  %10s
> >> byr_id          str10  %10s
> >> item_id         str12  %12s
> >> pt_m2m_cat      float  %21.0g      pt_m2m_cat
> >> ------------------------------------------------------------------------
> >> Sorted by:
> >>      Note:  dataset has changed since last saved
> >>
> >> . assert !missing(slr_id) & !missing(byr_id) & !missing(item_id) &
> >> !missing(pt_m2m_cat);
> >>
> >> . count;
> >> 10766127
> >>
> >> . save "pt_m2m_cat.dta", replace;
> >> file pt_m2m_cat.dta saved
> >>
> >> . use "pt_m2m_cat.dta", clear;
> >>
> >> . assert !missing(slr_id) & !missing(byr_id) & !missing(item_id) &
> >> !missing(pt_m2m_cat);
> >> 3407873 contradictions in 10766127 observations
> >> assertion is false
> >> r(9);
> >>
> >>
> >> My Stata MP is 12.1 (March 20, 2013), on an Ubuntu box. Any ideas how
> >> to diagnose this?
> >>
> >> DVM
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
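
The -datasignature- check suggested in the thread might look something like the sketch below. It assumes the data to be saved are already in memory, reuses the filename from the original post, and uses Stata's default line delimiter rather than the #delimit ; of the original log. The local macro name sig_before is only illustrative.

```stata
* Sketch only: confirm that a saved file reloads with the same signature.
* Assumes the dataset to be saved is currently in memory.

* Compute the signature of the data in memory and keep it in a local macro.
datasignature
local sig_before "`r(datasignature)'"

* Save, then reload from disk.
save "pt_m2m_cat.dta", replace
use "pt_m2m_cat.dta", clear

* Recompute the signature of what was read back and compare.
datasignature
assert "`r(datasignature)'" == "`sig_before'"
```

If the two signatures differ, -assert- stops with r(9), which flags that the file on disk does not match what was in memory. This only detects the problem after the fact; it does not prevent the file system from filling up.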

