
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org, is already up and running.



Re: st: RE: Re: Stata appears to be eating some string IDs when saving a file


From   "Dimitriy V. Masterov" <dvmaster@gmail.com>
To   Statalist <statalist@hsphsun2.harvard.edu>
Subject   Re: st: RE: Re: Stata appears to be eating some string IDs when saving a file
Date   Tue, 2 Apr 2013 17:16:19 -0700

David,

The original file was only 1 or 1.5 GB. The crazy thing was that I wasn't
getting ANY errors in either Stata or Ubuntu when I was saving it.

I noticed the cause when I deleted some other stuff I was working on
and everything started working all of a sudden.

DVM

On Tue, Apr 2, 2013 at 5:06 PM, David Radwin <dradwin@mprinc.com> wrote:
> Just out of curiosity, approximately how large is the file? Gigabytes?
> Hundreds of gigabytes? (I realize that even a small file could be larger
> than a small server, but that seems unlikely these days.)
>
> I'm glad you identified the problem, and thank you for reporting back to the
> list for posterity.
>
> David
> --
> David Radwin
> Senior Research Associate
> MPR Associates, Inc.
> 2150 Shattuck Ave., Suite 800
> Berkeley, CA 94704
> Phone: 510-849-4942
> Fax: 510-849-0794
>
> www.mprinc.com
>
>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu
>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Dimitriy V. Masterov
>> Sent: Tuesday, April 02, 2013 4:35 PM
>> To: Statalist
>> Subject: st: Re: Stata appears to be eating some string IDs when saving a file
>>
>> STS has confirmed that I am not a crazy person, at least not in this
>> instance. This is a real bug.
>>
>> The problem is that Stata does not return an error when the file
>> system fills up. The developers are now aware of this and they would
>> like to have Stata detect this problem in the future and report the
>> error correctly. They also plan to add some more error checking to the
>> -use- command so that it catches files that have been corrupted.
>>
>> For now, the best way to detect these types of issues is to use the
>> -datasignature- command to verify that the data set was not
>> modified/corrupted when saved.
>>
>> DVM
>>
>> On Sun, Mar 31, 2013 at 10:32 PM, Dimitriy V. Masterov
>> <dvmaster@gmail.com> wrote:
>> > I believe I diagnosed the issue. This seems to happen when I am
>> > running low on space in my home directory on the server. When I freed
>> > up some space, the problem went away. I wish there was some sort of
>> > warning to alert users that this is happening. This has been a very
>> > frustrating and terrifying experience.
>> >
>> > DVM
>> >
>> > On Sat, Mar 30, 2013 at 2:25 PM, Dimitriy V. Masterov
>> > <dvmaster@gmail.com> wrote:
>> >> I am having a strange problem with Stata deleting the values for about
>> >> 80% of my data when I save a file. It only does it for string variables,
>> >> and this only happens some of the time that I run this code.
>> >>
>> >> Here's the relevant part:
>> >>
>> >> . des ;
>> >>
>> >> Contains data
>> >>   obs:    10,766,127
>> >>  vars:             4
>> >>  size:   387,580,572
>> >> -----------------------------------------------------------------------
>> >>               storage  display     value
>> >> variable name   type   format      label      variable label
>> >> -----------------------------------------------------------------------
>> >> slr_id          str10  %10s
>> >> byr_id          str10  %10s
>> >> item_id         str12  %12s
>> >> pt_m2m_cat      float  %21.0g      pt_m2m_cat
>> >> -----------------------------------------------------------------------
>> >> Sorted by:
>> >>      Note:  dataset has changed since last saved
>> >>
>> >> . assert !missing(slr_id) & !missing(byr_id) & !missing(item_id) &
>> >> !missing(pt_m2m_cat);
>> >>
>> >> . count;
>> >> 10766127
>> >>
>> >> . save "pt_m2m_cat.dta", replace;
>> >> file pt_m2m_cat.dta saved
>> >>
>> >> . use "pt_m2m_cat.dta", clear;
>> >>
>> >> . assert !missing(slr_id) & !missing(byr_id) & !missing(item_id) &
>> >> !missing(pt_m2m_cat);
>> >> 3407873 contradictions in 10766127 observations
>> >> assertion is false
>> >> r(9);
>> >>
>> >>
>> >> My Stata MP is 12.1 (March 20, 2013), on an Ubuntu box. Any ideas how
>> >> to diagnose this?
>> >>
>> >> DVM
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
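
The -datasignature- check suggested in the thread could look roughly like the sketch below. It is only an illustration, not code from StataCorp or the original poster: it assumes the default command delimiter (the log above uses ";"), it reuses the file name from the thread, and the local macro names sig_before and sig_after are made up for this example. The idea is to record the signature of the data in memory, save, reload, and compare.

```stata
* Sketch only: record the signature of the data in memory before saving.
* -datasignature- leaves the signature in r(datasignature).
datasignature
local sig_before `r(datasignature)'

save "pt_m2m_cat.dta", replace

* Reload what was actually written to disk and recompute the signature.
use "pt_m2m_cat.dta", clear
datasignature
local sig_after `r(datasignature)'

* Stop with an error if the reloaded data differ from what was saved.
assert "`sig_before'" == "`sig_after'"
```

Keeping the first signature in a local macro rather than inside the dataset means the comparison does not rely on anything read back from the possibly damaged file.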


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index