Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Dimitriy V. Masterov" <dvmaster@gmail.com> |
To | Statalist <statalist@hsphsun2.harvard.edu> |
Subject | Re: st: RE: Re: Stata appears to be eating some string IDs when saving a file |
Date | Tue, 2 Apr 2013 17:16:19 -0700 |
David,

The original file was only 1 or 1.5G. The crazy thing was I wasn't getting ANY errors in either Stata or Ubuntu when I was saving it. I noticed the cause when I deleted some other stuff I was working on and everything started working all of a sudden.

DVM

On Tue, Apr 2, 2013 at 5:06 PM, David Radwin <dradwin@mprinc.com> wrote:
> Just out of curiosity, approximately how large is the file? Gigabytes?
> Hundreds of gigabytes? (I realize that even a small file could be larger
> than a small server, but that seems unlikely these days.)
>
> I'm glad you identified the problem, and thank you for reporting back to the
> list for posterity.
>
> David
> --
> David Radwin
> Senior Research Associate
> MPR Associates, Inc.
> 2150 Shattuck Ave., Suite 800
> Berkeley, CA 94704
> Phone: 510-849-4942
> Fax: 510-849-0794
>
> www.mprinc.com
>
>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Dimitriy V. Masterov
>> Sent: Tuesday, April 02, 2013 4:35 PM
>> To: Statalist
>> Subject: st: Re: Stata appears to be eating some string IDs when saving a file
>>
>> STS has confirmed that I am not a crazy person, at least not in this
>> instance. This is a real bug.
>>
>> The problem is that Stata does not return an error when the file
>> system fills up. The developers are now aware of this and they would
>> like to have Stata detect this problem in the future and report the
>> error correctly. They also plan to add some more error checking to the
>> -use- command so that it catches files that have been corrupted.
>>
>> For now, the best way to detect these types of issue is to use the
>> -datasignature- command to verify that the data set was not
>> modified/corrupted when saved.
>>
>> DVM
>>
>> On Sun, Mar 31, 2013 at 10:32 PM, Dimitriy V. Masterov
>> <dvmaster@gmail.com> wrote:
>> > I believe I diagnosed the issue. This seems to happen when I am
>> > running low on space in my home directory on the server. When I freed
>> > up some space, the problem went away. I wish there was some sort of
>> > warning to alert users that this is happening. This has been a very
>> > frustrating and terrifying experience.
>> >
>> > DVM
>> >
>> > On Sat, Mar 30, 2013 at 2:25 PM, Dimitriy V. Masterov
>> > <dvmaster@gmail.com> wrote:
>> >> I am having a strange problem with Stata deleting the values for about 80%
>> >> of my data when I save a file. It only does it for string variables,
>> >> and this only happens some of the time that I run this code.
>> >>
>> >> Here's the relevant part:
>> >>
>> >> . des ;
>> >>
>> >> Contains data
>> >>   obs:    10,766,127
>> >>  vars:             4
>> >>  size:   387,580,572
>> >> ----------------------------------------------------------------
>> >>                storage  display   value
>> >> variable name  type     format    label       variable label
>> >> ----------------------------------------------------------------
>> >> slr_id         str10    %10s
>> >> byr_id         str10    %10s
>> >> item_id        str12    %12s
>> >> pt_m2m_cat     float    %21.0g    pt_m2m_cat
>> >> ----------------------------------------------------------------
>> >> Sorted by:
>> >>      Note: dataset has changed since last saved
>> >>
>> >> . assert !missing(slr_id) & !missing(byr_id) & !missing(item_id) &
>> >> !missing(pt_m2m_cat);
>> >>
>> >> . count;
>> >> 10766127
>> >>
>> >> . save "pt_m2m_cat.dta", replace;
>> >> file pt_m2m_cat.dta saved
>> >>
>> >> . use "pt_m2m_cat.dta", clear;
>> >>
>> >> . assert !missing(slr_id) & !missing(byr_id) & !missing(item_id) &
>> >> !missing(pt_m2m_cat);
>> >> 3407873 contradictions in 10766127 observations
>> >> assertion is false
>> >> r(9);
>> >>
>> >>
>> >> My Stata MP is 12.1 (March 20, 2013), on an Ubuntu box. Any ideas how
>> >> to diagnose this?
>> >>
>> >> DVM

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/
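
The -datasignature- check suggested in the thread could look roughly like the sketch below. It is one possible way to apply the advice, not the procedure the developers or the poster actually used. The file name is the one from the original post; the local macro name sig_before is made up for illustration, and the code assumes the default line delimiter rather than the "#delimit ;" setting visible in the log.

```stata
* Sketch only: run as a do-file so the local macro persists between lines.
* Assumes the default line delimiter (no "#delimit ;").

* 1. Record the signature of the data in memory before saving.
datasignature
local sig_before "`r(datasignature)'"

* 2. Save and reload, as in the original post.
save "pt_m2m_cat.dta", replace
use "pt_m2m_cat.dta", clear

* 3. Recompute the signature; assert stops with r(9) if it differs.
datasignature
assert "`r(datasignature)'" == "`sig_before'"
```

Holding the signature in a local macro means the comparison does not rely on anything read back from the possibly damaged file. The built-in alternative is -datasignature set- before saving and -datasignature confirm- after reloading, which stores the signature with the dataset itself.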