



Re: st: Stata crashes when loading a dataset


From   Alan Riley <ariley@stata.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Stata crashes when loading a dataset
Date   Wed, 25 May 2011 14:50:24 -0500

Dan Blanchette experienced a crash when he tried to use a dataset
he obtained from the Internet:

> I fell upon an odd situation where Stata 11 crashed when I tried to load
> a dataset that I downloaded from the internet (from a site in a foreign
> country) when I used the -use- command like so:
> 
>  . use "C:\data\foreign_data.dta"
> 
> The person supplying the dataset reported that the dataset loaded fine
> for him on his computer.  In the process of trying to figure out a way
> to get Stata to load the dataset without crashing, I stumbled on an odd 
> solution.  All I had to do was specify a varlist like so:
> 
>  . use * using "C:\data\foreign_data.dta"
> 
> and Stata loaded the whole dataset just fine.  I discovered that the
> dataset contained almost all numeric variables.  The one string variable
> had no foreign characters.  Neither the dataset nor its variables had any
> notes.  Two of the numeric variables had two value labels that each
> contained one foreign character.  I believe that is what caused Stata to
> crash when I did not specify a variable list.
> 
> Would you not expect these two commands to be identical?
> 
>  . use "C:\data\foreign_data.dta"
>  . use * using "C:\data\foreign_data.dta"


Dan surmised that foreign characters in some of the value labels could
have caused the crash.  I do not believe this is the case.  Stata has
no problem with extended ASCII characters in string variables, variable
labels, or value labels.

I believe that the dataset Dan obtained is somehow corrupt, and
that this is what is causing the crash.  When a dataset is corrupt,
reading it can poke a 'hole' in part of Stata's memory (that is,
overwrite memory that should not be touched), and that hole can
lead to a crash.

It is merely fortuitous that Stata did not crash when Dan tried
-use * using ...-.  To a human, -use- with a varlist that happens
to name every variable looks the same as -use- without a varlist,
but to Stata these take two different paths through the code.
With a varlist, even one that contains every variable, Stata
retrieves the data for each observation variable by variable
rather than reading the entire observation at once.  With a
corrupt dataset, a hole could still be poked in memory along this
path too; Dan was simply lucky that Stata did not crash here as well.
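To make the distinction concrete, here is a minimal Python sketch of two ways to read the same fixed-width records: unpacking each observation as one block, versus walking through it one variable at a time. The record layout (`FIELDS`) and the function names are hypothetical; this is not the .dta format and not Stata's internal code. On well-formed data both paths return identical rows, which is why the two -use- commands look equivalent; what differs is which code actually runs.

```python
import struct

# Hypothetical fixed-width layout for one observation (NOT the real .dta spec):
# a double, a 4-byte int, and an 8-byte string, little-endian, no padding.
FIELDS = [("price", "d"), ("count", "i"), ("label", "8s")]
RECORD_FMT = "<" + "".join(code for _, code in FIELDS)
RECORD_SIZE = struct.calcsize(RECORD_FMT)  # 8 + 4 + 8 = 20 bytes


def read_whole_observations(buf: bytes, nobs: int) -> list:
    """Path 1: unpack each observation as a single record."""
    return [struct.unpack_from(RECORD_FMT, buf, i * RECORD_SIZE) for i in range(nobs)]


def read_variable_by_variable(buf: bytes, nobs: int) -> list:
    """Path 2: for each observation, unpack one variable at a time."""
    rows = []
    for i in range(nobs):
        offset = i * RECORD_SIZE
        row = []
        for _, code in FIELDS:
            fmt = "<" + code
            row.append(struct.unpack_from(fmt, buf, offset)[0])
            offset += struct.calcsize(fmt)  # advance past this variable
        rows.append(tuple(row))
    return rows


# Two made-up observations packed into one buffer.
data = b"".join(
    struct.pack(RECORD_FMT, p, c, s)
    for p, c, s in [(1.5, 3, b"abc"), (2.25, 7, b"xyz")]
)
# Same rows either way on intact data.
assert read_whole_observations(data, 2) == read_variable_by_variable(data, 2)
```

Python's `struct` raises `struct.error` when a buffer is too short, so a truncated buffer fails cleanly on both paths here; compiled code that trusts a corrupt header can instead read or write outside its buffer, which is the kind of 'hole' that leads to a crash.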

The corruption could have come from the download process, or
perhaps the .dta file Dan downloaded was written by another
package that left something out of spec in it, such as a variable
label or value label with more characters than Stata allows.
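One way to rule the download in or out is to compute a cryptographic digest of each copy of the file and compare them. The Python sketch below assumes the supplier can run the same script on his copy and send back the hex SHA-256 digest; the helper names are hypothetical.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_file_contents(downloaded: str, supplier_digest: str) -> bool:
    """Compare a local file's digest with one reported by the data supplier."""
    return sha256_of_file(downloaded) == supplier_digest.strip().lower()
```

If the digests differ, the bytes changed somewhere between the supplier's disk and Dan's; if they match, the download is not the source of the problem.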


--Alan
ariley@stata.com


