Re: st: Data Corruption?

From (William Gould, StataCorp LP)
Subject   Re: st: Data Corruption?
Date   Wed, 24 Oct 2007 12:02:39 -0500

Ed Blackburne reports

> [...] I have strange results when calculating (basic) stats for a variable.
> I assume there is some sort of data corruption, but I have never seen this
> before, so any pointers would be helpful.
> Here is a listing of my data (I have added an if condition to keep the 
> example simple).
>       . li oil_level gdp if id==211
>             +-------------------------------+
>             | oil_le~l                  gdp |
>             |-------------------------------|
>       1059. |    548.9   15492.840168784467 |
>       1060. |    575.7   16248.206511636326 |
>       1061. |    595.8   16370.615114025704 |
>       1062. |    635.5    17072.56241927471 |
>       1063. |    667.8    17501.07679210328 |
>             |-------------------------------|
>       1064. |    694.6   17321.478178718633 |
>             <output omitted by me, not Ed   >
>       1098. |    948.7    36098.15411932452 |
>             +-------------------------------+

and yet, Ed reports:

>       . summ oil_level gdp if id==211
>           Variable |   Obs        Mean    Std. Dev.      Min        Max
>       -------------+---------------------------------------------------
>          oil_level |    40      781.27    93.45438     548.9      948.7
>                gdp |    40     3019.45    771.8664      1673       4237

Svend Juul tried to reproduce the problem, couldn't, and said "Looks like 
human error to me. Are you sure nothing happened to your data between
the -list- and the -summarize- command?"

Of course, my response is the same, but I also assume Ed is reasonably sure
that he typed -list- followed by -summarize-.  

I suggest Ed contact our Technical Services by email.
Don't forget to include the serial number of your copy of Stata in the email.

Right now, I'm at a loss to explain the problem, although I'm thinking 
(in no particular order) broken/corrupted hardware, corrupted Stata, 
or corrupted dataset.  Given what little I know right now, none of the 
above exactly fits what Ed is reporting.

I have one experiment I want Ed to perform so he can report results 
to Technical Services.

Do the following:

        . log using problem.log, replace

        . <use dataset>

        . list oil_level gdp if id==211

        . summarize oil_level gdp if id==211

        . list oil_level gdp if id==211

        . log close

Do them exactly like that, in that order, with nothing in between.
My questions are (1) is -summarize- still mistaken and (2) does the 
second listing match the first?

Obviously, if the problem vanishes, we are back at human error.  Otherwise,
we will want the log and the dataset.
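
If the problem does persist, one further check could be logged separately.
This is only a sketch, not part of the experiment above, which should still
be run exactly as written.  The file names mydata.dta and check.log are 
placeholders.  The idea is that every value -list- shows must lie inside the
range -summarize- reports, so -assert- can test that directly:

```stata
* Sketch only; mydata.dta and check.log are placeholder names.
log using check.log, replace
use mydata.dta, clear

* Show storage type and display format of the two variables.
describe oil_level gdp

* -summarize- leaves the minimum and maximum in r(min) and r(max).
summarize gdp if id==211

* Every nonmissing gdp value for id 211 must fall within that range.
* -capture noisily- shows any error but lets the do-file continue.
capture noisily assert gdp >= r(min) & gdp <= r(max) if id==211 & !missing(gdp)
display "assert return code = " _rc

log close
```

A return code of 0 would mean -summarize- and the stored data agree on the
range; a nonzero code would mean they disagree, and that log would be worth
sending along with the first one.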

Ed also reports

>  Stata versions: 9.2, both Windows and Linux experience the same problem.

Does that mean he ran the experiment on two *DIFFERENT* computers, or 
one computer booted different ways?

For Ed's information, we have seen corrupted Statas.  They tend to crash.  
We have seen broken computers.  They tend to crash, too, although we have
seen one with a broken Floating Point Coprocessor that simply gave the wrong
answers for the exp() function.  We have seen computers with bad memory.  The
data morphs, but importantly, it continues to morph as you use the computer.
We have seen corrupted datasets, but they are simply corrupted and all Stata's
routines agree as to the (corrupted) contents of the data.

Go to Technical Services.  I suspect we at StataCorp and Ed are going to
need to exchange emails to figure out what the problem is.

-- Bill