
Re: st: Data Corruption?

From   Jeph Herrin <>
Subject   Re: st: Data Corruption?
Date   Wed, 24 Oct 2007 18:05:17 -0400

FWIW, I have sometimes (once or twice) found this kind of
problem in datasets that were created by Stat/Transfer.

I solved it by having Stat/Transfer create text files
and then -infile-ing them.
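A minimal sketch of that text-file route, assuming Stat/Transfer has written a whitespace-delimited file with one observation per line and the columns id, oil_level, and gdp. The file name example.raw is hypothetical, and the sketch first writes a two-line stand-in file (using values from Ed's listing below) so that it runs on its own.

```stata
* Sketch: re-read the data with -infile- instead of using the converted .dta.
* Assumption: whitespace-delimited text with columns id, oil_level, gdp.
* "example.raw" is a hypothetical file name; a two-line stand-in is written
* here so the sketch is self-contained.
tempname fh
file open `fh' using "example.raw", write text replace
file write `fh' "211 548.9 15492.840168784467" _n
file write `fh' "211 575.7 16248.206511636326" _n
file close `fh'

* Read the text file back. gdp is declared double because a float keeps
* only about 7 significant digits.
clear
infile id oil_level double gdp using "example.raw"

* After re-reading, -list- and -summarize- should describe the same values.
list oil_level gdp
summarize oil_level gdp
```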


William Gould, StataCorp LP wrote:
Ed Blackburne <> reports

[...] I have strange results when calculating (basic) stats for a variable.
I assume there is some sort of data corruption, but I have never seen this
before, so any pointers would be helpful.

Here is a listing of my data (I have added an if condition to keep the example simple).

. li oil_level gdp if id==211

      +------------------------------+
      | oil_le~l                 gdp |
      |------------------------------|
1059. |    548.9  15492.840168784467 |
1060. |    575.7  16248.206511636326 |
1061. |    595.8  16370.615114025704 |
1062. |    635.5   17072.56241927471 |
1063. |    667.8   17501.07679210328 |
1064. |    694.6  17321.478178718633 |
      <output omitted by me, not Ed>
1098. |    948.7   36098.15411932452 |
      +------------------------------+

and yet, Ed reports:

      . summ oil_level gdp if id==211

          Variable |   Obs        Mean    Std. Dev.      Min        Max
         oil_level |    40      781.27    93.45438     548.9      948.7
               gdp |    40     3019.45    771.8664      1673       4237

Svend Juul tried to reproduce the problem, couldn't, and said "Looks like human error to me. Are you sure nothing happened to your data between
the -list- and the -summarize- command?"

Of course, my response is the same, but I also assume Ed is reasonably sure
that he typed -list- followed by -summarize-.

I suggest Ed go to our Technical Services by emailing
Don't forget to include your Stata serial number in the email.

Right now, I'm at a loss to explain the problem, although I'm thinking of (in no particular order) broken or corrupted hardware, a corrupted Stata, or a corrupted dataset. Given what little I know right now, none of these exactly fits what Ed is reporting.

I have one experiment I want Ed to perform so that he can report the results to Technical Services.

Do the following:

. log using problem.log, replace

. <use dataset>

. list oil_level gdp if id==211

. summarize oil_level gdp if id==211

. list oil_level gdp if id==211

. log close

Do them exactly like that, in that order, with nothing in between.

My questions are (1) is -summarize- still mistaken, and (2) does the second listing match the first?

Obviously, if the problem vanishes, we are back at human error. Otherwise,
we will want the log and the dataset.

Ed also reports

 Stata versions: 9.2, both Windows and Linux experience the same problem.

Does that mean he ran the experiment on two *DIFFERENT* computers, or on one computer booted two different ways?

For Ed's information, we have seen corrupted Statas. They tend to crash. We have seen broken computers. They tend to crash, too, although we have
seen one with a broken floating-point coprocessor that simply gave the wrong
answers for the exp() function. We have seen computers with bad memory. The
data morphs, but, importantly, it continues to morph as you use the computer.
We have seen corrupted datasets, but they are simply corrupted, and all of Stata's
routines agree on the (corrupted) contents of the data.

Go to Technical Services. I suspect that Ed and we at StataCorp are going
to need to exchange emails to figure out what the problem is.

-- Bill
* For searches and help try:

© Copyright 1996–2015 StataCorp LP