



Re: Re: st: Same code, same machine, same data, different results


From   Christopher Baum <kit.baum@bc.edu>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: Re: st: Same code, same machine, same data, different results
Date   Thu, 6 Sep 2012 14:09:52 +0000

On Sep 6, 2012, at 2:33 AM, Dmitriy wrote:

> 
> Do you have any m:m merges by any chance?
> 
> DVM
> 
> On Wed, Sep 5, 2012 at 2:10 PM, Mattia Landoni <mattia.landoni@gmail.com> wrote:
>> Dear statalisters,
>> 
>> a friend of mine has a bizarre problem. She is running a regression as follows:
>> 
>> xi: regress a b c i.d i.e
>> 
>> and her output is different every time. Has anyone ever seen a
>> behavior like this? Below are some details.
>> 
>> Environment:
>> - Stata 11
>> - Windows 32-bit
>> 
>> Precise description:
>> The do-file imports several files from .csv, then merges them, then
>> runs the regression. If I run the do-file, I get certain results. If I
>> issue the same regression command again, I get again the same results,
>> as it should be. However, if I re-run the do-file from the beginning,
>> I get slightly different results and the regression even reports a
>> slightly different number of observations. (Say, 2663 vs. 2666). Every
>> time all the data are taken afresh from the same static .csv sources.
>> There is nothing random about the do-file, that I know. The xi:
>> command generates about 200 i-variables and a few, maybe 10, are
>> dropped because of collinearity. There are more than 2500
>> observations.

This is EXACTLY what happens when you do an m:m merge. (See IMEUS (Baum, 2006), 3.7.2, for why you really shouldn't even try.)
I once spent two hours with one of my (very bright) grad students who was having this kind of problem in his do-file with the old merge command,
and we tracked it down to a non-unique merge key: in essence, what is now called an m:m merge.
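
A minimal sketch of how one might check for a non-unique merge key before merging. The toy data, the tempfile name lookup, and the variables id, x, and y are hypothetical and are not taken from the original do-file. m:m pairs observations by position within each key group, so which x meets which y depends on the within-id order of each file at the moment of the merge, and an unstable sort on a non-unique key does not guarantee the same order from run to run.

```stata
clear all

* Toy "using" data (hypothetical): id 1 appears twice
input id y
1 10
1 20
2 30
end
tempfile lookup
save `lookup'

* Toy "master" data (hypothetical): id 1 also appears twice
clear
input id x
1 100
1 200
2 300
end

* Check whether id uniquely identifies the observations
duplicates report id
capture isid id
display "isid return code: " _rc    // nonzero means id is not unique

* m:m matches first with first, second with second within each id,
* so the pairing depends on the current within-id order of both files
merge m:m id using `lookup'
list, sepby(id)
```

Declaring the merge as 1:1, m:1, or 1:m instead makes merge stop with an error when the key is not unique on the side declared as "1", which surfaces the problem rather than hiding it.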

I recently had an exchange with a user on the LinkedIn Stata forum about this issue; he wanted to know whether
Stata had 'fixed' the merge command in Stata 12 so that it did m:m merges correctly. I argued that there is no clear
definition, in database terms, of what you are doing with an m:m merge, so no 'fix' would be forthcoming. He said he relied
on SAS to do it, with PROC SQL, which perhaps has some hardwired rules about how to handle the innate indeterminacy of such
an operation.
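
For comparison, one well-defined operation on a non-unique key is to form every pairing within the key, which is how a SQL join on that key behaves and what Stata's joinby does. The sketch below reuses the same hypothetical toy data (id, x, y, tempfile lookup); it is an illustration under those assumptions, not a claim about what the LinkedIn user's SAS code did.

```stata
clear all

* Toy "using" data (hypothetical): id 1 appears twice
input id y
1 10
1 20
2 30
end
tempfile lookup
save `lookup'

* Toy "master" data (hypothetical): id 1 also appears twice
clear
input id x
1 100
1 200
2 300
end

* joinby forms all pairwise combinations within each id:
* id 1 contributes 2 x 2 = 4 rows, id 2 contributes 1 x 1 = 1 row
joinby id using `lookup'

* Sort on all variables so the listing order is fully determined
sort id x y
list, sepby(id)
```

Because every combination is kept, the set of rows does not depend on how either file happened to be ordered before the join.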

Kit

Kit Baum   |   Boston College Economics & DIW Berlin   |   http://ideas.repec.org/e/pba1.html
An Introduction to Stata Programming   |   http://www.stata-press.com/books/isp.html
An Introduction to Modern Econometrics Using Stata   |   http://www.stata-press.com/books/imeus.html



