Re: Re: st: Same code, same machine, same data, different results

From	Christopher Baum <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: Re: st: Same code, same machine, same data, different results
Date	Thu, 6 Sep 2012 14:09:52 +0000

On Sep 6, 2012, at 2:33 AM, Dmitriy wrote:

> 
> Do you have any m:m merges by any chance?
> 
> DVM
> 
> On Wed, Sep 5, 2012 at 2:10 PM, Mattia Landoni <[email protected]> wrote:
>> Dear statalisters,
>> 
>> a friend of mine has a bizarre problem. She is running a regression as follows:
>> 
>> xi: regress a b c i.d i.e
>> 
>> and her output is different every time. Has anyone ever seen a
>> behavior like this? Below are some details.
>> 
>> Environment:
>> - Stata 11
>> - Windows 32-bit
>> 
>> Precise description:
>> The do-file imports several files from .csv, then merges them, then
>> runs the regression. If I run the do-file, I get certain results. If I
>> issue the same regression command again, I get again the same results,
>> as it should be. However, if I re-run the do-file from the beginning,
>> I get slightly different results and the regression even reports a
>> slightly different number of observations. (Say, 2663 vs. 2666). Every
>> time all the data are taken afresh from the same static .csv sources.
>> There is nothing random about the do-file, that I know. The xi:
>> command generates about 200 i-variables and a few, maybe 10, are
>> dropped because of collinearity. There are more than 2500
>> observations.

This is EXACTLY what happens when you do an m:m merge. (See IMEUS (Baum, 2006), section 3.7.2, for why you really shouldn't even try.)
I once spent two hours with one of my (very bright) grad students who was having this kind of problem in his do-file, with the old merge command,
and we tracked it down to a non-unique merge key: in essence, what is now called an m:m merge.
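
A minimal sketch of how one might check for a non-unique merge key before merging. The file names (firms.dta, ratings.dta) and the key variable (firmid) are hypothetical, not taken from the original do-file. One plausible source of the run-to-run variation is that Stata's sort does not preserve the relative order of tied observations unless the stable option is specified, so the pairing within a duplicated key can change between runs.

```stata
* Hypothetical names throughout: firms.dta, ratings.dta, key variable firmid
use firms, clear

* Show how many surplus copies of each key value exist
duplicates report firmid

* Stop the do-file with an error if firmid does not uniquely
* identify the observations in the data in memory
isid firmid

* State the relationship you expect; merge 1:1 exits with an error
* if firmid is not unique in either the master or the using dataset
merge 1:1 firmid using ratings

* Inspect the match results rather than assuming that everything matched
tabulate _merge
```

If the key is legitimately repeated in one file only, merge m:1 or merge 1:m states that explicitly and still fails when the side declared unique turns out not to be.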

I recently had an exchange with a user on the LinkedIn Stata forum about this issue; he wanted to know whether
Stata had 'fixed' the merge command in Stata 12 so that it did m:m merges correctly. I argued that there is no clear
definition, in database terms, of what you are doing with an m:m merge, so no 'fix' would be forthcoming. He said he relied
on SAS to do it, with PROC SQL, which perhaps has some hardwired rules about how to handle the innate indeterminacy of such
an operation.
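
When both files really do repeat the key, Stata's joinby command is one well-defined alternative: it forms all pairwise combinations of observations within each key value, so the set of resulting observations does not depend on how the data happen to be sorted. A minimal sketch, assuming hypothetical files orders.dta and shipments.dta that share a non-unique key custid and carry their own identifiers orderid and shipid:

```stata
* Hypothetical names throughout: orders.dta, shipments.dta, custid, orderid, shipid
use orders, clear

* Form every pairing of an order with a shipment for the same custid.
* By default, observations without a match in the other file are dropped.
joinby custid using shipments

* Fix the row order explicitly so that later steps see the same ordering every run
sort custid orderid shipid
```

Whether all pairwise combinations are what the analysis needs is a separate question; the sketch only shows an operation whose result is fully determined by the data.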

Kit

Kit Baum  |  Boston College Economics & DIW Berlin  |  http://ideas.repec.org/e/pba1.html
An Introduction to Stata Programming  |  http://www.stata-press.com/books/isp.html
An Introduction to Modern Econometrics Using Stata  |  http://www.stata-press.com/books/imeus.html
