Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: "Sarah Edgington" <sedging@ucla.edu>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: Macro Producing Different Results Each Time Executed
Date: Thu, 22 Aug 2013 13:55:39 -0700
You're right, this is very likely a sorting problem. If PRVDR_NUM is not a unique identifier, then the sort can come out differently each time: without the stable option, sort places observations with tied values of PRVDR_NUM in random order, so "by PRVDR_NUM: ... if _n==1" can pick a different observation on each run. You can use the stable option on sort. The problem with that is that if the sort order of the original files changes, you may still get inconsistent results.

I think the best strategy with issues like this is to figure out why PRVDR_NUM isn't a unique identifier and then decide exactly what rule you want to use to choose which record to keep for each PRVDR_NUM. One way to ensure that you always get the same results is to always sort by a combination of variables that you know uniquely identifies observations. Then you will always select the same first observation within a provider.

However, while this should get you the SAME results every time you run the code, it will not necessarily get you the RIGHT results. If your results differ each time you run this, that suggests that which observations you keep and which you drop actually matters for your results. So while PRVDR_NUM may be duplicated across multiple observations, it sounds like the other variables you're interested in actually vary across those observations. You need to figure out exactly which observations you want to retain and make sure your code retains them, so that your results are both consistent and correct.

-Sarah

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of William Sankey
Sent: Thursday, August 22, 2013 1:22 PM
To: statalist@hsphsun2.harvard.edu
Subject: st: Macro Producing Different Results Each Time Executed

Dear Statalist,

The following 'foreach' loop produces different results each time it executes. My suspicion is that the sort does not necessarily happen in the same order each time the program runs, so what is dropped and what is kept in the merge differs each time.
Can the sort execute differently each time it is run? Do you have other thoughts on why I might be obtaining different results each time this code is executed?

Thanks in advance,

foreach file in col1 col2 col4 col5 {
    use `file', clear
    sort PRVDR_NUM
    by PRVDR_NUM: gen keeper = 1 if _n==1
    keep if keeper==1
    drop keeper
    sort PRVDR_NUM
    save myusing3, replace

    use mycostreports, clear
    by PRVDR_NUM: generate unique=1 if _N==1 | (_N==2 & psych_type=="Hosp")
    keep if unique==1
    drop unique
    sort PRVDR_NUM
    merge PRVDR_NUM using myusing3
    tab _merge
    drop if _merge==1
    drop _merge

    *Tables:
    egen paytotal = total(ProviderPayments) if REG_G==9
    egen paytotal_b = total(ProviderPay_base) if REG_G==9
    gen change53 = (paytotal-paytotal_b)/paytotal_b if REG_G==9
    drop paytotal*
    *Executed for other changes*
    sum change1 - change53
}

--
William J. Sankey
Johns Hopkins University
MA Public Policy '12

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
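A minimal Stata sketch of the approach Sarah describes above: check whether PRVDR_NUM is unique, look at what differs across the duplicate records, then keep one record per provider using an explicit rule whose sort keys uniquely identify observations. The toy data, the fiscal-year variable fy_end, and the "keep the latest year, then the largest payment" rule are hypothetical, chosen only for illustration; the right rule depends on why the duplicates exist in the real files.

```stata
* Hypothetical toy data. PRVDR_NUM is stored as a string here, and
* fy_end (fiscal year) is an invented tie-breaking variable; substitute
* whatever variables actually distinguish duplicate records in your files.
clear
input str6 PRVDR_NUM int fy_end double ProviderPayments
"010001" 2010 100
"010001" 2011 150
"010002" 2011 200
"010003" 2010 300
"010003" 2010 310
end

* Step 1: is PRVDR_NUM a unique identifier?
duplicates report PRVDR_NUM
capture isid PRVDR_NUM
if _rc {
    display as text "PRVDR_NUM does not uniquely identify observations"
}

* Step 2: list the duplicates to see which variables differ within a provider.
duplicates tag PRVDR_NUM, generate(dup)
list PRVDR_NUM fy_end ProviderPayments if dup > 0, sepby(PRVDR_NUM)
drop dup

* Step 3: confirm the full sort key has no ties (isid exits with an
* error if it does), so the within-provider order is fully determined.
isid PRVDR_NUM fy_end ProviderPayments

* Step 4: apply the explicit (hypothetical) rule. Variables in parentheses
* order observations within PRVDR_NUM without defining the groups, so
* _n == _N is the latest fiscal year, and within that year the largest
* payment. The result no longer depends on how sort orders ties.
bysort PRVDR_NUM (fy_end ProviderPayments): keep if _n == _N

* Step 5: confirm exactly one record per provider remains.
isid PRVDR_NUM
list, noobs
```

A related safeguard: in Stata 11 and later, writing the merge in the question as `merge 1:1 PRVDR_NUM using myusing3` makes Stata stop with an error if PRVDR_NUM does not uniquely identify observations in either file, instead of proceeding with whichever records happened to be kept.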