Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Sergiy Radyakin <serjradyakin@gmail.com>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: Routine from do-file that every time it's run gives a different result
Date: Wed, 6 Nov 2013 17:10:28 -0500
On Wed, Nov 6, 2013 at 4:50 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> But that solves the problem at the price of not understanding it.

Nick, I agree, but Clarice was interested in why the results change at random. The -sort- command has an element of randomness as a side effect, and it bites everyone at least once (it bites me about once a year :). In fact, I would call it -randomsort- or -sort, random- to make this explicit, and -sort- would be the name for what is now -sort, stable-. That would mirror -summarize-, which by default calculates the standard deviation and produces output, while -summarize, meanonly- is a faster, restricted version for programmers. I didn't even read the whole code; I stopped as soon as I saw -sort- without the stable option.

> Somewhere Clarice has hidden assumptions about the -sort- order being
> enough to get the right order without extra information that are not
> correct.

I believe Clarice will validate the results and recognize errors of this kind as soon as the results stabilize. Until then there is no point in running validations, since the results will keep changing.

Clarice may also be interested in reading about the -duplicates- command, as it seems she is simulating its behavior.

The rest of the code can be optimized for readability. There are a lot of "magic" transformations of the Pi's above that are incomprehensible to me now, and probably will be to Clarice a couple of months after writing them. But some obvious optimizations I advise are of this kind:

    gen temp = P1 + P5
    drop if temp >= .
    drop temp

is simply:

    drop if missing(P1+P5)

Best,
Sergiy

> Nick
> njcoxstata@gmail.com
>
>
> On 6 November 2013 21:46, Sergiy Radyakin <serjradyakin@gmail.com> wrote:
>> Clarice, add the option stable to the sort commands. Without this
>> option, the -sort- command will break the ties randomly.
>> See here:
>> http://www.stata.com/help.cgi?sort
>>
>> Best, Sergiy
>>
>> On Wed, Nov 6, 2013 at 4:30 PM, Clarice Martins
>> <martins.clarice@gmail.com> wrote:
>>> Dear Statalist group,
>>>
>>> I have a routine that apparently was running OK, and then I noticed that every time I execute the code I get different results for one of the variables.
>>> (The routine is long, so I don't know how best to provide you guys with enough info.)
>>>
>>> 1) I believe the problem has to do with variable -P5-, since this is the variable whose average changes every time I run the code.
>>>
>>> 2) Sample of the results I am getting: as you can see, variable P1 is always approximately the same (it should be the same) and variable strategy is ALWAYS the same, but variable -P5- changes by a lot. (I've shown two outputs, but I've run it many, many times.)
>>>
>>>
>>> . esttab .
>>>
>>> ----------------------------
>>>                       (1)
>>>                      Mean
>>> ----------------------------
>>> P1                  0.300***
>>>                    (3.41)
>>>
>>> P5                  6.154
>>>                    (1.53)
>>>
>>> strategy            7.190
>>>                    (1.78)
>>> ----------------------------
>>> N                     150
>>> ----------------------------
>>>
>>>
>>> ----------------------------
>>>                       (1)
>>>                      Mean
>>> ----------------------------
>>> P1                  0.223*
>>>                    (2.24)
>>>
>>> P5                  3.286
>>>                    (1.15)
>>>
>>> strategy            7.190
>>>                    (1.78)
>>> ----------------------------
>>> N                     150
>>> ----------------------------
>>>
>>> 3) The piece of the code that creates and changes variable P5 (my apologies if this is confusing or too long):
>>>
>>> ***create variable P1/P5 and sum all 1st/5th quintiles per <yrmonth>
>>> gen P1_sell = .
>>> quietly levelsof yrmonth, local(levs)
>>> quietly foreach lev of local levs {
>>>     egen work=total(return) if rtype=="buy_sell_period" & yrmonth == "`lev'" & quintile==1
>>>     replace P1_sell=work if rtype=="buy_sell_period" & yrmonth == "`lev'" & quintile==1
>>>     drop work
>>> }
>>>
>>> gen P5_buy = .
>>> quietly levelsof yrmonth, local(levs)
>>> quietly foreach lev of local levs {
>>>     egen work=total(return) if rtype=="buy_sell_period" & yrmonth == "`lev'" & quintile==5
>>>     replace P5_buy=work if rtype=="buy_sell_period" & yrmonth == "`lev'" & quintile==5
>>>     drop work
>>> }
>>>
>>> sort quintile yrmonth rtype
>>>
>>> **undo the buy/sell operation
>>> *in order to do the procedure, first copy quintile #s to same <co_id> but for 6 <yrmonth> LATER
>>>
>>> bysort co_id period: egen tocopy2 = total(quintile / (rtype == "buy_sell_period"))
>>> bysort co_id rtype (negperiod) : replace quintile = tocopy2[_n+6] if missing(quintile) & rtype == "hold_period"
>>> sort quintile yrmonth rtype
>>>
>>> *add sums of 1st/5th quintiles for <hold_period> to variables P1/P5
>>>
>>> quietly levelsof yrmonth, local(levs)
>>> quietly foreach lev of local levs {
>>>     egen work=total(return) if rtype=="hold_period" & yrmonth == "`lev'" & quintile==5
>>>     replace P1_sell=work if rtype=="hold_period" & yrmonth == "`lev'" & quintile==5
>>>     drop work
>>> }
>>>
>>> quietly levelsof yrmonth, local(levs)
>>> quietly foreach lev of local levs {
>>>     egen work=total(return) if rtype=="hold_period" & yrmonth == "`lev'" & quintile==1
>>>     replace P5_buy=work if rtype=="hold_period" & yrmonth == "`lev'" & quintile==1
>>>     drop work
>>> }
>>> sort quintile yrmonth rtype
>>>
>>>
>>> ***------procedures for Strategy analysis
>>> **preparing time-series
>>> *P1 is the variable to use for the time-series / keep -P1_sell- intact just for the sake of it
>>>
>>> gen P1 = P1_sell
>>> gen copyP1=P1
>>> replace P1 = . if P1 == copyP1[_n-1]
>>> drop copyP1
>>>
>>> *P5 is the variable to use for the time-series / keep -P5_buy- intact just for the sake of it
>>>
>>> gen P5 = P5_buy
>>> gen copyP5=P5
>>> replace P5 = . if P5 == copyP5[_n-1]
>>> drop copyP5
>>>
>>> *keeping only time-series variables & unique records
>>> keep P1 P5 period
>>>
>>> sort period P1 P5
>>> quietly by period P1 P5: gen dup = cond(_N==1,0,_n)
>>> drop if dup>0
>>> drop dup
>>>
>>> sort period P1 P5
>>> gen P5copy = P5
>>> replace P5 = P5copy[_n+1] if P5 >= .
>>> replace P5 = P5copy[_n+3] if P5 >= .
>>> drop P5copy
>>>
>>> sort period
>>> quietly by period: gen dup = cond(_N==1,0,_n)
>>> drop if dup>2
>>> drop dup
>>>
>>> gen temp = P1 + P5
>>> drop if temp >= .
>>> drop temp
>>>
>>> by period: egen strategy=total(P1 + P5)
>>>
>>> sort strategy
>>> quietly by strategy: gen dup = cond(_N==1,0,_n)
>>> drop if dup>1
>>> drop dup
>>>
>>> sort period
>>>
>>> ** changing into a time-series // not sure if it is necessary yet...
>>> tsset period
>>> mean P1 P5 strategy
>>> ******end of code
>>>
>>> Thanks for your consideration! Any comments or suggestions will be appreciated.
>>> Clarice
>>>
>>>
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>> * http://www.ats.ucla.edu/stat/stata/
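Below is a minimal Stata sketch of the behavior Sergiy describes, using made-up data. The variables group and obs_id are hypothetical and do not come from Clarice's data set. Stata's help for -sort- says that without -stable- the order of observations with equal values of the sort key is randomized. Any later step that picks observations by position within tied groups can therefore pick different observations on each run. In the quoted code, that includes the [_n-1] and [_n+6] subscripts and the cond(_N==1,0,_n) de-duplication blocks. The sketch shows the two usual fixes: -sort, stable-, or a sort key that identifies observations uniquely.

```stata
* Toy data (hypothetical): -group- has ties, -obs_id- records the input order
clear
input byte group byte obs_id
2 1
1 2
2 3
1 4
end

* -stable- keeps tied observations in their current relative order,
* so the result is the same on every run
sort group, stable
list, noobs

* Alternative: make the sort key unique, so there are no ties to break;
* -obs_id- stands in for any variable that identifies observations
sort group obs_id
list, noobs
```

One place in the quoted routine where this could matter is -sort strategy- followed by keeping only the first observation for each value of strategy. strategy is identical across the tied observations, so its mean cannot change, but which observation's P1 and P5 survive can. That would be consistent with the two outputs Clarice posted, although only a rerun with -stable- (or a unique key) would confirm it.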
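Sergiy's one-line replacement can be checked on a few made-up values (the data below are hypothetical). The sum P1 + P5 is missing whenever either term is missing. -missing()- is true for system and extended missing values alike, just as -temp >= .- was. Dropping on missing(P1 + P5) therefore removes the same observations without creating and dropping a temporary variable.

```stata
* Toy data (hypothetical values); "." is a missing value
clear
input P1 P5
1 2
. 3
4 .
5 6
end

* One line instead of: gen temp = P1 + P5 / drop if temp >= . / drop temp
drop if missing(P1 + P5)
list, noobs
```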
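On the -duplicates- suggestion, here is a sketch on made-up data (the values are hypothetical). One detail is worth checking before swapping commands. The quoted pattern -quietly by period P1 P5: gen dup = cond(_N==1,0,_n)- followed by -drop if dup>0- removes every observation that has a duplicate, including the first copy. -duplicates tag- reproduces that exactly. -duplicates drop- instead keeps one copy of each group. The thread does not say which of the two Clarice intended.

```stata
* Toy data (hypothetical): the first two observations are exact duplicates
clear
input byte period byte P1 byte P5
1 1 5
1 1 5
2 2 6
end

preserve
* Same effect as the quoted cond(_N==1,0,_n) block: drop every copy.
* -duplicates tag- stores the number of *other* observations with the same values
duplicates tag period P1 P5, generate(dup)
drop if dup > 0
drop dup
assert _N == 1
restore

* Alternative: keep one copy of each group of duplicates
* (-force- is required when a varlist is given)
duplicates drop period P1 P5, force
```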
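Sergiy notes that the code can be made more readable. One example: each -levelsof-/-foreach- loop that creates P1_sell or P5_buy could be replaced by a single -egen- call under -by-. This is a sketch on made-up data, assuming yrmonth is a string variable as the quoted comparisons with "`lev'" suggest. It is not a tested rewrite of Clarice's routine. Under -by yrmonth:-, -egen total()- sums return separately within each month, using only the observations that satisfy the -if- condition. All other observations are left missing, as the loop did.

```stata
* Toy data (hypothetical values); yrmonth is a string, as in the quoted code
clear
input str6 yrmonth str15 rtype byte quintile double return
"200101" "buy_sell_period" 1 1
"200101" "buy_sell_period" 1 2
"200101" "buy_sell_period" 5 4
"200102" "buy_sell_period" 1 8
"200102" "hold_period"     1 16
end

* One line per variable instead of a levelsof/foreach loop:
* the total of -return- is computed within each yrmonth, using only
* observations that meet the condition; all others get missing
bysort yrmonth: egen P1_sell = total(return) if rtype == "buy_sell_period" & quintile == 1
bysort yrmonth: egen P5_buy  = total(return) if rtype == "buy_sell_period" & quintile == 5
list, noobs sepby(yrmonth)
```

The hold_period loops overwrite existing values with -replace- instead of creating new variables. To adopt this pattern there, create a temporary variable the same way and then -replace- from it.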