
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Routine from do-file that every time it's run gives a different result


From   Clarice Martins <[email protected]>
To   [email protected]
Subject   Re: st: Routine from do-file that every time it's run gives a different result
Date   Thu, 7 Nov 2013 14:47:06 -0200

Thanks to all for the valuable input...

Sarah, thanks for the practical tips on how to troubleshoot. I am definitely very new to any kind of programming and needed this kind of advice.

Nick, I agree that it is important to verify where the error is. The -stable- option might help, but I will definitely dig further to figure out where my hidden assumptions are.

Just another question on this issue:
Another portion of the code uses -xtile- to break the portfolio of returns into quintiles. (At first, I didn't think it was important.)

But could there also be a problem with how -xtile- breaks the dataset into groups?  I mean, even when I did this manually in Excel, it was always difficult to decide how many observations belong in each quintile group (e.g., if the dataset has 21 observations, we will have 4 groups of 4 and one group of 5; which group takes the extra observation?).
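A minimal sketch for checking this directly, assuming a toy dataset of 21 distinct values (the names x and q are made up for illustration). As far as I understand it, -xtile- assigns groups from value cutpoints computed as by -pctile-, so tied values always land in the same group and the assignment should not depend on the sort order. The tabulation shows which group ends up with the extra observation.

```stata
* Toy data: 21 observations with distinct values 1..21 (hypothetical)
clear
set obs 21
generate x = _n

* Split into quintiles; q takes the values 1 to 5
xtile q = x, nquantiles(5)

* Group sizes: shows where the 21st observation goes
tabulate q

* Range of x within each quintile
tabstat x, by(q) statistics(min max n)
```

With real return data that contain ties, the same two commands show whether the group sizes are as uneven as expected.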

Thanks again!!!
Clarice


On Nov 6, 2013, at 8:11 PM, Sarah Edgington wrote:

> Clarice,
> Nick's right that you need to do more digging.  However, I would argue that
> the solution of using the stable option to -sort- is worse than "[solving]
> the problem with the price of not understanding it."  Using -sort, stable-
> is actually just pretending that there is not a problem at all.  Yes, that
> strategy will get you consistent results, but the chances that they'll be
> the right results are pretty slim.  Being able to reproduce the wrong answer
> is generally just as bad as not being able to reproduce the answer at all.
> 
> To be a bit more explicit, what sort order you end up with clearly matters
> for your results.  You need to figure out why the variables you're sorting
> on are not producing unique results and figure out how to fix that.
> Using -sort, stable- may very well appear to fix your problem but presumably
> you care whether the average of P5 is 6.154 or 3.286.  If you don't do more
> investigation you'll never know which of those is the number you're really
> looking for (or whether it's something else completely).
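A minimal sketch of why ties matter for any step that uses [_n-1], on made-up data (grp, value and obs_id are hypothetical names, not variables from the posted do-file). Sorting on grp alone leaves the order within each group undetermined, so prev1 may differ between runs. Adding a tiebreaker that makes the key unique fixes the order, so prev2 is always the same. Note that -sort, stable- would also give repeatable output, but only by keeping whatever order the data happened to be in.

```stata
* Toy data with ties on grp (hypothetical)
clear
input str1 grp value
"a" 1
"a" 2
"b" 3
"b" 4
end

* Hypothetical tiebreaker recording the intended order
generate long obs_id = _n

* Ties within grp: which row comes first is not guaranteed
sort grp
generate prev1 = value[_n-1]

* Unique sort key: the order is fully determined
sort grp obs_id
generate prev2 = value[_n-1]

list grp value obs_id prev1 prev2
```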
> 
> One thing that I find useful when troubleshooting this kind of problem is to
> use -sum- after every section where I create new variables with values where
> sort order matters.  Then I'll run the dofile multiple times, saving a
> logfile with a different name each time.  Usually you can pretty quickly
> spot where things went wrong by comparing the log files from two different
> runs, as long as you put in descriptive statistics for your created variables
> along the way.
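A minimal sketch of this log-per-run approach, assuming a do-file called myroutine.do that takes a run number as its argument (the file name, the run argument and the toy variable prev_price are all hypothetical). It uses the auto dataset shipped with Stata in place of the real data, with one order-dependent step followed by a -summarize- checkpoint.

```stata
* Run number passed on the command line, e.g.  do myroutine 1
args run
if "`run'" == "" local run 1

* One log file per run: run1.log, run2.log, ...
capture log close
log using "run`run'.log", text replace

sysuse auto, clear
sort foreign                        // ties: order within foreign is not fixed
generate prev_price = price[_n-1]   // result depends on the tie order
summarize prev_price                // checkpoint after an order-dependent step

log close
```

Running it twice with different run numbers and comparing the two log files side by side shows the first checkpoint at which the summaries diverge, if they do.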
> 
> Another useful command when trying to identify whether you're uniquely
> sorting observations is -isid-.  Any combination of variables that doesn't
> function as a unique ID will leave you with ties on the sort, leading to the
> kind of unpredictable results you see here.
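A minimal sketch of the -isid- check on made-up data. The variable names echo the posted code, but the values are invented, and which variables should identify an observation in the real dataset is an assumption. -capture noisily- lets the do-file continue after -isid- reports a failure.

```stata
* Toy data: two companies share quintile 1 in the same month (hypothetical)
clear
input co_id str7 yrmonth quintile
1 "2013m01" 1
2 "2013m01" 1
3 "2013m01" 5
end

* Not a unique key: sorting on these variables alone leaves ties
capture noisily isid quintile yrmonth
display "return code: " _rc

* A unique key: sorting on these variables fixes the order
capture noisily isid co_id yrmonth
display "return code: " _rc

* How many observations are involved in the ties
duplicates report quintile yrmonth
```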
> 
> -Sarah
> 
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Nick Cox
> Sent: Wednesday, November 06, 2013 1:51 PM
> To: [email protected]
> Subject: Re: st: Routine from do-file that every time it's run gives a
> different result
> 
> But that solves the problem with the price of not understanding it.
> Somewhere Clarice has hidden assumptions, which are not correct, that the
> -sort- order is enough to get the right order without extra information.
> Nick
> [email protected]
> 
> 
> On 6 November 2013 21:46, Sergiy Radyakin <[email protected]> wrote:
>> Clarice, add the option stable to the sort commands. Without this 
>> option, the -sort- command will break the ties randomly. See here:
>> http://www.stata.com/help.cgi?sort
>> 
>> Best, Sergiy
>> 
>> On Wed, Nov 6, 2013 at 4:30 PM, Clarice Martins 
>> <[email protected]> wrote:
>>> Dear Statalist group,
>>> 
>>> I have a routine that apparently was running ok, and then I noticed that
>>> every time I execute the code I get different results for one of the
>>> variables.
>>> (The routine is long, so I don't know how to best provide you guys 
>>> with enough info.)
>>> 
>>> 1) I believe the problem has to do with variable -P5- since this is the
>>> variable whose average changes every time I run the code.
>>> 
>>> 2) Sample of the results I am getting: as you can see, variable P1
>>> is always approximately the same (it should be the same) and variable
>>> Strategy is ALWAYS the same, but var -P5- changes by a lot. (I've
>>> shown two outputs, but I've run it several, several times.)
>>> 
>>> 
>>> . esttab .
>>> 
>>> ----------------------------
>>>                      (1)
>>>                     Mean
>>> ----------------------------
>>> P1                  0.300***
>>>                   (3.41)
>>> 
>>> P5                  6.154
>>>                   (1.53)
>>> 
>>> strategy            7.190
>>>                   (1.78)
>>> ----------------------------
>>> N                     150
>>> ----------------------------
>>> 
>>> 
>>> ----------------------------
>>>                      (1)
>>>                     Mean
>>> ----------------------------
>>> P1                  0.223*
>>>                   (2.24)
>>> 
>>> P5                  3.286
>>>                   (1.15)
>>> 
>>> strategy            7.190
>>>                   (1.78)
>>> ----------------------------
>>> N                     150
>>> ----------------------------
>>> 
>>> 3) Piece of the code that deals with creating and changing variable 
>>> P5: (my apologies if this is confusing or too long)
>>> 
>>> ***create variable P1/P5 and sum all 1st/5th quintiles per <yrmonth> 
>>> gen P1_sell = .
>>> quietly levelsof yrmonth, local(levs)
>>> quietly foreach lev of local levs {
>>>        egen work=total(return) if rtype=="buy_sell_period" & yrmonth == "`lev'" & quintile==1
>>>        replace P1_sell=work if rtype=="buy_sell_period" & yrmonth == "`lev'" & quintile==1
>>>        drop work
>>> }
>>> 
>>> gen P5_buy = .
>>> quietly levelsof yrmonth, local(levs)
>>> quietly foreach lev of local levs {
>>>        egen work=total(return) if rtype=="buy_sell_period" & yrmonth == "`lev'" & quintile==5
>>>        replace P5_buy=work if rtype=="buy_sell_period" & yrmonth == "`lev'" & quintile==5
>>>        drop work
>>> }
>>> 
>>> sort quintile yrmonth rtype
>>> 
>>> **undo the buy/sell operation
>>> *in order to do the procedure, first copy quintile #s to same <co_id> but for 6 <yrmonth> LATER
>>> 
>>> bysort co_id period: egen tocopy2 = total(quintile / (rtype == "buy_sell_period"))
>>> bysort co_id rtype (negperiod) : replace quintile = tocopy2[_n+6] if missing(quintile) & rtype == "hold_period"
>>> sort quintile yrmonth rtype
>>> 
>>> *add sums of 1st/5th quintiles for <hold_period> to variables P1/P5
>>> 
>>> quietly levelsof yrmonth, local(levs)
>>> quietly foreach lev of local levs {
>>>        egen work=total(return) if rtype=="hold_period" & yrmonth == "`lev'" & quintile==5
>>>        replace P1_sell=work if rtype=="hold_period" & yrmonth == "`lev'" & quintile==5
>>>        drop work
>>> }
>>> 
>>> quietly levelsof yrmonth, local(levs)
>>> quietly foreach lev of local levs {
>>>        egen work=total(return) if rtype=="hold_period" & yrmonth == "`lev'" & quintile==1
>>>        replace P5_buy=work if rtype=="hold_period" & yrmonth == "`lev'" & quintile==1
>>>        drop work
>>> }
>>> sort quintile yrmonth rtype
>>> 
>>> 
>>> ***------procedures for Strategy analysis
>>> **preparing time-series
>>> *P1 is the variable to use for the time-series / keep -P1_sell- intact just for the sake of it
>>> 
>>> gen P1 = P1_sell
>>> gen copyP1=P1
>>> replace P1 = . if P1 == copyP1[_n-1]
>>> drop copyP1
>>> 
>>> *P5 is the variable to use for the time-series / keep -P5_buy- intact just for the sake of it
>>> 
>>> gen P5 = P5_buy
>>> gen copyP5=P5
>>> replace P5 = . if P5 == copyP5[_n-1]
>>> drop copyP5
>>> 
>>> *keeping only time-series variables & unique records
>>> keep P1 P5 period
>>> 
>>> sort period P1 P5
>>> quietly by period P1 P5:  gen dup = cond(_N==1,0,_n)
>>> drop if dup>0
>>> drop dup
>>> 
>>> sort period P1 P5
>>> gen P5copy = P5
>>> replace P5 = P5copy[_n+1] if P5 >= .
>>> replace P5 = P5copy[_n+3] if P5 >= .
>>> drop P5copy
>>> 
>>> sort period
>>> quietly by period: gen dup = cond(_N==1,0,_n)
>>> drop if dup>2
>>> drop dup
>>> 
>>> gen temp = P1 + P5
>>> drop if temp >= .
>>> drop temp
>>> 
>>> by period: egen strategy=total(P1 + P5)
>>> 
>>> sort strategy
>>> quietly by strategy: gen dup = cond(_N==1,0,_n)
>>> drop if dup>1
>>> drop dup
>>> 
>>> sort period
>>> 
>>> ** changing into a time-series // not sure if it is necessary yet...
>>> tsset period
>>> mean P1 P5 strategy
>>> ******end of code
>>> 
>>> Thanks for your consideration! Any comment or suggestions will be
>>> appreciated.
>>> Clarice
>>> 
>>> 
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>> *   http://www.ats.ucla.edu/stat/stata/

