Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Routine from do-file that every time it's run gives a different result
From: Sergiy Radyakin <[email protected]>
To: "[email protected]" <[email protected]>
Subject: Re: st: Routine from do-file that every time it's run gives a different result
Date: Wed, 6 Nov 2013 17:10:28 -0500
On Wed, Nov 6, 2013 at 4:50 PM, Nick Cox <[email protected]> wrote:
> But that solves the problem with the price of not understanding it.
Nick, I agree, but Clarice was interested in why the results change at
random. The -sort- command carries an element of randomness: observations
that tie on the sort key are placed in an arbitrary order, which bites
everyone at least once (it bites me about once per year :). In fact I
would call it -randomsort- or -sort, random- to make that explicit, and
-sort- would be the name for the current -sort, stable-. Compare
-summarize-, which by default calculates the sd and produces some output,
while -summarize, meanonly- is a faster restricted version for
programmers. I didn't even read the whole code; I stopped once I saw
-sort- without the stable option.
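
To make the point concrete, here is a minimal sketch on made-up data (the variables g and x are hypothetical, not taken from Clarice's file). It assumes nothing beyond the documented behavior of -sort- and -isid-:

```stata
* Toy data: three observations tie on the sort key g
clear
input g x
2 40
1 10
1 20
1 30
end

* A plain -sort g- would leave the order of x within g==1 arbitrary.
* With -stable-, tied observations keep their previous relative order,
* so x within g==1 is 10, 20, 30 on every run.
sort g, stable
list, sepby(g)

* Alternative: extend the sort key until it identifies observations
* uniquely. -isid- exits with an error if g and x together do not.
isid g x
sort g x
```

Either approach removes the run-to-run variation. The second also documents which extra variables the code actually relies on for its ordering.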
> Somewhere Clarice has hidden assumptions about the -sort- order being
> enough to get the right order without extra information that are not
> correct.
I believe Clarice will validate the results and recognize errors of this
kind as soon as the results stabilize. Until then there is no point in
running validations, since the results change on every run.
Clarice may also be interested in reading about the -duplicates-
command, as she seems to be simulating its behavior.
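
As a rough illustration, again on made-up data (only the variable names period, P1 and P5 are borrowed from the quoted code), the -duplicates- subcommands could be used like this. Note that -duplicates drop- keeps the first copy of each group, whereas the quoted `drop if dup>0` after `cond(_N==1,0,_n)` removes every copy, so which one matches the intent is an assumption to check:

```stata
clear
input period P1 P5
1 0.1 0.5
1 0.1 0.5
2 0.2 0.6
3 0.3 .
end

* Report how many copies of each (period, P1, P5) combination exist
duplicates report period P1 P5

* Flag observations: dup is the number of surplus copies, 0 if unique
duplicates tag period P1 P5, generate(dup)
list

* Keep one observation per combination; -force- is required
* whenever a varlist is given
duplicates drop period P1 P5, force
assert _N == 3
```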
The rest of the code can be optimized for readability. There are a lot
of "magic" transformations of the Pi's, which are incomprehensible to me
now and (probably) will be to Clarice a couple of months after writing
them. Some obvious simplifications I would advise are of this kind:
gen temp = P1 + P5
drop if temp >= .
drop temp
is simply:
drop if missing(P1+P5)
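
A small check of that equivalence on made-up values (a sketch only; the data are hypothetical). It relies on the fact that a sum is missing whenever either term is missing, including extended missing values such as .a:

```stata
clear
input P1 P5
0.1 0.5
.   0.6
0.3 .
.a  .b
end

* One expression replaces the temporary variable
drop if missing(P1 + P5)

* Only the first observation has both values present
assert _N == 1

* An equivalent spelling that names each variable explicitly would be:
*   drop if missing(P1, P5)
```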
Best, Sergiy
> Nick
> [email protected]
>
>
> On 6 November 2013 21:46, Sergiy Radyakin <[email protected]> wrote:
>> Clarice, add the option stable to the sort commands. Without this
>> option, the -sort- command will break the ties randomly. See here:
>> http://www.stata.com/help.cgi?sort
>>
>> Best, Sergiy
>>
>> On Wed, Nov 6, 2013 at 4:30 PM, Clarice Martins
>> <[email protected]> wrote:
>>> Dear Statalist group,
>>>
>>> I have a routine that apparently was running ok, and then I noticed that every time I execute the code I get different results for one of the variables.
>>> (The routine is long, so I don't know how best to provide you guys with enough info.)
>>>
>>> 1) I believe the problem has to do with variable -P5-, since this is the variable whose average changes every time I run the code.
>>>
>>> 2) Sample of the results I am getting: as you can see, variable P1 is always approximately the same (it should be the same) and variable Strategy is ALWAYS the same, but variable -P5- changes a lot. (I've shown two outputs, but I've run it many times.)
>>>
>>>
>>> . esttab .
>>>
>>> ----------------------------
>>>                       (1)
>>>                      Mean
>>> ----------------------------
>>> P1                  0.300***
>>>                    (3.41)
>>>
>>> P5                  6.154
>>>                    (1.53)
>>>
>>> strategy            7.190
>>>                    (1.78)
>>> ----------------------------
>>> N                     150
>>> ----------------------------
>>>
>>>
>>> ----------------------------
>>>                       (1)
>>>                      Mean
>>> ----------------------------
>>> P1                  0.223*
>>>                    (2.24)
>>>
>>> P5                  3.286
>>>                    (1.15)
>>>
>>> strategy            7.190
>>>                    (1.78)
>>> ----------------------------
>>> N                     150
>>> ----------------------------
>>>
>>> 3) Piece of the code that deals with creating and changing variable P5: (my apologies if this is confusing or too long)
>>>
>>> ***create variable P1/P5 and sum all 1st/5th quintiles per <yrmonth>
>>> gen P1_sell = .
>>> quietly levelsof yrmonth, local(levs)
>>> quietly foreach lev of local levs {
>>>     egen work=total(return) if rtype=="buy_sell_period" & yrmonth == "`lev'" & quintile==1
>>>     replace P1_sell=work if rtype=="buy_sell_period" & yrmonth == "`lev'" & quintile==1
>>>     drop work
>>> }
>>>
>>> gen P5_buy = .
>>> quietly levelsof yrmonth, local(levs)
>>> quietly foreach lev of local levs {
>>>     egen work=total(return) if rtype=="buy_sell_period" & yrmonth == "`lev'" & quintile==5
>>>     replace P5_buy=work if rtype=="buy_sell_period" & yrmonth == "`lev'" & quintile==5
>>>     drop work
>>> }
>>>
>>> sort quintile yrmonth rtype
>>>
>>> **undo the buy/sell operation
>>> *in order to do the procedure, first copy quintile #s to same <co_id> but for 6 <yrmonth> LATER
>>>
>>> bysort co_id period: egen tocopy2 = total(quintile / (rtype == "buy_sell_period"))
>>> bysort co_id rtype (negperiod) : replace quintile = tocopy2[_n+6] if missing(quintile) & rtype == "hold_period"
>>> sort quintile yrmonth rtype
>>>
>>> *add sums of 1st/5th quintiles for <hold_period> to variables P1/P5
>>>
>>> quietly levelsof yrmonth, local(levs)
>>> quietly foreach lev of local levs {
>>>     egen work=total(return) if rtype=="hold_period" & yrmonth == "`lev'" & quintile==5
>>>     replace P1_sell=work if rtype=="hold_period" & yrmonth == "`lev'" & quintile==5
>>>     drop work
>>> }
>>>
>>> quietly levelsof yrmonth, local(levs)
>>> quietly foreach lev of local levs {
>>>     egen work=total(return) if rtype=="hold_period" & yrmonth == "`lev'" & quintile==1
>>>     replace P5_buy=work if rtype=="hold_period" & yrmonth == "`lev'" & quintile==1
>>>     drop work
>>> }
>>> sort quintile yrmonth rtype
>>>
>>>
>>> ***------procedures for Strategy analysis
>>> **preparing time-series
>>> *P1 is the variable to use for the time-series / keep -P1_sell- intact just for the sake of it
>>>
>>> gen P1 = P1_sell
>>> gen copyP1=P1
>>> replace P1 = . if P1 == copyP1[_n-1]
>>> drop copyP1
>>>
>>> *P5 is the variable to use for the time-series / keep -P5_buy- intact just for the sake of it
>>>
>>> gen P5 = P5_buy
>>> gen copyP5=P5
>>> replace P5 = . if P5 == copyP5[_n-1]
>>> drop copyP5
>>>
>>> *keeping only time-series variables & unique records
>>> keep P1 P5 period
>>>
>>> sort period P1 P5
>>> quietly by period P1 P5: gen dup = cond(_N==1,0,_n)
>>> drop if dup>0
>>> drop dup
>>>
>>> sort period P1 P5
>>> gen P5copy = P5
>>> replace P5 = P5copy[_n+1] if P5 >= .
>>> replace P5 = P5copy[_n+3] if P5 >= .
>>> drop P5copy
>>>
>>> sort period
>>> quietly by period: gen dup = cond(_N==1,0,_n)
>>> drop if dup>2
>>> drop dup
>>>
>>> gen temp = P1 + P5
>>> drop if temp >= .
>>> drop temp
>>>
>>> by period: egen strategy=total(P1 + P5)
>>>
>>> sort strategy
>>> quietly by strategy: gen dup = cond(_N==1,0,_n)
>>> drop if dup>1
>>> drop dup
>>>
>>> sort period
>>>
>>> ** changing into a time-series // not sure if it is necessary yet...
>>> tsset period
>>> mean P1 P5 strategy
>>> ******end of code
>>>
>>> Thanks for your consideration! Any comment or suggestions will be appreciated.
>>> Clarice
>>>
>>>
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>> * http://www.ats.ucla.edu/stat/stata/