
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Routine from do-file that every time it's run gives a different result


From   Clarice Martins <[email protected]>
To   [email protected]
Subject   Re: st: Routine from do-file that every time it's run gives a different result
Date   Thu, 7 Nov 2013 16:05:09 -0200

Definitely helps, Sergiy.

Of course, after checking the code line by line, I guess my next question should be: is this the correct method? The methodology I was following just says "divide into quintiles", and I assumed (I guess wrongly) that I should use the "canned percentile functions".

After Nick's comment:  (Thanks, Nick also!)
>> And -xtile- is written -sortpreserve-
>> so it doesn't change the sort order of your data.


Does this mean that I need to be aware of how the data are sorted before using -xtile-?


Thanks again!!!
Clarice
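
On the sort-order question, a minimal sketch on toy data may help. It assumes 21 made-up observations of a hypothetical variable named ret (not the actual returns), computes quintiles with -xtile- before and after shuffling the rows, and checks that the two groupings agree. The variable names id, ret, shuffle, q_a and q_b are invented for the illustration.

```stata
* Illustrative sketch on toy data, not the dataset from this thread
clear
set obs 21
set seed 12345
gen id = _n
gen ret = runiform()          // hypothetical return variable

* Quintiles with the rows in their original order
xtile q_a = ret, nq(5)

* Shuffle the rows, then compute the quintiles again
gen shuffle = runiform()
sort shuffle
xtile q_b = ret, nq(5)

* The grouping is driven by the values of ret, not by row order
assert q_a == q_b

* With 21 observations the groups cannot all be the same size
tabulate q_a
```

If the -assert- passes, the row order before -xtile- did not affect the grouping here; the -tabulate- output shows which group received the extra observation under the default percentile definition.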


On Nov 7, 2013, at 3:48 PM, Sergiy Radyakin wrote:

> Clarice,
> the following article discusses what Excel is doing to compute quartiles:
> http://stats.stackexchange.com/questions/28123/quartiles-in-excel
> 
> In general don't expect different statistical packages to break your
> observations into groups (quartiles, quintiles, deciles) identically.
> This applies not only to Excel, but also to SPSS, SAS, etc.
> http://www-01.ibm.com/support/docview.wss?uid=swg21480663
> http://www.erieri.com/blog/post/technically-speaking-does-excel-always-know-what-is-best-for-your-compensation-data
> and tons of other discussions, just check Google.
> 
> Multiple methods exist, and the defaults are not always identical
> across the packages.
> In some cases it might be better to be explicit, and sort and break
> the dataset into groups yourself, rather than rely on the canned
> percentile functions. Better read your code line by line, and check if
> it implements exactly what you want it to do.
> 
> Hope this helps, Sergiy
> 
> 
> 
> On Thu, Nov 7, 2013 at 12:03 PM, Nick Cox <[email protected]> wrote:
>> -xtile- is undoubtedly problematic -- as it reduces the information in
>> your data and isn't guaranteed to produce  equal-sized groups even
>> when the number of observations is an exact multiple of the number of
>> groups. But one of its rules is that observations with the same value
>> always go into the same group. And -xtile- is written -sortpreserve-
>> so it doesn't change the sort order of your data.
>> Nick
>> [email protected]
>> 
>> 
>> On 7 November 2013 16:47, Clarice Martins <[email protected]> wrote:
>>> Thanks to all for the valuable input...
>>> 
>>> Sarah, thanks for the practical tips on how to troubleshoot, I am definitely very new at any kind of programming and needed this kind of advice.
>>> 
>>> Nick, I agree that it is important to verify where the error is. The -stable- option might help me, but I will definitely keep searching to figure out where my hidden assumptions are.
>>> 
>>> Just another question on this issue:
>>> Another portion of the code uses -xtile- to break the portfolio of returns into quintiles. (At first, I didn't think it was important.)
>>> 
>>> But... could there also be a problem with how -xtile- breaks the dataset into groups?  I mean, even when I did this manually in Excel, it was always difficult to decide how many observations go into each quintile group (e.g., if the dataset has 21 observations, we will have 4 groups of 4 and one group of 5; which group takes the extra observation?).
>>> 
>>> Thanks again!!!
>>> Clarice
>>> 
>>> 
>>> On Nov 6, 2013, at 8:11 PM, Sarah Edgington wrote:
>>> 
>>>> Clarice,
>>>> Nick's right that you need to do more digging.  However, I would argue that
>>>> the solution of using the stable option to -sort- is worse than "[solving]
>>>> the problem with the price of not understanding it."  Using -sort, stable-
>>>> is actually just pretending that there is not a problem at all.  Yes, that
>>>> strategy will get you consistent results, but the chances that they'll be
>>>> the right results are pretty slim.  Being able to reproduce the wrong answer
>>>> is generally just as bad as not being able to reproduce the answer at all.
>>>> 
>>>> To be a bit more explicit, what sort order you end up with clearly matters
>>>> for your results.  You need to figure out why the variables you're sorting
>>>> on are not producing unique results and figure out how to fix that.
>>>> Using -sort, stable- may very well appear to fix your problem but presumably
>>>> you care whether the average of P5 is 6.154 or 3.286.  If you don't do more
>>>> investigation you'll never know which of those is the number you're really
>>>> looking for (or whether it's something else completely).
>>>> 
>>>> One thing that I find useful when troubleshooting this kind of problem is to
>>>> use -sum- after every section where I create new variables with values where
>>>> sort order matters.  Then I'll run the dofile multiple times, saving a
>>>> logfile with a different name each time.  Usually you can pretty quickly
>>>> spot where things went wrong by comparing the log files from two different
>>>> runs, as long as you put in descriptive statistics for your created variables
>>>> along the way.
>>>> 
>>>> Another useful command when trying to identify whether you're uniquely
>>>> sorting observations is -isid-.  Any combination of variables that doesn't
>>>> function as a unique ID will leave you with ties on the sort, leading to the
>>>> kind of unpredictable results you see here.
>>>> 
>>>> -Sarah
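
The -isid- check described above can be sketched as follows. This is only an illustration on a five-row toy dataset; the variable names co_id, period and rtype are borrowed from the code later in the thread, but the values are invented.

```stata
* Illustrative sketch on toy data; values are invented
clear
input str2 co_id period str16 rtype
"A" 1 "buy_sell_period"
"A" 1 "hold_period"
"A" 2 "buy_sell_period"
"B" 1 "buy_sell_period"
"B" 1 "hold_period"
end

* co_id and period alone do not identify an observation ...
capture isid co_id period
display _rc            // nonzero: the key has ties, so a sort on it is ambiguous

* ... but adding rtype does, so this sort has only one possible order
isid co_id period rtype
sort co_id period rtype

* -duplicates report- shows how many observations share each key
duplicates report co_id period
```

When -isid- fails on the variables given to -sort-, anything that later depends on _n or on [_n-1] within the tied rows can change from run to run.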
>>>> 
>>>> -----Original Message-----
>>>> From: [email protected]
>>>> [mailto:[email protected]] On Behalf Of Nick Cox
>>>> Sent: Wednesday, November 06, 2013 1:51 PM
>>>> To: [email protected]
>>>> Subject: Re: st: Routine from do-file that every time it's run gives a
>>>> different result
>>>> 
>>>> But that solves the problem with the price of not understanding it.
>>>> Somewhere Clarice has hidden assumptions about the -sort- order being enough
>>>> to get the right order without extra information that are not correct.
>>>> Nick
>>>> [email protected]
>>>> 
>>>> 
>>>> On 6 November 2013 21:46, Sergiy Radyakin <[email protected]> wrote:
>>>>> Clarice, add the option stable to the sort commands. Without this
>>>>> option, the -sort- command will break the ties randomly. See here:
>>>>> http://www.stata.com/help.cgi?sort
>>>>> 
>>>>> Best, Sergiy
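
A minimal sketch of what the stable option does, on invented toy data: tied observations keep the relative order they had before the sort. The variable entry is a hypothetical marker of the original row order, added only so the behaviour can be checked.

```stata
* Illustrative sketch on toy data; values are invented
clear
input period value
2 10
1 20
2 30
1 40
end
gen long entry = _n          // hypothetical: records the original row order

sort period, stable

* Within each period, rows keep their original relative order
by period: assert entry > entry[_n-1] if _n > 1

list, sepby(period)
```

This makes the result reproducible from one run to the next, but it says nothing about whether the original row order is the order the analysis actually needs.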
>>>>> 
>>>>> On Wed, Nov 6, 2013 at 4:30 PM, Clarice Martins
>>>>> <[email protected]> wrote:
>>>>>> Dear Statalist group,
>>>>>> 
>>>>>> I have a routine that apparently was running OK, and then I noticed that
>>>>>> every time I execute the code I get different results for one of the
>>>>>> variables.
>>>>>> (The routine is long, so I don't know how to best provide you guys
>>>>>> with enough info.)
>>>>>> 
>>>>>> 1) I believe the problem has to do with variable -P5-, since this is the
>>>>>> variable whose average changes every time I run the code.
>>>>>> 
>>>>>> 2) Sample of the results I am getting: as you can see, variable P1
>>>>>> is always approximately the same (it should be the same) and variable
>>>>>> strategy is ALWAYS the same, but variable -P5- changes by a lot. (I've
>>>>>> shown two outputs, but I've run it several, several times.)
>>>>>> 
>>>>>> 
>>>>>> . esttab .
>>>>>> 
>>>>>> ----------------------------
>>>>>>                     (1)
>>>>>>                    Mean
>>>>>> ----------------------------
>>>>>> P1                  0.300***
>>>>>>                  (3.41)
>>>>>> 
>>>>>> P5                  6.154
>>>>>>                  (1.53)
>>>>>> 
>>>>>> strategy            7.190
>>>>>>                  (1.78)
>>>>>> ----------------------------
>>>>>> N                     150
>>>>>> ----------------------------
>>>>>> 
>>>>>> 
>>>>>> ----------------------------
>>>>>>                     (1)
>>>>>>                    Mean
>>>>>> ----------------------------
>>>>>> P1                  0.223*
>>>>>>                  (2.24)
>>>>>> 
>>>>>> P5                  3.286
>>>>>>                  (1.15)
>>>>>> 
>>>>>> strategy            7.190
>>>>>>                  (1.78)
>>>>>> ----------------------------
>>>>>> N                     150
>>>>>> ----------------------------
>>>>>> 
>>>>>> 3) Piece of the code that deals with creating and changing variable
>>>>>> P5: (my apologies if this is confusing or too long)
>>>>>> 
>>>>>> ***create variable P1/P5 and sum all 1st/5th quintiles per <yrmonth>
>>>>>> gen P1_sell = .
>>>>>> quietly levelsof yrmonth, local(levs)
>>>>>> quietly foreach lev of local levs {
>>>>>>       egen work=total(return) if rtype=="buy_sell_period" & yrmonth == "`lev'" & quintile==1
>>>>>>       replace P1_sell=work if rtype=="buy_sell_period" & yrmonth == "`lev'" & quintile==1
>>>>>>       drop work
>>>>>> }
>>>>>> 
>>>>>> gen P5_buy = .
>>>>>> quietly levelsof yrmonth, local(levs)
>>>>>> quietly foreach lev of local levs {
>>>>>>       egen work=total(return) if rtype=="buy_sell_period" & yrmonth == "`lev'" & quintile==5
>>>>>>       replace P5_buy=work if rtype=="buy_sell_period" & yrmonth == "`lev'" & quintile==5
>>>>>>       drop work
>>>>>> }
>>>>>> 
>>>>>> sort quintile yrmonth rtype
>>>>>> 
>>>>>> **undo the buy/sell operation
>>>>>> *in order to do the procedure, first copy quintile #s to same <co_id> but for 6 <yrmonth> LATER
>>>>>> 
>>>>>> bysort co_id period: egen tocopy2 = total(quintile / (rtype == "buy_sell_period"))
>>>>>> bysort co_id rtype (negperiod) : replace quintile = tocopy2[_n+6] if missing(quintile) & rtype == "hold_period"
>>>>>> sort quintile yrmonth rtype
>>>>>> 
>>>>>> *add sums of 1st/5th quintiles for <hold_period> to variables P1/P5
>>>>>> 
>>>>>> quietly levelsof yrmonth, local(levs)
>>>>>> quietly foreach lev of local levs {
>>>>>>       egen work=total(return) if rtype=="hold_period" & yrmonth == "`lev'" & quintile==5
>>>>>>       replace P1_sell=work if rtype=="hold_period" & yrmonth == "`lev'" & quintile==5
>>>>>>       drop work
>>>>>> }
>>>>>> 
>>>>>> quietly levelsof yrmonth, local(levs)
>>>>>> quietly foreach lev of local levs {
>>>>>>       egen work=total(return) if rtype=="hold_period" & yrmonth == "`lev'" & quintile==1
>>>>>>       replace P5_buy=work if rtype=="hold_period" & yrmonth == "`lev'" & quintile==1
>>>>>>       drop work
>>>>>> }
>>>>>> sort quintile yrmonth rtype
>>>>>> 
>>>>>> 
>>>>>> ***------procedures for Strategy analysis
>>>>>> **preparing time-series
>>>>>> *P1 is the variable to use for the time-series / keep -P1_sell- intact just for the sake of it
>>>>>> 
>>>>>> gen P1 = P1_sell
>>>>>> gen copyP1=P1
>>>>>> replace P1 = . if P1 == copyP1[_n-1]
>>>>>> drop copyP1
>>>>>> 
>>>>>> *P5 is the variable to use for the time-series / keep -P5_buy- intact just for the sake of it
>>>>>> 
>>>>>> gen P5 = P5_buy
>>>>>> gen copyP5=P5
>>>>>> replace P5 = . if P5 == copyP5[_n-1]
>>>>>> drop copyP5
>>>>>> 
>>>>>> *keeping only time-series variables & unique records
>>>>>> keep P1 P5 period
>>>>>> 
>>>>>> sort period P1 P5
>>>>>> quietly by period P1 P5:  gen dup = cond(_N==1,0,_n)
>>>>>> drop if dup>0
>>>>>> drop dup
>>>>>> 
>>>>>> sort period P1 P5
>>>>>> gen P5copy = P5
>>>>>> replace P5 = P5copy[_n+1] if P5 >= .
>>>>>> replace P5 = P5copy[_n+3] if P5 >= .
>>>>>> drop P5copy
>>>>>> 
>>>>>> sort period
>>>>>> quietly by period: gen dup = cond(_N==1,0,_n)
>>>>>> drop if dup>2
>>>>>> drop dup
>>>>>> 
>>>>>> gen temp = P1 + P5
>>>>>> drop if temp >= .
>>>>>> drop temp
>>>>>> 
>>>>>> by period: egen strategy=total(P1 + P5)
>>>>>> 
>>>>>> sort strategy
>>>>>> quietly by strategy: gen dup = cond(_N==1,0,_n)
>>>>>> drop if dup>1
>>>>>> drop dup
>>>>>> 
>>>>>> sort period
>>>>>> 
>>>>>> ** changing into a time-series // not sure if it is necessary yet...
>>>>>> tsset period
>>>>>> mean P1 P5 strategy
>>>>>> ******end of code
>>>>>> 
>>>>>> Thanks for your consideration! Any comments or suggestions will be
>>>>>> appreciated.
>>>>>> Clarice
>>>>>> 
>>>>>> 
>>>>>> *
>>>>>> *   For searches and help try:
>>>>>> *   http://www.stata.com/help.cgi?search
>>>>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>>>>> *   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC