
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: RE: Dividing data into quintiles


From   Nick Cox <njcoxstata@gmail.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: RE: Dividing data into quintiles
Date   Tue, 29 Oct 2013 00:48:48 +0000

You should look at the code more carefully to understand these comments.

1. In David's original code, a new temporary variable was created to
receive results from -xtile- each time round the loop. Once the loop
ended, these temporary variables were all still in memory.

In my code, I create a new variable each time around the loop but
-drop- it as soon as the useful results it contains are transferred to
the variable -quintile-. Once the loop is finished, there are no
unnecessary variables still in memory.

Your suggestion "wouldn't it be good to have this temporary variable
with signals for me each quintile?" misses this point and asks
something quite different. The point is that the non-missing content
of the variables created during the loop is copied across to the
variable -quintile- in a correct version of the code. So all the
information is preserved and nothing useful is thrown away.

You only want the one variable, but we build up its contents in a loop.

2. David suggested (the equivalent of)

   replace quintile = work

but this is buggy and should be

   replace quintile = work if rtype=="formation" & yrmonth == "`lev'"

as otherwise the missing values present for observations in which it
is _not_ true that

rtype=="formation" & yrmonth == "`lev'"

will be transferred to -quintile- each time around the loop. The
result will be that only the non-missing results from the last time
around the loop are saved (not quite a variable with entirely
missing values, as you appeared to report earlier).

Nick
njcoxstata@gmail.com
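
To see the effect described in point 2, here is a minimal sketch on made-up data. The five observations and the variable names -unrestricted- and -restricted- are assumptions for illustration and are not from the original dataset. The loop applies both forms of -replace- side by side so the results can be compared.

```stata
* Toy data (hypothetical): two formation months plus one non-formation row
clear
input str9 rtype str7 yrmonth double return
"formation" "jun2000" 0.01
"formation" "jun2000" 0.02
"formation" "jul2000" 0.03
"formation" "jul2000" 0.04
"holding"   "jul2000" 0.05
end

gen unrestricted = .
gen restricted = .
quietly levelsof yrmonth, local(levs)

foreach lev of local levs {
    * work is missing for every observation outside this subgroup
    quietly xtile work = return if rtype == "formation" & yrmonth == "`lev'", nquantiles(2)
    * buggy form: also copies the missings, wiping earlier months
    quietly replace unrestricted = work
    * corrected form: touches only the observations just calculated
    quietly replace restricted = work if rtype == "formation" & yrmonth == "`lev'"
    drop work
}

list
```

After the loop, -unrestricted- should be non-missing only for the formation observations of the last month processed, while -restricted- should be non-missing for the formation observations of every month.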


On 28 October 2013 22:09, Clarice Martins <martins.clarice@gmail.com> wrote:
> Dear Nick,
>
> I do not see the problem with:
>
>> I would not create a new temporary variable each
>> time round the loop.
>
> If I do need to know these quintiles for further calculation, wouldn't it be good to have this temporary variable with signals for me each quintile? What am I missing?
>
>> The -replace- should be restricted to the
>> observations it was calculated for;
>
> This one, honestly, I didn't understand what it does...

On Oct 28, 2013, at 7:49 PM, Nick Cox wrote:

>> Here is a tweak of David's code. I have not tested this, but I can see
>> two problems with his code. The -replace- should be restricted to the
>> observations it was calculated for; otherwise you overwrite good
>> values with missings. I would not create a new temporary variable each
>> time round the loop.
>>
>> gen quintile = .
>> quietly levelsof yrmonth, local(levs)
>>
>> quietly foreach lev of local levs {
>>         xtile work = return if rtype=="formation" & yrmonth == "`lev'", n(5)
>>         replace quintile = work if rtype=="formation" & yrmonth == "`lev'"
>>         drop work
>> }
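
The same loop pattern can be tried on a dataset that ships with Stata. The sketch below is an assumption-laden stand-in: it uses -sysuse auto-, with -foreign- playing the role of yrmonth and -price- playing the role of return, neither of which comes from the original data. The final -tabulate- is one way to check that every group received quintiles and that no extra variables are needed.

```stata
* Stand-in example on Stata's bundled auto data (not the poster's data)
sysuse auto, clear

gen quintile = .
quietly levelsof foreign, local(levs)

quietly foreach lev of local levs {
    * quintiles of price within one level of foreign
    xtile work = price if foreign == `lev', nquantiles(5)
    * copy across only the observations just calculated
    replace quintile = work if foreign == `lev'
    drop work
}

* each row of the table is one group; the missing option would reveal gaps
tabulate foreign quintile, missing
```

Because -foreign- is numeric here, the comparison needs no double quotes; with a string variable such as yrmonth, the quoted form used in the code above is required.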

On 28 October 2013 21:33, Clarice Martins <martins.clarice@gmail.com> wrote:

>>> I believe you understood correctly!  Thank you for your suggestion!
>>>
>>> But forgive me if I am totally off, I am very new to Stata: I think I understood your code, but after creating the variable quintile, it stayed empty. Should I see values recorded in it? Or are the results of the code stored virtually?   (sorry! very basic question!)
>>>
>>> In my research, the next step will use returns for the companies in the 1st and 5th quintile of each  -rtype=="formation" & yrmonth == "`lev'"-  for calculations, so I do need to have available these quintiles (specifically 1st and 5th) for next step.
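
Once a -quintile- variable exists, picking out the 1st and 5th quintiles for later calculations could look like the following sketch. The ten made-up observations and the indicator name -extreme- are assumptions for illustration only, and a single -xtile- call stands in for the per-month loop.

```stata
* Toy data (hypothetical): ten returns, cut into quintiles in one go
clear
set obs 10
gen return = _n / 100
xtile quintile = return, nquantiles(5)

* 1 for the bottom and top quintiles, 0 for the middle three,
* missing wherever quintile itself is missing
gen byte extreme = inlist(quintile, 1, 5) if !missing(quintile)

* summary of returns in the bottom and top quintiles only
tabstat return if extreme == 1, by(quintile) statistics(mean n)
```

With the real data, -by(quintile)- could be combined with a restriction to one yrmonth at a time, since the quintiles are defined separately within each month.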

On Oct 28, 2013, at 4:58 PM, Radwin, David wrote:

>>>> If I understand correctly, you merely need to loop through each value of yrmonth and calculate quintiles for observations with that value for yrmonth. But you can't generate a new variable 151 times.
>>>>
>>>> So you could do something like this that creates quintiles for a temporary variable:
>>>>
>>>> gen quintile = .
>>>> quietly levelsof yrmonth, local(levs)
>>>> foreach lev of local levs {
>>>>      tempvar quint
>>>>      xtile `quint' = return if rtype=="formation" & yrmonth == "`lev'", n(5)
>>>>      replace quintile = `quint'
>>>>      }

Clarice Martins

>>>>> I need to select a sub-group of my dataset and cut it into quintiles, in
>>>>> order to proceed with calculations with top and bottom quintile.
>>>>>
>>>>> - I use Stata 12.1 (for Mac)
>>>>>
>>>>> - my data looks like this (now, after considerations from the group, thank
>>>>> you!)
>>>>>
>>>>> co_id ticker rtype   yrmonth return
>>>>> 1 ABCB formation jun2000 0.01
>>>>> 1 ABCB buysell jul2000 0.01
>>>>> 1 ABCB holding ago2000 0.01
>>>>> 2 AEDU formation jun2000 0.01
>>>>> 2 AEDU buysell jul2000 0.01
>>>>> 2 AEDU holding ago2000 0.01
>>>>> 3 AMBV formation jun2000 0.01
>>>>> 3 AMBV buysell jul2000 0.01
>>>>> 3 AMBV holding ago2000 0.01
>>>>>
>>>>> - I tried both -egen cut- and -xtile-
>>>>>
>>>>> egen quintile = cut(return) if rtype=="formation", group(5)
>>>>>
>>>>> xtile quintile=return if rtype=="formation", n(5)
>>>>>
>>>>> - when I use -if rtype=="formation"- both commands work fine!
>>>>>
>>>>> - But, I need to select all observations rtype=="formation"  AND
>>>>> yrmonth=="jun2000", then cut this subgroup into quintiles
>>>>>
>>>>> - Then, I need to proceed with this filter/select and cut, for every
>>>>> yrmonth (151 periods) on my data set
>>>>>


