Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Clarice Martins <martins.clarice@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: RE: Dividing data into quintiles |
Date | Tue, 29 Oct 2013 09:14:25 -0200 |
Dear Nick,

You're correct. Forgive me. On closer examination (and with your guidance), I do understand what you mean. Thanks!!!

Clarice

On Oct 28, 2013, at 10:48 PM, Nick Cox wrote:

> You should look at the code more carefully to understand these comments.
>
> 1. In David's original code, a new temporary variable was created to
> receive results from -xtile- each time round the loop. Once the loop
> is left these temporary variables were all still in memory.
>
> In my code, I create a new variable each time around the loop but
> -drop- it as soon as the useful results it contains are transferred to
> the variable -quintile-. Once the loop is finished, there are no
> unnecessary variables still in memory.
>
> Your suggestion "wouldn't it be good to have this temporary variable
> with signals for me each quintile?" misses this point and asks
> something quite different. The point is that the non-missing content
> of the variables created during the loop is copied across to the
> variable -quintile- in a correct version of the code. So, all the
> information has been preserved and nothing is thrown away that is not
> useless.
>
> You only want the one variable, but we build up its contents in a loop.
>
> 2. David suggested (the equivalent of)
>
>     replace quintile = work
>
> but this is buggy and should be
>
>     replace quintile = work if rtype=="formation" & yrmonth == "`lev'"
>
> as otherwise the missing values present for observations in which it
> is _not_ true that
>
>     rtype=="formation" & yrmonth == "`lev'"
>
> will be transferred to -quintile- each time around the loop. The
> result will be that only the non-missing results from the last time
> around the loop would be saved (not quite a variable with entirely
> missing values, as you appeared to report earlier).
>
> Nick
> njcoxstata@gmail.com
>
>
> On 28 October 2013 22:09, Clarice Martins <martins.clarice@gmail.com> wrote:
>> Dear Nick,
>>
>> I do not see the problem with:
>>
>>> I would not create a new temporary variable each
>>> time round the loop.
>>
>> If I do need to know these quintiles for further calculation, wouldn't it be good to have this temporary variable with signals for me each quintile? What am I missing?
>>
>>> The -replace- should be restricted to the
>>> observations it was calculated for;
>>
>> This one, honestly, I didn't understand what it does...
>
> On Oct 28, 2013, at 7:49 PM, Nick Cox wrote:
>
>>> Here is a tweak of David's code. I have not tested this, but I can see
>>> two problems with his code. The -replace- should be restricted to the
>>> observations it was calculated for; otherwise you overwrite good
>>> values with missings. I would not create a new temporary variable each
>>> time round the loop.
>>>
>>> gen quintile = .
>>> quietly levelsof yrmonth, local(levs)
>>>
>>> quietly foreach lev of local levs {
>>>     xtile work = return if rtype=="formation" & yrmonth == "`lev'", n(5)
>>>     replace quintile = work if rtype=="formation" & yrmonth == "`lev'"
>>>     drop work
>>> }
>
> On 28 October 2013 21:33, Clarice Martins <martins.clarice@gmail.com> wrote:
>
>>>> I believe you understood correctly! Thank you for your suggestion!
>>>>
>>>> But forgive me if I am totally off, I am very new to Stata: I think I understood your code, but after creating the variable quintile, it stayed empty. Should I see values recorded in it? Or are the results of the code stored virtually? (Sorry! Very basic question!)
>>>>
>>>> In my research, the next step will use returns for the companies in the 1st and 5th quintile of each -rtype=="formation" & yrmonth == "`lev'"- for calculations, so I do need to have these quintiles (specifically 1st and 5th) available for the next step.
>
> On Oct 28, 2013, at 4:58 PM, Radwin, David wrote:
>
>>>>> If I understand correctly, you merely need to loop through each value of yrmonth and calculate quintiles for observations with that value for yrmonth. But you can't generate a new variable 151 times.
>>>>>
>>>>> So you could do something like this that creates quintiles for a temporary variable:
>>>>>
>>>>> gen quintile = .
>>>>> quietly levelsof yrmonth, local(levs)
>>>>> foreach lev of local levs {
>>>>>     tempvar quint
>>>>>     xtile `quint' = return if rtype=="formation" & yrmonth == "`lev'", n(5)
>>>>>     replace quintile = `quint'
>>>>> }
>
> Clarice Martins
>
>>>>>> I need to select a sub-group of my dataset and cut it into quintiles, in
>>>>>> order to proceed with calculations with top and bottom quintile.
>>>>>>
>>>>>> - I use Stata 12.1 (for Mac)
>>>>>>
>>>>>> - my data looks like this (now, after considerations from the group, thank
>>>>>> you!)
>>>>>>
>>>>>> co_id  ticker  rtype      yrmonth  return
>>>>>> 1      ABCB    formation  jun2000  0.01
>>>>>> 1      ABCB    buysell    jul2000  0.01
>>>>>> 1      ABCB    holding    ago2000  0.01
>>>>>> 2      AEDU    formation  jun2000  0.01
>>>>>> 2      AEDU    buysell    jul2000  0.01
>>>>>> 2      AEDU    holding    ago2000  0.01
>>>>>> 3      AMBV    formation  jun2000  0.01
>>>>>> 3      AMBV    buysell    jul2000  0.01
>>>>>> 3      AMBV    holding    ago2000  0.01
>>>>>>
>>>>>> - I tried both -egen cut- and -xtile-
>>>>>>
>>>>>> egen quintile = cut(return) if rtype=="formation", group(5)
>>>>>>
>>>>>> xtile quintile=return if rtype=="formation", n(5)
>>>>>>
>>>>>> - when I use -if rtype=="formation"- both commands work fine!
>>>>>>
>>>>>> - But, I need to select all observations rtype=="formation" AND
>>>>>> yrmonth=="jun2000", then cut this subgroup into quintiles
>>>>>>
>>>>>> - Then, I need to proceed with this filter/select and cut, for every
>>>>>> yrmonth (151 periods) on my data set
>>>>>>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
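
For anyone who wants to try the loop discussed in this thread, here is a minimal sketch on made-up data. It uses the version with the -replace- restricted by the same -if- condition as -xtile-, so that quintiles from earlier months are not overwritten with missings. The toy dataset (30 observations, two months, invented returns) and the variable name -extreme- are assumptions for illustration only and are not from the original posts.

```stata
* Hypothetical toy data: 10 formation rows for jun2000, 10 formation rows
* for jul2000, and 10 buysell rows for jul2000. Values are invented.
clear
set obs 30
gen str7 yrmonth = cond(_n <= 10, "jun2000", "jul2000")
gen str9 rtype   = cond(_n <= 20, "formation", "buysell")
gen return       = mod(7 * _n, 10) / 100   // distinct within each block of 10

gen quintile = .
quietly levelsof yrmonth, local(levs)

quietly foreach lev of local levs {
    * Quintiles are computed only within this month's formation rows
    xtile work = return if rtype == "formation" & yrmonth == "`lev'", n(5)
    * Same -if- on the -replace-, so other months keep their values
    replace quintile = work if rtype == "formation" & yrmonth == "`lev'"
    drop work
}

* Flag bottom and top quintiles for a later step (the name is an assumption)
gen byte extreme = inlist(quintile, 1, 5) if rtype == "formation"

* Checks: formation rows have a quintile, all other rows stay missing
assert inrange(quintile, 1, 5) if rtype == "formation"
assert missing(quintile) if rtype != "formation"
tabulate yrmonth quintile
```

Dropping the -if- from the -replace- line and rerunning should show the behaviour described above: only the last month processed by the loop keeps non-missing quintiles.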