Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Michael Barker <mdb96statalist@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: How to count occurrences of specific value |
Date | Thu, 2 May 2013 11:38:16 -0400 |
Yes, that is correct. Thanks for catching that mistake. Mike On Thu, May 2, 2013 at 11:30 AM, Nick Cox <njcoxstata@gmail.com> wrote: > The line > > replace temp = temp[`i'] + temp[`j'] > > should presumably be > > replace temp = temp[`i'] + temp[`j'] in `i' > > Otherwise, -temp- ends up holding the last constant calculated. > Nick > njcoxstata@gmail.com > > > On 2 May 2013 16:21, Michael Barker <mdb96statalist@gmail.com> wrote: >> It looks like you are comparing each observation to every other >> observation in your data set. If your data are sorted, you only have >> to look back within each 5-year window for each id. It looks like your >> data are sorted descending by date, so the code would look like this: >> >> gen temp = flag >> local N = _N >> forvalues i = 1(1)`N' { >> local j=`i'+1 >> while (id[`i']==id[`j'] & (date[`i'] - date[`j'])/365.25 <= 5) { >> replace temp = temp[`i'] + temp[`j'] >> local j = `j'+1 >> } >> >> If your data were sorted ascending by date, you would just iterate j >> downwards (j = i-1, j=j-1) and start the "forvalues" loop at 2 >> (forvalues 2(1)`N' {) >> >> Mike >> >> On Wed, May 1, 2013 at 8:31 AM, Nick Cox <njcoxstata@gmail.com> wrote: >>> Your code wouldn't work as >>> >>> id = id[`i'] >>> >>> should be >>> >>> id == id[`i'] >>> >>> but I presume you just copied it incorrectly. >>> >>> I don't know a sure-fire way to speed this up. It might just be faster >>> if you -expand-ed the data to one observation for every day. Then the >>> code would be simpler, but you would end up with several millions of >>> observations. Or you could translate the code into Mata. >>> >>> I don't think -egen- will help you here at all. The essence of the >>> problem is comparing each observation with others for the same >>> identifier, and -egen- stops at simple recipes of that kind. >>> Nick >>> njcoxstata@gmail.com >>> >>> >>> On 1 May 2013 13:03, Jia Peng <jiapengcass@gmail.com> wrote: >>>> Dear All, >>>> >>>> I have a data set with the following structure, >>>> >>>> id date flag >>>> 95001 14jun2000 1 >>>> 95001 12apr2000 1 >>>> 95001 16mar2000 0 >>>> 95001 16nov1999 0 >>>> 95001 10may1999 1 >>>> 95001 30mar1995 0 >>>> 95002 01nov1989 0 >>>> 95002 01mar1985 1 >>>> 95002 01jun1983 0 >>>> 95002 01may1983 1 >>>> 95002 01dec1982 0 >>>> 95002 01oct1982 0 >>>> >>>> And now, I would like to generate a new variable, say temp, which represents >>>> for each observation how many times flag == 1 has occurred within the same >>>> id from five years ago to the date specified, i.e., for the first >>>> observation, I want to count how many times flag == 1 has occurred with the >>>> id 95001 between 14jun1995 and 14jun2000. >>>> >>>> I have tried to loop over every observation using the following code, >>>> >>>> gen temp = . >>>> local N = _N >>>> forvalues i = 1(1)`N' { >>>> count if flag == 1 & id = id[`i'] & (date[`i'] - date)/365.25 <= 5 & >>>> (date[`i'] - date)/365.25 >= 0 >>>> replace temp = r(N) in `i' >>>> } >>>> >>>> However, there are half a million observations in the entire data and the >>>> above code cost hours of time. Is there any way to solve the above problem >>>> more efficiently? >>>> >>>> I have also tried to use -egen-, but all I can get is how many times flag == >>>> 1 has occurred with the same id. Is there any way to take into consideration >>>> different date ranges in this context? >>>> >>>> Any thoughts? >>>> >>>> >>>> Peng Jia >>>> >>>> * >>>> * For searches and help try: >>>> * http://www.stata.com/help.cgi?search >>>> * http://www.stata.com/support/faqs/resources/statalist-faq/ >>>> * http://www.ats.ucla.edu/stat/stata/ >>> * >>> * For searches and help try: >>> * http://www.stata.com/help.cgi?search >>> * http://www.stata.com/support/faqs/resources/statalist-faq/ >>> * http://www.ats.ucla.edu/stat/stata/ >> * >> * For searches and help try: >> * http://www.stata.com/help.cgi?search >> * http://www.stata.com/support/faqs/resources/statalist-faq/ >> * http://www.ats.ucla.edu/stat/stata/ > * > * For searches and help try: > * http://www.stata.com/help.cgi?search > * http://www.stata.com/support/faqs/resources/statalist-faq/ > * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/faqs/resources/statalist-faq/ * http://www.ats.ucla.edu/stat/stata/