Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down at the end of May, and its replacement, **statalist.org**, is already up and running.


From: Nick Cox <njcoxstata@gmail.com>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: Re: st: How to count occurrences of specific value
Date: Thu, 2 May 2013 16:30:40 +0100

The line

    replace temp = temp[`i'] + temp[`j']

should presumably be

    replace temp = temp[`i'] + temp[`j'] in `i'

Otherwise, each -replace- overwrites every observation of -temp-, which ends up holding the last constant calculated.

Nick
njcoxstata@gmail.com

On 2 May 2013 16:21, Michael Barker <mdb96statalist@gmail.com> wrote:
> It looks like you are comparing each observation to every other
> observation in your data set. If your data are sorted, you only have
> to look back within each 5-year window for each id. It looks like your
> data are sorted descending by date, so the code would look like this:
>
>     gen temp = flag
>     local N = _N
>     forvalues i = 1(1)`N' {
>         local j = `i' + 1
>         while (id[`i'] == id[`j'] & (date[`i'] - date[`j'])/365.25 <= 5) {
>             replace temp = temp[`i'] + temp[`j']
>             local j = `j' + 1
>         }
>     }
>
> If your data were sorted ascending by date, you would just iterate j
> downwards (j = i-1, j = j-1) and start the -forvalues- loop at 2
> (forvalues i = 2(1)`N' {).
>
> Mike
>
> On Wed, May 1, 2013 at 8:31 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>> Your code wouldn't work as
>>
>>     id = id[`i']
>>
>> should be
>>
>>     id == id[`i']
>>
>> but I presume you just copied it incorrectly.
>>
>> I don't know a sure-fire way to speed this up. It might just be faster
>> if you -expand-ed the data to one observation for every day. Then the
>> code would be simpler, but you would end up with several million
>> observations. Or you could translate the code into Mata.
>>
>> I don't think -egen- will help you here at all. The essence of the
>> problem is comparing each observation with others for the same
>> identifier, and -egen- stops short of recipes of that kind.
>>
>> Nick
>> njcoxstata@gmail.com
>>
>> On 1 May 2013 13:03, Jia Peng <jiapengcass@gmail.com> wrote:
>>> Dear All,
>>>
>>> I have a data set with the following structure:
>>>
>>>     id      date       flag
>>>     95001   14jun2000  1
>>>     95001   12apr2000  1
>>>     95001   16mar2000  0
>>>     95001   16nov1999  0
>>>     95001   10may1999  1
>>>     95001   30mar1995  0
>>>     95002   01nov1989  0
>>>     95002   01mar1985  1
>>>     95002   01jun1983  0
>>>     95002   01may1983  1
>>>     95002   01dec1982  0
>>>     95002   01oct1982  0
>>>
>>> Now I would like to generate a new variable, say temp, that records for
>>> each observation how many times flag == 1 has occurred within the same
>>> id from five years before that observation's date up to that date. For
>>> example, for the first observation, I want to count how many times
>>> flag == 1 has occurred for id 95001 between 14jun1995 and 14jun2000.
>>>
>>> I have tried to loop over every observation using the following code:
>>>
>>>     gen temp = .
>>>     local N = _N
>>>     forvalues i = 1(1)`N' {
>>>         count if flag == 1 & id = id[`i'] & (date[`i'] - date)/365.25 <= 5 & ///
>>>             (date[`i'] - date)/365.25 >= 0
>>>         replace temp = r(N) in `i'
>>>     }
>>>
>>> However, there are half a million observations in the entire data set,
>>> and the above code takes hours. Is there any way to solve this problem
>>> more efficiently?
>>>
>>> I have also tried to use -egen-, but all I can get is how many times
>>> flag == 1 has occurred within the same id. Is there any way to take
>>> different date ranges into account in this context?
>>>
>>> Any thoughts?
>>>
>>> Peng Jia

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
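For readers who want to check the logic of the thread outside Stata, here is a minimal Python sketch (not Stata, and not code posted by anyone in the thread) that compares Peng's every-pair approach with Michael's sorted look-back, including Nick's fix that only the current observation is updated. It assumes Peng's example data stored as `(id, date, flag)` tuples with Python `datetime.date` values, sorted by id and then by descending date, and it uses the thread's 365.25-day year. The names `DATA`, `within_window`, `count_brute_force` and `count_look_back` are hypothetical.

```python
from datetime import date

# Jia Peng's example data as (id, date, flag) tuples, sorted by id and then
# by descending date, exactly as listed in the post.
DATA = [
    (95001, date(2000, 6, 14), 1),
    (95001, date(2000, 4, 12), 1),
    (95001, date(2000, 3, 16), 0),
    (95001, date(1999, 11, 16), 0),
    (95001, date(1999, 5, 10), 1),
    (95001, date(1995, 3, 30), 0),
    (95002, date(1989, 11, 1), 0),
    (95002, date(1985, 3, 1), 1),
    (95002, date(1983, 6, 1), 0),
    (95002, date(1983, 5, 1), 1),
    (95002, date(1982, 12, 1), 0),
    (95002, date(1982, 10, 1), 0),
]

YEARS = 5
DAYS_PER_YEAR = 365.25  # the thread's approximation of a year


def within_window(d_i, d_j):
    """True if d_j lies between five years before d_i and d_i itself."""
    gap = (d_i - d_j).days / DAYS_PER_YEAR
    return 0 <= gap <= YEARS


def count_brute_force(rows):
    """Peng's approach: check every row against every other row, O(N^2)."""
    return [
        sum(1 for (id_j, d_j, flag_j) in rows
            if flag_j == 1 and id_j == id_i and within_window(d_i, d_j))
        for (id_i, d_i, _) in rows
    ]


def count_look_back(rows):
    """Michael's look-back with Nick's fix; rows sorted by id, descending date."""
    temp = [flag for (_, _, flag) in rows]  # like: gen temp = flag
    n = len(rows)
    for i in range(n):
        id_i, d_i, _ = rows[i]
        j = i + 1
        # Walk back in time only while still in the same id and 5-year window.
        while j < n and rows[j][0] == id_i and within_window(d_i, rows[j][1]):
            # Add row j's flag (in the Stata version temp[j] still equals
            # flag[j] at this point) and update row i only, as "in `i'" does.
            temp[i] += rows[j][2]
            j += 1
    return temp


if __name__ == "__main__":
    slow = count_brute_force(DATA)
    fast = count_look_back(DATA)
    assert slow == fast  # they agree here: no id has two rows on one date
    for (id_i, d_i, flag), n in zip(DATA, fast):
        print(id_i, d_i.isoformat(), flag, n)
```

On this example both functions give 3, 2, 1, 1, 1, 0 for id 95001 and 1, 2, 1, 1, 0, 0 for id 95002. The look-back version only examines rows inside each id's five-year window rather than the whole data set, which is where the speed-up comes from. One caveat of the sketch: if an id had two rows on the same date, the look-back would count only the tie sorted after the current row, so the two versions could disagree.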

**Follow-Ups**:
- **Re: st: How to count occurrences of specific value**, from Michael Barker <mdb96statalist@gmail.com>

**References**:
- **st: How to count occurrences of specific value**, from "Jia Peng" <jiapengcass@gmail.com>
- **Re: st: How to count occurrences of specific value**, from Nick Cox <njcoxstata@gmail.com>
- **Re: st: How to count occurrences of specific value**, from Michael Barker <mdb96statalist@gmail.com>
