



Re: st: How to count occurrences of specific value


From   Nick Cox <njcoxstata@gmail.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: How to count occurrences of specific value
Date   Wed, 1 May 2013 13:31:24 +0100

Your code wouldn't work as

id = id[`i']

should be

id == id[`i']

but I presume you just copied it incorrectly.

I don't know a sure-fire way to speed this up. It might just be faster
if you -expand-ed the data to one observation for every day. Then the
code would be simpler, but you would end up with several million
observations. Or you could translate the code into Mata.
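
As a rough sketch of the one-observation-per-day idea, the example below uses the toy data from the original post. It swaps in -tsfill- rather than -expand- to create the daily observations, which is an assumption of this sketch and requires that no id has two observations on the same date. The variable names sdate, original and csum are made up for illustration. A running sum of flag turns each window count into a difference of two values 1827 days apart, because the condition (date[`i'] - date)/365.25 <= 5 admits day differences from 0 to 1826.

```stata
clear
input long id str9 sdate byte flag
95001 14jun2000 1
95001 12apr2000 1
95001 16mar2000 0
95001 16nov1999 0
95001 10may1999 1
95001 30mar1995 0
95002 01nov1989 0
95002 01mar1985 1
95002 01jun1983 0
95002 01may1983 1
95002 01dec1982 0
95002 01oct1982 0
end
gen long date = daily(sdate, "DMY")
format date %td

* mark the observations we started with
gen byte original = 1

* declare a daily panel (this fails if an id repeats a date),
* then add one observation for every missing day within each id
tsset id date
tsfill
replace flag = 0 if missing(flag)

* running count of flag == 1 within id
bysort id (date): gen long csum = sum(flag)

* subtract the running count from 1827 days earlier
* (taken as zero if that day precedes the id's first date)
gen long temp = csum - cond(missing(L1827.csum), 0, L1827.csum)

* return to the original observations
keep if original == 1
drop original csum sdate
list id date flag temp, sepby(id)
```

On the full data the filled data set could run to several million observations, so whether this is faster than the loop would need to be checked by timing it.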

I don't think -egen- will help you here at all. The essence of the
problem is comparing each observation with others for the same
identifier, and -egen- stops at simple recipes of that kind.
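
For illustration only, here is one way the comparison within identifier might be written in Mata, as a single pass over data sorted by id and date. The function name window_count() and its argument names are hypothetical, and the sketch assumes that flag is 0 or 1 with no missing values. It keeps a moving window per id and adjusts a running total as observations enter and leave the window, so each observation is visited only a few times.

```stata
* clear all also drops any earlier Mata definition of window_count()
clear all
input long id str9 sdate byte flag
95001 14jun2000 1
95001 12apr2000 1
95001 16mar2000 0
95001 16nov1999 0
95001 10may1999 1
95001 30mar1995 0
95002 01nov1989 0
95002 01mar1985 1
95002 01jun1983 0
95002 01may1983 1
95002 01dec1982 0
95002 01oct1982 0
end
gen long date = daily(sdate, "DMY")
format date %td
sort id date

mata:
// For each observation i, sum flag over observations of the same id
// whose date t lies between t[i] - width and t[i], inclusive.
// Assumes rows are sorted by id and t, and flag has no missing values.
real colvector window_count(real colvector id, real colvector t, real colvector flag, real scalar width)
{
    real scalar    n, i, lo, hi, total
    real colvector out

    n     = rows(id)
    out   = J(n, 1, 0)
    lo    = 1
    hi    = 0
    total = 0
    for (i = 1; i <= n; i++) {
        // start a fresh window at the first observation of each id
        if (i > 1) {
            if (id[i] != id[i-1]) {
                lo    = i
                hi    = i - 1
                total = 0
            }
        }
        // take in observations of this id dated up to t[i], ties included
        while (hi < n) {
            if (id[hi+1] != id[i] | t[hi+1] > t[i]) break
            hi++
            total = total + flag[hi]
        }
        // drop observations that are now more than width days old
        while (t[i] - t[lo] > width) {
            total = total - flag[lo]
            lo++
        }
        out[i] = total
    }
    return(out)
}
end

gen temp = .
mata: st_store(., "temp", window_count(st_data(., "id"), st_data(., "date"), st_data(., "flag"), 5*365.25))
list id date flag temp, sepby(id)
```

The nested -if- at the top of the loop is deliberate: Mata evaluates both sides of a logical "and", so testing i > 1 and id[i-1] in one expression would attempt id[0] when i is 1.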
Nick
njcoxstata@gmail.com


On 1 May 2013 13:03, Jia Peng <jiapengcass@gmail.com> wrote:
> Dear All,
>
> I have a data set with the following structure,
>
> id      date        flag
> 95001   14jun2000   1
> 95001   12apr2000   1
> 95001   16mar2000   0
> 95001   16nov1999   0
> 95001   10may1999   1
> 95001   30mar1995   0
> 95002   01nov1989   0
> 95002   01mar1985   1
> 95002   01jun1983   0
> 95002   01may1983   1
> 95002   01dec1982   0
> 95002   01oct1982   0
>
> And now, I would like to generate a new variable, say temp, which records
> for each observation how many times flag == 1 has occurred within the same
> id in the five years up to and including that observation's date. For the
> first observation, for example, I want to count how many times flag == 1
> has occurred for id 95001 between 14jun1995 and 14jun2000.
>
> I have tried to loop over every observation using the following code,
>
> gen temp = .
> local N = _N
> forvalues i = 1(1)`N' {
> count if flag == 1 & id = id[`i'] & (date[`i'] - date)/365.25 <= 5 &
> (date[`i'] - date)/365.25 >= 0
> replace temp = r(N) in `i'
> }
>
> However, there are half a million observations in the full data set and
> the above code takes hours to run. Is there a more efficient way to solve
> this problem?
>
> I have also tried -egen-, but all I can get is how many times flag == 1
> has occurred within the same id. Is there any way to take different date
> ranges into account in this context?
>
> Any thoughts?
>
>
> Peng Jia
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

