Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: How to count occurrences of specific value

 From Michael Barker To statalist@hsphsun2.harvard.edu Subject Re: st: How to count occurrences of specific value Date Thu, 2 May 2013 11:38:16 -0400

```Yes, that is correct. Thanks for catching that mistake.

Mike

On Thu, May 2, 2013 at 11:30 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> The line
>
> replace temp = temp[`i'] + temp[`j']
>
> should presumably be
>
> replace temp = temp[`i'] + temp[`j'] in `i'
>
> Otherwise, -temp- ends up holding the last constant calculated.
> Nick
> njcoxstata@gmail.com
>
>
> On 2 May 2013 16:21, Michael Barker <mdb96statalist@gmail.com> wrote:
>> It looks like you are comparing each observation to every other
>> observation in your data set. If your data are sorted, you only have
>> to look back within each 5-year window for each id. It looks like your
>> data are sorted descending by date, so the code would look like this:
>>
>> gen temp = flag
>> local N = _N
>> forvalues i = 1(1)`N' {
>> local j=`i'+1
>> while (id[`i']==id[`j'] & (date[`i'] - date[`j'])/365.25 <= 5) {
>> replace temp = temp[`i'] + temp[`j']
>> local j = `j'+1
>> }
>>
>> If your data were sorted ascending by date, you would just iterate j
>> downwards (j = i-1, j=j-1) and start the "forvalues" loop at 2
>> (forvalues 2(1)`N' {)
>>
>> Mike
>>
>> On Wed, May 1, 2013 at 8:31 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>>> Your code wouldn't work as
>>>
>>> id = id[`i']
>>>
>>> should be
>>>
>>> id == id[`i']
>>>
>>> but I presume you just copied it incorrectly.
>>>
>>> I don't know a sure-fire way to speed this up. It might just be faster
>>> if you -expand-ed the data to one observation for every day. Then the
>>> code would be  simpler,  but you would end up with several millions of
>>> observations. Or you could translate the code into Mata.
>>>
>>> I don't think -egen- will help you here at all. The essence of the
>>> problem is comparing each observation with others for the same
>>> identifier, and -egen- stops at simple recipes of that kind.
>>> Nick
>>> njcoxstata@gmail.com
>>>
>>>
>>> On 1 May 2013 13:03, Jia Peng <jiapengcass@gmail.com> wrote:
>>>> Dear All,
>>>>
>>>> I have a data set with the following structure,
>>>>
>>>> id              date                    flag
>>>> 95001   14jun2000       1
>>>> 95001   12apr2000       1
>>>> 95001   16mar2000       0
>>>> 95001   16nov1999       0
>>>> 95001   10may1999       1
>>>> 95001   30mar1995       0
>>>> 95002   01nov1989       0
>>>> 95002   01mar1985       1
>>>> 95002   01jun1983       0
>>>> 95002   01may1983       1
>>>> 95002   01dec1982       0
>>>> 95002   01oct1982       0
>>>>
>>>> And now, I would like to generate a new variable, say temp, which represents
>>>> for each observation how many times flag == 1 has occurred within the same
>>>> id from five years ago to the date specified, i.e., for the first
>>>> observation, I want to count how many times flag == 1 has occurred with the
>>>> id 95001 between 14jun1995 and 14jun2000.
>>>>
>>>> I have tried to loop over every observation using the following code,
>>>>
>>>> gen temp = .
>>>> local N = _N
>>>> forvalues i = 1(1)`N' {
>>>> count if flag == 1 & id = id[`i'] & (date[`i'] - date)/365.25 <= 5 &
>>>> (date[`i'] - date)/365.25 >= 0
>>>> replace temp = r(N) in `i'
>>>> }
>>>>
>>>> However, there are half a million observations in the entire data and the
>>>> above code cost hours of time. Is there any way to solve the above problem
>>>> more efficiently?
>>>>
>>>> I have also tried to use -egen-, but all I can get is how many times flag ==
>>>> 1 has occurred with the same id. Is there any way to take into consideration
>>>> different date ranges in this context?
>>>>
>>>> Any thoughts?
>>>>
>>>>
>>>> Peng Jia
>>>>
>>>> *
>>>> *   For searches and help try:
>>>> *   http://www.stata.com/help.cgi?search
>>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>>> *   http://www.ats.ucla.edu/stat/stata/
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>> *   http://www.ats.ucla.edu/stat/stata/
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
```