Statalist



Re: RE: st: Unbalanced panel, count number of incidents


From   n j cox <n.j.cox@durham.ac.uk>
To   statalist@hsphsun2.harvard.edu
Subject   Re: RE: st: Unbalanced panel, count number of incidents
Date   Tue, 24 Jul 2007 17:30:48 +0100

1. The -lag()- option works with whatever you have
declared to be the time variable using -tsset-.

2. The program -count_recent- as first posted ignores observations
that don't satisfy the -if- condition when doing a -generate-. That is
standard and was deliberate. However, I can see from your question
that there is a very good case here for a non-standard -if-. So, the version below strips out the condition

if `touse'[`i'] {


}

and -count-s only observations that satisfy the condition, but puts a
value in every observation, whether or not that observation itself
satisfies it. The shorter program is more general, as is quite often true.

3. Also, the _current_ positive response can't be _previous_.
Your original post referred to (e.g.) "the total number of contacts in the last 30 or 60 days". In my reply, I did say, quite explicitly, that my program counts observations "within 1 to -lag- (compulsory option) time units previous" and this is what is done by the code

inrange(`t'[`i'] - `t', 1, `lag')

If you want to _include_ the current event, you have at least two
options:

(a) modify the code to

inrange(`t'[`i'] - `t', 0, `lag')

Now, as you've touched the program, it's yours and you're responsible.

(b) just follow with

replace <whatever> = <whatever> + Response

which will bump up the count if and only if the current observation is
a positive response. Although you shouldn't trust names, my name
-count_recent- showed an intent to count past events, not current ones.

*! 1.1.0 NJC 24 July 2007
*! 1.0.0 NJC 22 July 2007
program count_recent
    version 8
    syntax [if] [in], Lag(numlist int max=1 >0) Generate(str)

    quietly {
        confirm new var `generate'

        * mark the observations that are to be counted
        marksample touse
        count if `touse'
        if r(N) == 0 error 2000

        * pick up panel and time variables from a prior -tsset-
        tsset
        local p "`r(panelvar)'"
        local t "`r(timevar)'"
        if "`p'" == "" {
            tempvar p
            gen byte `p' = 1
        }

        gen `generate' = .

        * every observation gets a count, whether or not it satisfies -if-
        forval i = 1/`=_N' {
            count if `touse' ///
                & inrange(`t'[`i'] - `t', 1, `lag') ///
                & `p' == `p'[`i']
            replace `generate' = r(N) in `i'
        }
    }
end
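
As a check, here is a minimal sketch (not part of the code as posted) that runs -count_recent- on the seven toy observations from the question with lag(5), and then applies option (b). It assumes that version 1.1.0 of the program above has already been defined in the current session; the variable names are those used in the question.

```stata
* toy data from the question (lag = 5 time units)
clear
input ID Date Response
1  1 0
1  5 1
1  6 0
1  9 1
1 11 0
1 16 0
1 20 1
end
tsset ID Date

* positive responses in the previous 1 to 5 time units
count_recent if Response == 1, lag(5) generate(posResp)

* option (b): add the current observation if it is itself positive
replace posResp = posResp + Response

list, noobs
```

Working through the seven observations by hand, posResp should come out as 0 1 1 2 1 0 1, which is the table asked for in the question.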

Nick
n.j.cox@durham.ac.uk

Andrew Stocking

Thank you Nick. Your -ntimeofday- function worked perfectly. In addition
the -count_recent- function worked beautifully for counting up the number of
contacts over the last 43200 minutes (I was able to get everything into
minutes). So thank you for both.

The one thing that doesn't seem to be working quite right is
-count_recent- with the -if- specification. I type:
- count_recent if Response == 1, lag(43200) generate(posResp)-

And I get the default -.- every time Response==0 and a 0 every
time Response==1 (this is because often the only response in the last 43200
minutes (or 30 days) is the one where Response==1). In table form, what I
should see (as if lag=5 instead of lag=43200):
ID  Date  Response  posResp
 1     1         0        0
 1     5         1        1
 1     6         0        1
 1     9         1        2
 1    11         0        1
 1    16         0        0
 1    20         1        1

In table form, this is what I do see (as if lag=5 instead of lag=43200):
ID  Date  Response  posResp
 1     1         0        .
 1     5         1        0
 1     6         0        .
 1     9         1        1
 1    11         0        .
 1    16         0        .
 1    20         1        0

So the -if- part of the program seems to be 1) ignoring the current positive
response, counting only what happened earlier within the lag and 2) doing no
calculations for any rows where the Response was 0. I can easily add 1 to
everything and then replace the missing values with 0 to account for the
first issue. But I'm not sure how to modify the program to account for the
second. Maybe this is how it was intended to run. My Stata programming
ability ranges somewhere between weak and non-existent or I'd try to modify
the program myself.

>
> *! 1.0.0 NJC 22 July 2007
> program count_recent
>     version 8
>     syntax [if] [in], Lag(numlist int max=1 >0) Generate(str)
>
>     quietly {
>         confirm new var `generate'
>
>         marksample touse
>         count if `touse'
>         if r(N) == 0 error 2000
>
>         tsset
>         local p "`r(panelvar)'"
>         local t "`r(timevar)'"
>         if "`p'" == "" {
>             tempvar p
>             gen byte `p' = 1
>         }
>
>         gen `generate' = .
>
>         forval i = 1/`=_N' {
>             if `touse'[`i'] {
>                 count if `touse' ///
>                     & inrange(`t'[`i'] - `t', 1, `lag') ///
>                     & `p' == `p'[`i']
>                 replace `generate' = r(N) in `i'
>             }
>         }
>     }
> end
>
> What we are counting, for each observation, is
> how many observations are
>
> (a) relevant (by default all observations). This is determined
> by any -if- or -in- conditions.
>
> (b) within 1 to -lag- (compulsory option) time units previous
>
> (c) in the same panel (whenever there is panel structure)
> -- you don't quite say this is what you want, but I guess
> it's true.
>
> I assume a prior -tsset-.
>
> So, examples could be
>
> tsset ID Date
> count_recent , lag(30) generate(prev30)
> count_recent if Response == 1, lag(60) generate(pos_prev60)

Andrew Stocking

> > I have an unbalanced panel of subjects who have been contacted very
> > irregularly over the past 5 years. Total contacts range from 40-250
> > during the 5 year period depending on the person. I'd like to create
> > two variables: one that counts the total number of contacts in the
> > last 30 or 60 days and a second that sums the number of positive
> > responses over the same 30 or 60 days. For each contact there could
> > be anywhere from 0-15 contacts in the last 30 days.
> >
> > My data looks like:
> > ID   Date  Response  Var1  Var2
> >  1  37200         1     1     1
> >  1  37210         0     2     1
> >  1  37215         1     3     2
> >  1  37229         1     4     3
> >  1  37231         0     4     2
> >  2  37201         0     1     0
> > .....
> >
> > I can't make egen count() work for me (or really anything else).





