Statalist



RE: st: Unbalanced panel, count number of incidents


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Unbalanced panel, count number of incidents
Date   Sun, 22 Jul 2007 18:59:54 +0100

I don't see how Naomi's solution gets at the most 
difficult part of this problem, which is to take account
of the irregularity of times observed in counting over
the last # days. 

-egen, count()-, or even -egen, total()- which is a better
bet for similar problems, is more or less useless for this
kind of problem as the relevant time interval varies. 

I too am probably missing something simple. However, 
when all inspiration fails, try brute force. 
The brute force approach is quite easy to program
and at worst requires a single loop over the observations. 

An approach discussed at length in 

Cox, N.J. 2007. Making it count. Stata Journal 7(1) 

is to set up a variable, loop over possibilities using
-count- for each observation, and replace each value of that
variable by the result. 
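
For concreteness, here is a minimal sketch of that loop written out directly. It assumes the variable names in Andrew's example (ID, Date, Response) and a 30-day window; the names prev30 and pos30 are purely illustrative. Run it from a do-file, as the /// continuation does not work interactively.

```stata
* first six rows of the example data as posted
clear
input ID Date Response
1 37200 1
1 37210 0
1 37215 1
1 37229 1
1 37231 0
2 37201 0
end

gen prev30 = .
gen pos30 = .

forval i = 1/`=_N' {
	* contacts for the same ID dated 1 to 30 days before observation i
	quietly count if ID == ID[`i'] & inrange(Date[`i'] - Date, 1, 30)
	quietly replace prev30 = r(N) in `i'

	* the same, but counting only positive responses
	quietly count if ID == ID[`i'] & Response == 1 ///
		& inrange(Date[`i'] - Date, 1, 30)
	quietly replace pos30 = r(N) in `i'
}

list, sepby(ID)
```

Worked by hand, prev30 should come out as 0, 1, 2, 3, 3, 0 and pos30 as 0, 1, 1, 2, 2, 0. The observation dated 37231 does not count the one dated 37200 because the gap is 31 days.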

This is built into the following program: 

*! 1.0.0 NJC 22 July 2007
program count_recent
	version 8 
	syntax [if] [in], Lag(numlist int max=1 >0) Generate(str) 

	quietly { 
		confirm new var `generate' 

		marksample touse 
		count if `touse' 
		if r(N) == 0 error 2000

		tsset 
		local p "`r(panelvar)'"
		local t "`r(timevar)'" 
		if "`p'" == "" { 
			tempvar p 
			gen byte `p' = 1 
		}	

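		* result stays missing for observations excluded by -if- or -in-
		gen `generate' = . 

		* for each included observation, count the included observations
		* in the same panel that fall 1 to lag() time units earlier
		forval i = 1/`=_N' { 
			if `touse'[`i'] {
				count if `touse' ///
				& inrange(`t'[`i'] - `t', 1, `lag') ///
				& `p' == `p'[`i'] 
				replace `generate' = r(N) in `i'
			}
		} 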
	} 
end

What we are counting, for each observation, are 
how many observations are 

(a) in the same panel (whenever there is panel structure) 
-- you don't quite say this is what you want, but I guess
it's true. 

(b) within the previous 1 to -lag- time units (-lag()- is a 
compulsory option), so the current observation itself is not counted. 

(c) relevant (by default all observations). This is determined 
by any -if- or -in- conditions. 

I assume a prior -tsset-. 

So, examples could be 

tsset ID Date 
count_recent , lag(30) generate(prev30)
count_recent if Response == 1, lag(60) generate(pos_prev60) 
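
As a check on what to expect, here is a sketch using the six example rows Andrew posted. It assumes the program above has been saved as count_recent.ado somewhere on the adopath; the values in the comments were worked by hand from the code, not taken from a log.

```stata
clear
input ID Date Response
1 37200 1
1 37210 0
1 37215 1
1 37229 1
1 37231 0
2 37201 0
end

tsset ID Date

* contacts in the previous 30 days, for every observation
count_recent , lag(30) generate(prev30)
* expected prev30: 0 1 2 3 3 0

* positive responses in the previous 30 days; with -if-, only the
* observations satisfying the condition are filled in
count_recent if Response == 1, lag(30) generate(pos_prev30)
* expected pos_prev30: 0 . 1 2 . .

list, sepby(ID)
```

Note that with an -if- condition the new variable is missing wherever the condition is false, so here pos_prev30 is defined only for contacts that were themselves positive.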

Nick 
n.j.cox@durham.ac.uk 

Naomi Levy
 
> I am no expert here, and there is likely to be a much
> easier way to do this than what I am suggesting, but this is what I
> would do:
> 
> I would -reshape- your data from long form to wide form so that each
> row is an ID and the responses on each day of contact become separate
> variables.
> 
> The new form would look like this:
> 
> ID    Response37200    Var137200    Var237200    Response37210    Var137210    Var237210
> 1     1                1            1            0                2            1
> 
> Before you do this I suggest dropping any variables you don't need for
> this analysis and renaming variables so their names are shorter (e.g.
> response to r).  Also, if all you are interested in for the analysis
> are more recent dates of contact, you can drop all the data for prior
> dates of contact.
> 
> the syntax for reshape is:
> reshape wide [varlist], i(id) j(date)
> 
> once you've done that, you can just generate a new variable that sums
> across the responses (once counting non-missing responses, and once
> counting positive responses).
> 
> after doing that, you can easily reshape the data back to long form:
> reshape long [varlist], i(id) j(date)

Andrew Stocking 

> I have an unbalanced panel of subjects who have been contacted very
> irregularly over the past 5 years. Total contacts range from 40-250
> during the 5 year period depending on the person.  I'd like to create
> two variables: one that counts the total number of contacts in the
> last 30 or 60 days and a second that sums the number of positive
> responses over the same 30 or 60 days.  For each contact there could
> be anywhere from 0-15 contacts in the last 30 days.
> 
> My data looks like:
> ID    Date    Response    Var1    Var2
> 1    37200        1    1    1
> 1    37210        0    2    1
> 1    37215        1    3    2
> 1    37229        1    4    3
> 1    37231        0    4    2
> 2    37201        0    1    0
> .....
> 
> I can't make egen count() work for me (or really anything else).  
