
RE: st: Unbalanced panel, count number of incidents

From   "Andrew Stocking" <>
To   <>
Subject   RE: st: Unbalanced panel, count number of incidents
Date   Mon, 23 Jul 2007 14:34:59 -0400


Thank you for the program - it seems like a great solution!  Two quick
follow-up questions.

My date variable is currently available down to the second (i.e.,
-clocktime- to stata, though my 9.2 version of Stata doesn't recognize the
-clocktime- option).  Right now I have a string that appears:
11/5/2001 15:46  
or from Excel I've converted this to (I imagine stata can do the same,
though it seems somewhat more complicated):

I don't really care about the second or minutes or hours right now except
that there are multiple contacts on the same day differentiated only by
seconds, minutes, or hours.  So, my two questions:
1) How do I set the date with hours, minutes, seconds as the time dimension
of my panel data (-tsset-)?  If I -trunc- off the h,m,s of the
Excel-converted date, I receive the obvious error that there are "repeated
time values within panel".  It seems like the -clocktime- format should do
it, but I get an error regarding clocktime as not recognized:
-tsset  Sent_Date, clocktime delta(1 day)-
I've installed -egenmore- and read about the -dhms- function

2) What's the best way to deal with this with respect to your -count_recent-
program below and the fact that I'll have two contacts in the same day?  How
do I set the lag to a whole day and have the program still accurately count


-----Original Message-----
[] On Behalf Of Nick Cox
Sent: Sunday, July 22, 2007 2:00 PM
Subject: RE: st: Unbalanced panel, count number of incidents

I don't see how Naomi's solution gets at the most 
difficult part of this problem, which is to take account
of the irregularity of times observed in counting over
the last # days. 

-egen, count()-, or even -egen, total()- which is a better
bet for similar problems, is more or less useless for this
kind of problem as the relevant time interval varies. 

I too am probably missing something simple. However, 
when all inspiration fails, try brute force. 
The brute force approach is quite easy to program
and at worst requires a single loop over the observations. 

An approach discussed at length in 

Cox, N.J. 2007. Making it count. Stata Journal 7(1) 

is to set up a variable, loop over possibilities using
-count- for each observation, and replace each value of that
variable by the result. 

This is built into the following program: 

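*! 1.0.0 NJC 22 July 2007
program count_recent
	version 8 
	syntax [if] [in], Lag(numlist int max=1 >0) Generate(str) 

	quietly { 
		confirm new var `generate' 

		* observations to use, as selected by any -if- or -in- 
		marksample touse 
		count if `touse' 
		if r(N) == 0 error 2000

		* pick up panel and time variables from the prior -tsset- 
		tsset 
		local p "`r(panelvar)'"
		local t "`r(timevar)'" 
		if "`p'" == "" { 
			* no panel structure: treat the data as one panel 
			tempvar p 
			gen byte `p' = 1 
		}

		gen `generate' = . 

		* for each observation, count earlier observations in the 
		* same panel that fall 1 to `lag' time units before it 
		forval i = 1/`=_N' { 
			if `touse'[`i'] {
				count if `touse' ///
				& inrange(`t'[`i'] - `t', 1, `lag') ///
				& `p' == `p'[`i'] 
				replace `generate' = r(N) in `i'
			}
		}
	}
end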

What we are counting, for each observation, is 
how many observations are 

(a) in the same panel (whenever there is panel structure) 
-- you don't quite say this is what you want, but I guess
it's true. 

(b) within 1 to -lag- (compulsory option) time units previous 

(c) relevant (by default all observations). This is determined 
by any -if- or -in- conditions. 

I assume a prior -tsset-. 

So, examples could be 

tsset ID Date 
count_recent , lag(30) generate(prev30)
count_recent if Response == 1, lag(60) generate(pos_prev60) 
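
As a hand-checkable illustration (not part of the original post), here is
a minimal sketch that runs -count_recent- on the six sample observations
quoted at the end of this thread. It assumes the program above has already
been defined in the session. The names prev30 and pos_prev30 and the choice
of lag(30) are arbitrary, and the expected values in the comments were
worked out by hand from the program's logic.

```stata
* Sample data copied from the original question (first six rows only)
clear
input ID Date Response Var1 Var2
1 37200 1 1 1
1 37210 0 2 1
1 37215 1 3 2
1 37229 1 4 3
1 37231 0 4 2
2 37201 0 1 0
end

tsset ID Date

* Contacts in the previous 1 to 30 days, within the same ID
count_recent, lag(30) generate(prev30)

* Positive responses in the previous 1 to 30 days
count_recent if Response == 1, lag(30) generate(pos_prev30)

list ID Date Response prev30 pos_prev30, sepby(ID)

* Expected prev30, in data order:     0 1 2 3 3 0
*   (for Date 37231 the contact at 37200 is 31 days back, so not counted)
* Expected pos_prev30, in data order: 0 . 1 2 . .
*   (observations excluded by the -if- condition are left missing)
```

Note that the -if- condition restricts both which observations are counted
and which observations receive a result, so rows with Response == 0 get a
missing value in pos_prev30 rather than a count.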


Naomi Levy
> I am no expert here, and there is likely to be a much easier way to do
> this than what I am suggesting, but this is what I would do:
> I would -reshape- your data from long form to wide form so that each
> row is an ID and the responses on each day of contact become separate
> variables.
> The new form would look like this:
> ID  Response37200  Var137200  Var237200  Response37210  Var137210  Var237210
> 1   1              1          1          0              2          1
> Before you do this I suggest dropping any variables you don't need for
> this analysis and renaming variables so their names are shorter (e.g.
> response to r).  Also, if all you are interested in for the analysis
> are more recent dates of contact, you can drop all the data for prior
> dates of contact.
> The syntax for reshape is:
> reshape wide [varlist], i(id) j(date)
> Once you've done that, you can just generate a new variable that sums
> across the responses (once counting non-missing responses, and once
> counting positive responses).
> After doing that, you can easily reshape the data back to long form:
> reshape long [varlist], i(id) j(date)

Andrew Stocking 

> I have an unbalanced panel of subjects who have been contacted very
> irregularly over the past 5 years. Total contacts range from 40-250
> during the 5 year period depending on the person.  I'd like to create
> two variables: one that counts the total number of contacts in the
> last 30 or 60 days and a second that sums the number of positive
> responses over the same 30 or 60 days.  For each contact there could
> be anywhere from 0-15 contacts in the last 30 days.
> My data looks like:
> ID    Date    Response    Var1    Var2
> 1    37200        1    1    1
> 1    37210        0    2    1
> 1    37215        1    3    2
> 1    37229        1    4    3
> 1    37231        0    4    2
> 2    37201        0    1    0
> .....
> I can't make egen count() work for me (or really anything else).  

*   For searches and help try:
