Statalist



RE: st: Unbalanced panel, count number of incidents


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Unbalanced panel, count number of incidents
Date   Mon, 23 Jul 2007 20:04:29 +0100

Stata is not a time machine. Therefore, no version
of Stata can recognise features introduced in a later version. 
You appear to be looking at some Stata 10 documentation
and using Stata 9.2. That can only lead to (occasional) confusion. 

As a Stata daily date, 37200 would fall in November 2061, so it
cannot be part of your data. You refer to Excel, which I understand to be a Microsoft 
program. I know very little about it, but it may be that it
is using a base date that differs from Stata's, possibly 1 January 1900
rather than 1 January 1960. I would be careful, however, 
as it is documented that Excel is incorrect about whether 1900
was a leap year, a case of bug-for-bug compatibility with Lotus 1-2-3. 
Stata is correct on this point. 
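A minimal Stata sketch of that offset follows. It assumes the workbook uses Excel's default 1900 date system (the 1904 system used by some Mac workbooks has a different base) and a hypothetical variable -xldate- holding the Excel serial numbers. In the 1900 system, serial 21916 is 1 January 1960, Stata's day 0. Because of the phantom 29 February 1900, the subtraction is only correct from 1 March 1900 onwards.

```stata
* Sketch: convert Excel (1900 date system) serial numbers to Stata daily dates
* xldate is a hypothetical variable name used for illustration
clear
input double xldate
37200.6569
end

* Excel serial 21916 is 1 January 1960, which is day 0 for Stata,
* so the difference is a Stata daily date (valid from 1 March 1900 on)
generate sdate = floor(xldate) - 21916
format sdate %d
list
```

The fractional part, here .6569 of a day, is the time of day (about 15:46). It is dropped by -floor()- in this sketch.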

Otherwise the most developed (unofficial) solutions for date-time 
manipulations in Stata 9.2 appear to be -ntimeofday- 
and -stimeofday- published in the Stata Journal. 

In Stata 9.2 you can -tsset- your date-times with no specified 
time unit. In your case that unit might need to be seconds. 
You may need to worry about an appropriate display
format, or just not bother. 
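One way to follow this in Stata 9.2, sketched under assumptions: the hypothetical -xldate- holds Excel 1900-system serials (days plus a fraction of a day), and -id- identifies panels. Multiplying the Stata day count by 86400 gives seconds since 1 January 1960. This is held as a -double-, so values of this size are stored exactly, and -round()- snaps the converted Excel fraction to a whole second. Contacts on the same day then get distinct time values, so -tsset- should no longer report repeated time values within panel.

```stata
* Stata 9.2 sketch: time as a plain number of seconds since 01jan1960 00:00:00
* id and xldate are hypothetical variable names
clear
input id double xldate
1 37200.6569
1 37200.7000
1 37210.4000
2 37201.5000
end

* Excel serial 21916 is Stata day 0; there are 86400 seconds per day
generate double secs = round((xldate - 21916) * 86400)

* the two contacts on 5 Nov 2001 now have distinct time values
tsset id secs
```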

A better bet is to upgrade as soon as possible and make use
of Stata 10's facilities. 
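For comparison, a sketch of the Stata 10 route. It assumes the original strings are in a hypothetical variable -sent-. The -clock()- function with the mask "MDYhm" returns milliseconds since 01jan1960 00:00:00, which must be stored as -double-. The %tc format displays those values as date-times, and -dofc()- recovers the daily date when only the day matters.

```stata
* Stata 10 or later: read the date-time strings directly
* id and sent are hypothetical variable names
clear
input id str20 sent
1 "11/5/2001 15:46"
1 "11/5/2001 17:02"
end

* milliseconds since 01jan1960 00:00:00.000; needs double precision
generate double t = clock(sent, "MDYhm")
format t %tc

* the daily date, for cases where only the day matters
generate day = dofc(t)
format day %td

tsset id t, clocktime
```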

My program doesn't care about time units, so long as 
your data are -tsset-. 
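To illustrate with the Stata 9.2 set-up (time in seconds since 1960, an assumption about how the data were prepared), a lag of whole days is just a number of seconds: 30 days is 30 * 86400 = 2,592,000 seconds. The loop below applies the same -inrange(t[i] - t, 1, lag)- test that -count_recent- uses. It is written out inline here so that it runs on its own. An earlier contact on the same day is at least 1 second earlier, so it is counted.

```stata
* Sketch: a 30-day window when time is held in seconds
* id, secs and prev30 are hypothetical names
clear
input id double secs
1 1320594356
1 1320598080
1 1321436160
end
tsset id secs

local lag = 30 * 86400
generate prev30 = .
forvalues i = 1/`=_N' {
    * same panel, between 1 second and 30 days earlier
    quietly count if id == id[`i'] & inrange(secs[`i'] - secs, 1, `lag')
    quietly replace prev30 = r(N) in `i'
}
list
```

With -count_recent- itself, the same window would be requested as -count_recent, lag(2592000) generate(prev30)- after -tsset id secs-.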

Nick 
n.j.cox@durham.ac.uk 

Andrew Stocking
 
> Thank you for the program - it seems like a great solution!  Two quick
> follow-up questions.  
> 
> My date variable is currently available down to the second (i.e.,
> -clocktime- to Stata, though my 9.2 version of Stata doesn't recognize
> the -clocktime- option). Right now I have a string that appears as:
> 11/5/2001 15:46  
> or, from Excel, I've converted this to (I imagine Stata can do the same,
> though it seems somewhat more complicated):
> 37200.6569
> 
> I don't really care about the seconds, minutes or hours right now, except
> that there are multiple contacts on the same day differentiated only by
> seconds, minutes, or hours. So, my two questions:
> 
> 1) How do I set the date with hours, minutes and seconds as the time
> dimension of my panel data (-tsset-)? If I -trunc- off the h,m,s of the
> Excel-converted date, I receive the obvious error that there are "repeated
> time values within panel". It seems like the -clocktime- format should do
> it, but I get an error saying that clocktime is not recognized:
> -tsset Sent_Date, clocktime delta(1 day)-
> I've installed -egenmore- and read about the -dhms- function.
> 
> 2) What's the best way to deal with this with respect to your
> -count_recent- program below and the fact that I'll have two contacts on
> the same day? How do I set the lag to a whole day and have the program
> still accurately count totals?
> 
> 
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
> Sent: Sunday, July 22, 2007 2:00 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: RE: st: Unbalanced panel, count number of incidents
> 
> I don't see how Naomi's solution gets at the most 
> difficult part of this problem, which is to take account
> of the irregularity of times observed in counting over
> the last # days. 
> 
> -egen, count()-, or even -egen, total()-, which is a better
> bet for similar problems, is more or less useless for this
> kind of problem, as the relevant time interval differs from 
> one observation to the next. 
> 
> I too am probably missing something simple. However, 
> when all inspiration fails, try brute force. 
> The brute force approach is quite easy to program
> and at worst requires a single loop over the observations. 
> 
> An approach discussed at length in 
> 
> Cox, N.J. 2007. Making it count. Stata Journal 7(1) 
> 
> is to set up a variable, loop over possibilities using
> -count- for each observation, and replace each value of that
> variable by the result. 
> 
> This is built into the following program: 
> 
> *! 1.0.0 NJC 22 July 2007
> program count_recent
> 	version 8 
> 	syntax [if] [in], Lag(numlist int max=1 >0) Generate(str) 
> 
> 	quietly { 
> 		confirm new var `generate' 
> 
> 		marksample touse 
> 		count if `touse' 
> 		if r(N) == 0 error 2000
> 
> 		tsset 
> 		local p "`r(panelvar)'"
> 		local t "`r(timevar)'" 
> 		if "`p'" == "" { 
> 			tempvar p 
> 			gen byte `p' = 1 
> 		}	
> 
> 		gen `generate' = . 
> 
> 		forval i = 1/`=_N' { 
> 			if `touse'[`i'] {
> 				count if `touse' ///
> 				& inrange(`t'[`i'] - `t', 1, `lag') ///
> 				& `p' == `p'[`i'] 
> 				replace `generate' = r(N) in `i'
> 			}
> 		} 
> 	} 
> end
> 
> What we count, for each observation, is 
> how many observations are 
> 
> (c) in the same panel (whenever there is panel structure) 
> -- you don't quite say this is what you want, but I guess
> it's true. 
> 
> (b) within 1 to -lag- (compulsory option) time units previous 
> 
> (a) relevant (by default all observations). This is determined 
> by any -if- or -in- conditions. 
> 
> I assume a prior -tsset-. 
> 
> So, examples could be 
> 
> tsset ID Date 
> count_recent , lag(30) generate(prev30)
> count_recent if Response == 1, lag(60) generate(pos_prev60) 
> 
> Nick 
> n.j.cox@durham.ac.uk 
> 
> Naomi Levy
>  
> > I am no expert here, and there is likely to be a much
> > easier way to do this than what I am suggesting, but this is what I
> > would do:
> > 
> > I would -reshape- your data from long form to wide form so that each
> > row is an ID and the responses on each day of contact become separate
> > variables.
> > 
> > The new form would look like this:
> > 
> > ID  Response37200  Var137200  Var237200  Response37210  Var137210  Var237210
> > 1   1              1          1          0              2          1
> > 
> > Before you do this, I suggest dropping any variables you don't need for
> > this analysis and renaming variables so their names are shorter (e.g.,
> > response to r). Also, if all you are interested in for the analysis are
> > more recent dates of contact, you can drop all the data for prior dates
> > of contact.
> > 
> > The syntax for -reshape- is:
> > reshape wide [varlist], i(id) j(date)
> > 
> > Once you've done that, you can just generate a new variable that sums
> > across the responses (once counting non-missing responses, and once
> > counting positive responses).
> > 
> > After doing that, you can easily reshape the data back to long form:
> > reshape long [varlist], i(id) j(date)
> 
> Andrew Stocking 
> 
> > I have an unbalanced panel of subjects who have been contacted very
> > irregularly over the past 5 years. Total contacts range from 40 to 250
> > during the 5-year period, depending on the person. I'd like to create
> > two variables: one that counts the total number of contacts in the last
> > 30 or 60 days and a second that sums the number of positive responses
> > over the same 30 or 60 days. For each contact there could be anywhere
> > from 0 to 15 contacts in the last 30 days.  
> > 
> > My data looks like:
> > ID    Date    Response    Var1    Var2
> > 1     37200   1           1       1
> > 1     37210   0           2       1
> > 1     37215   1           3       2
> > 1     37229   1           4       3
> > 1     37231   0           4       2
> > 2     37201   0           1       0
> > .....
> > 
> > I can't make egen count() work for me (or really anything else).  
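As a check on the logic, here is the brute-force count written out inline (not Nick's program itself) on the sample rows in the original question, using lower-cased hypothetical names -id-, -date- and -response-. Only differences between dates enter the count, so the Excel serials can be used as they are. The second count, for positive responses, is a variant that is not in the thread. It fills in every contact, whereas -count_recent if Response == 1- both counts and fills in only the contacts with Response == 1.

```stata
* Sketch: 30-day counts on the sample data from the original question
clear
input id date response
1 37200 1
1 37210 0
1 37215 1
1 37229 1
1 37231 0
2 37201 0
end

generate prev30 = .
generate pos30 = .
forvalues i = 1/`=_N' {
    * contacts in the same panel 1 to 30 days before contact i
    quietly count if id == id[`i'] & inrange(date[`i'] - date, 1, 30)
    quietly replace prev30 = r(N) in `i'
    * the subset of those contacts with a positive response
    quietly count if id == id[`i'] & inrange(date[`i'] - date, 1, 30) & response == 1
    quietly replace pos30 = r(N) in `i'
}
list
```

After -tsset id date-, -count_recent, lag(30) generate(prev30)- applies the same test and so should reproduce -prev30- here.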

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP