Statalist



RE: st: Unbalanced panel, count number of incidents


From   "Andrew Stocking" <astocking@earth.care2.com>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Unbalanced panel, count number of incidents
Date   Mon, 23 Jul 2007 18:02:32 -0400

Thank you Nick.  Your -ntimeofday- function worked perfectly.  In addition
the -count_recent- function worked beautifully for counting up the number of
contacts over the last 43200 minutes (I was able to get everything into
minutes).  So thank you for both.

The one thing that doesn't seem to be working quite right is
-count_recent- with an -if- condition.  I type:
-count_recent if Response == 1, lag(43200) generate(posResp)-

And I get the default -.- every time Response==0 and usually a 0 when
Response==1 (this is because often the only response in the last 43200
minutes (or 30 days) is the one where Response==1).  In table form, what I
should see (as if lag=5 instead of lag=43200):
ID  Date  Response  posResp
1   1     0         0
1   5     1         1
1   6     0         1
1   9     1         2
1   11    0         1
1   16    0         0
1   20    1         1

In table form, this is what I do see (as if lag=5 instead of lag=43200):
ID  Date  Response  posResp
1   1     0         .
1   5     1         0
1   6     0         .
1   9     1         1
1   11    0         .
1   16    0         .
1   20    1         0

So, the -if- part of the program seems to be 1) ignoring the current positive
response, counting everything historically over the lag and 2) doing no
calculations for any rows where the Response was 0.  I can easily add 1 to
everything and then replace the missing values with 0 to account for the
first issue.  But I'm not sure how to modify the program to account for the
second.  Maybe this is how it was intended to run.  My Stata programming
ability ranges somewhere between weak and non-existent or I'd try to modify
the program myself.

Any suggestions?

Thanks,
Andy
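
One possible modification that would produce the table Andy expects is
sketched below.  It is an illustrative variant, given the hypothetical
name -count_recent2-, and is not a version released by Nick Cox.  It
makes two assumptions about what is wanted: the -if- condition should
restrict only which observations are counted (so a result is produced
for every row), and the window should run from 0 to -lag- time units
back, so that the current observation is included.  The example data
are the lag=5 table from the message above.

```stata
capture program drop count_recent2

*! illustrative variant of count_recent (hypothetical, not NJC's code)
program count_recent2
	version 8
	syntax [if] [in], Lag(numlist int max=1 >0) Generate(str)

	quietly {
		confirm new var `generate'

		marksample touse
		count if `touse'
		if r(N) == 0 error 2000

		tsset
		local p "`r(panelvar)'"
		local t "`r(timevar)'"
		if "`p'" == "" {
			tempvar p
			gen byte `p' = 1
		}

		gen long `generate' = .

		* loop over every observation, not only those satisfying -if-
		forval i = 1/`=_N' {
			* -if- and -in- select what is counted;
			* a lower bound of 0 includes the current observation
			count if `touse' ///
			& inrange(`t'[`i'] - `t', 0, `lag') ///
			& `p' == `p'[`i']
			replace `generate' = r(N) in `i'
		}
	}
end

* example data from the message above
clear
input ID Date Response
1 1 0
1 5 1
1 6 0
1 9 1
1 11 0
1 16 0
1 20 1
end

tsset ID Date
count_recent2 if Response == 1, lag(5) generate(posResp)
list
```

Because the loop still visits every observation and runs -count- each
time, this remains the same brute-force approach as the original and
may be slow on a large panel.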


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Monday, July 23, 2007 3:04 PM
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: Unbalanced panel, count number of incidents

Stata is not a time machine. Therefore, no version
of Stata can recognise features introduced in a later version. 
You appear to be looking at some Stata 10 documentation
while using Stata 9.2. That can only lead to (occasional) confusion. 

37200 is not a Stata daily date that could be part of your 
data. You refer to Excel, which I understand to be a Microsoft 
program. I know very little about it, but it may be that it
is using a base date that differs from Stata's, possibly 1 January 1900
rather than 1 January 1960. I would be careful, however, 
as it is documented that Excel is incorrect about whether 1900
was a leap year, a case of bug-for-bug compatibility with Lotus 1-2-3. 
Stata is correct on this point. 

Otherwise the most developed (unofficial) solutions for date-time 
manipulations in Stata 9.2 appear to be -ntimeofday- 
and -stimeofday- published in the Stata Journal. 

In Stata 9.2 you can set your date-times with unspecified 
time unit. In your case that might need to be seconds. 
You may need to worry about an appropriate display
format, or just not bother. 

A better bet is to upgrade as soon as possible and make use
of Stata 10's facilities. 

My program doesn't care about time units, so long as 
your data are -tsset-. 
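
As a minimal sketch of the unspecified-time-unit idea above, an Excel
serial date-time could be converted to whole minutes on Stata's 1960
base and then -tsset-.  The variable names -xl_date- and -tmin- are
hypothetical, and the sketch assumes Excel's 1900 date system and dates
after February 1900, for which the effective origin is 30 December
1899.  It also assumes no two contacts for one ID fall in the same
minute; otherwise -tsset- would again report repeated time values.

```stata
* xl_date: hypothetical name for the numeric Excel serial date-time
clear
input ID double xl_date
1 37200.6569
1 37200.7500
1 37210.4000
end

* shift the origin from 30 Dec 1899 (Excel) to 1 Jan 1960 (Stata),
* then convert days to minutes (1440 per day) and round to integers
gen long tmin = round((xl_date + mdy(12, 30, 1899)) * 1440)

* integer minutes since 1 Jan 1960 serve as the panel time variable
tsset ID tmin
list
```

With minutes as the time unit, a 30-day window corresponds to
30 * 1440 = 43200, the value to give as -lag()-.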

Nick 
n.j.cox@durham.ac.uk 

Andrew Stocking
 
> Thank you for the program - it seems like a great solution!  Two quick
> follow-up questions.
> 
> My date variable is currently available down to the second (i.e.,
> -clocktime- to Stata, though my 9.2 version of Stata doesn't recognize
> the -clocktime- option).  Right now I have a string that appears as:
> 11/5/2001 15:46
> or, from Excel, I've converted this to (I imagine Stata can do the same,
> though it seems somewhat more complicated):
> 37200.6569
> 
> I don't really care about the seconds, minutes, or hours right now,
> except that there are multiple contacts on the same day differentiated
> only by seconds, minutes, or hours.  So, my two questions:
> 1) How do I set the date with hours, minutes, and seconds as the time
> dimension of my panel data (-tsset-)?  If I -trunc- off the h,m,s of the
> Excel-converted date, I receive the obvious error that there are
> "repeated time values within panel".  It seems like the -clocktime-
> format should do it, but I get an error saying that clocktime is not
> recognized:
> -tsset  Sent_Date, clocktime delta(1 day)-
> I've installed -egenmore- and read about the -dhms- function.
> 
> 2) What's the best way to deal with this with respect to your
> -count_recent- program below, given that I'll have two contacts in the
> same day?  How do I set the lag to a whole day and have the program
> still accurately count totals?
> 
> 
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
> Sent: Sunday, July 22, 2007 2:00 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: RE: st: Unbalanced panel, count number of incidents
> 
> I don't see how Naomi's solution gets at the most 
> difficult part of this problem, which is to take account
> of the irregularity of times observed in counting over
> the last # days. 
> 
> -egen, count()-, or even -egen, total()- which is a better
> bet for similar problems, is more or less useless for this
> kind of problem as the relevant time interval varies. 
> 
> I too am probably missing something simple. However, 
> when all inspiration fails, try brute force. 
> The brute force approach is quite easy to program
> and at worst requires a single loop over the observations. 
> 
> An approach discussed at length in 
> 
> Cox, N.J. 2007. Making it count. Stata Journal 7(1) 
> 
> is to set up a variable, loop over possibilities using
> -count- for each observation, and replace each value of that
> variable by the result. 
> 
> This is built into the following program: 
> 
> *! 1.0.0 NJC 22 July 2007
> program count_recent
> 	version 8 
> 	syntax [if] [in], Lag(numlist int max=1 >0) Generate(str) 
> 
> 	quietly { 
> 		confirm new var `generate' 
> 
> 		marksample touse 
> 		count if `touse' 
> 		if r(N) == 0 error 2000
> 
> 		tsset 
> 		local p "`r(panelvar)'"
> 		local t "`r(timevar)'" 
> 		if "`p'" == "" { 
> 			tempvar p 
> 			gen byte `p' = 1 
> 		}	
> 
> 		gen `generate' = . 
> 
> 		forval i = 1/`=_N' { 
> 			if `touse'[`i'] {
> 				count if `touse' ///
> 				& inrange(`t'[`i'] - `t', 1, `lag') ///
> 				& `p' == `p'[`i'] 
> 				replace `generate' = r(N) in `i'
> 			}
> 		} 
> 	} 
> end
> 
> What we are counting, for each observation, are 
> how many observations are 
> 
> (a) in the same panel (whenever there is panel structure) 
> -- you don't quite say this is what you want, but I guess
> it's true. 
> 
> (b) within 1 to -lag- (compulsory option) time units previous 
> 
> (c) relevant (by default all observations). This is determined 
> by any -if- or -in- conditions. 
> 
> I assume a prior -tsset-. 
> 
> So, examples could be 
> 
> tsset ID Date 
> count_recent , lag(30) generate(prev30)
> count_recent if Response == 1, lag(60) generate(pos_prev60) 
> 
> Nick 
> n.j.cox@durham.ac.uk 
> 
> Naomi Levy
>  
> > I am no expert here, and there is likely to be a much
> > easier way to do this than what I am suggesting, but this is what I
> > would do:
> > 
> > I would -reshape- your data from long form to wide
> > form so that each row is an ID and the responses on each day of
> > contact become separate variables.
> > 
> > The new form would look like this:
> > 
> > ID    Response37200    Var137200    Var237200    Response37210    Var137210    Var237210
> > 1     1                1            1            0                2            1
> > 
> > Before you do this, I suggest dropping any variables you don't need
> > for this analysis and renaming variables so their names are shorter
> > (e.g. response to r).  Also, if all you are interested in for the
> > analysis are more recent dates of contact, you can drop all the data
> > for prior dates of contact.
> > 
> > the syntax for reshape is:
> > reshape wide [varlist], i(id) j(date)
> > 
> > Once you've done that, you can just generate a new variable that sums
> > across the responses (once counting non-missing responses, and once
> > counting positive responses).
> > 
> > after doing that, you can easily reshape the data back to long form:
> > reshape long [varlist], i(id) j(date)
> 
> Andrew Stocking 
> 
> > I have an unbalanced panel of subjects who have been contacted very
> > irregularly over the past 5 years. Total contacts range from 40-250
> > during the 5 year period depending on the person.  I'd like to create
> > two variables: one that counts the total number of contacts in the
> > last 30 or 60 days and a second that sums the number of positive
> > responses over the same 30 or 60 days.  For each contact there could
> > be anywhere from 0-15 contacts in the last 30 days.
> > 
> > My data looks like:
> > ID    Date    Response    Var1    Var2
> > 1    37200        1    1    1
> > 1    37210        0    2    1
> > 1    37215        1    3    2
> > 1    37229        1    4    3
> > 1    37231        0    4    2
> > 2    37201        0    1    0
> > .....
> > 
> > I can't make egen count() work for me (or really anything else).  

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



