
From: "Andrew Stocking" <astocking@earth.care2.com>
To: <statalist@hsphsun2.harvard.edu>
Subject: RE: st: Unbalanced panel, count number of incidents
Date: Mon, 23 Jul 2007 18:02:32 -0400

Thank you, Nick. Your -ntimeofday- function worked perfectly. In addition, the -count_recent- program worked beautifully for counting the number of contacts over the last 43200 minutes (I was able to get everything into minutes). So thank you for both.

The one thing that doesn't seem to be working quite right is -count_recent- with an -if- qualifier. I type:

    count_recent if Response == 1, lag(43200) generate(posResp)

and I get the default missing value (.) every time Response == 0, and a 0 every time Response == 1 (this is because often the only response in the last 43200 minutes, or 30 days, is the current one, where Response == 1).

In table form, this is what I should see (as if lag = 5 instead of lag = 43200):

    ID  Date  Response  posResp
     1     1         0        0
     1     5         1        1
     1     6         0        1
     1     9         1        2
     1    11         0        1
     1    16         0        0
     1    20         1        1

In table form, this is what I do see (as if lag = 5 instead of lag = 43200):

    ID  Date  Response  posResp
     1     1         0        .
     1     5         1        0
     1     6         0        .
     1     9         1        1
     1    11         0        .
     1    16         0        .
     1    20         1        0

So the -if- part of the program seems to be 1) ignoring the current positive response, counting only the earlier ones over the lag, and 2) doing no calculations for any rows where Response was 0. I can easily add 1 to everything and then replace the missing values with 0 to account for the first issue, but I'm not sure how to modify the program to account for the second. Maybe this is how it was intended to run. My Stata programming ability ranges somewhere between weak and non-existent, or I'd try to modify the program myself.

Any suggestions?

Thanks,
Andy

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Monday, July 23, 2007 3:04 PM
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: Unbalanced panel, count number of incidents

Stata is not a time machine. Therefore, no version of Stata can recognise features introduced in a later version. You appear to be looking at some Stata 10 documentation and using Stata 9.2.
That can only lead to (occasional) confusion.

37200 is not a Stata daily date that could be part of your data. You refer to Excel, which I understand to be a Microsoft program. I know very little about it, but it may be that it is using a base date that differs from Stata's, possibly 1 January 1900 rather than 1 January 1960. I would be careful, however, as it is documented that Excel is incorrect about whether 1900 was a leap year, a case of bug-for-bug compatibility with Lotus 1-2-3. Stata is correct on this point.

Otherwise, the most developed (unofficial) solutions for date-time manipulations in Stata 9.2 appear to be -ntimeofday- and -stimeofday-, published in the Stata Journal. In Stata 9.2 you can set your date-times with an unspecified time unit. In your case that might need to be seconds. You may need to worry about an appropriate display format, or just not bother. A better bet is to upgrade as soon as possible and make use of Stata 10's facilities.

My program doesn't care about time units, so long as your data are -tsset-.

Nick
n.j.cox@durham.ac.uk

Andrew Stocking

> Thank you for the program - it seems like a great solution! Two quick
> follow-up questions.
>
> My date variable is currently available down to the second (i.e.,
> -clocktime- to Stata, though my 9.2 version of Stata doesn't recognize
> the -clocktime- option). Right now I have a string that appears as:
> 11/5/2001 15:46
> or, from Excel, I've converted this to (I imagine Stata can do the same,
> though it seems somewhat more complicated):
> 37200.6569
>
> I don't really care about the seconds, minutes, or hours right now,
> except that there are multiple contacts on the same day differentiated
> only by seconds, minutes, or hours. So, my two questions:
> 1) How do I set the date with hours, minutes, seconds as the time
> dimension of my panel data (-tsset-)?
> If I -trunc- off the h,m,s of the Excel-converted date, I receive the
> obvious error that there are "repeated time values within panel". It
> seems like the -clocktime- format should do it, but I get an error
> saying that clocktime is not recognized:
> -tsset Sent_Date, clocktime delta(1 day)-
> I've installed -egenmore- and read about the -dhms- function.
>
> 2) What's the best way to deal with this with respect to your
> -count_recent- program below and the fact that I'll have two contacts
> on the same day? How do I set the lag to a whole day and have the
> program still accurately count totals?
>
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
> Sent: Sunday, July 22, 2007 2:00 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: RE: st: Unbalanced panel, count number of incidents
>
> I don't see how Naomi's solution gets at the most
> difficult part of this problem, which is to take account
> of the irregularity of times observed in counting over
> the last # days.
>
> -egen, count()-, or even -egen, total()-, which is a better
> bet for similar problems, is more or less useless for this
> kind of problem as the relevant time interval varies.
>
> I too am probably missing something simple. However,
> when all inspiration fails, try brute force.
> The brute force approach is quite easy to program
> and at worst requires a single loop over the observations.
>
> An approach discussed at length in
>
> Cox, N.J. 2007. Making it count. Stata Journal 7(1)
>
> is to set up a variable, loop over possibilities using
> -count- for each observation, and replace each value of that
> variable by the result.
>
> This is built into the following program:
>
> *! 1.0.0 NJC 22 July 2007
> program count_recent
>         version 8
>         syntax [if] [in], Lag(numlist int max=1 >0) Generate(str)
>
>         quietly {
>                 confirm new var `generate'
>
>                 marksample touse
>                 count if `touse'
>                 if r(N) == 0 error 2000
>
>                 tsset
>                 local p "`r(panelvar)'"
>                 local t "`r(timevar)'"
>                 if "`p'" == "" {
>                         tempvar p
>                         gen byte `p' = 1
>                 }
>
>                 gen `generate' = .
>
>                 forval i = 1/`=_N' {
>                         if `touse'[`i'] {
>                                 count if `touse' ///
>                                 & inrange(`t'[`i'] - `t', 1, `lag') ///
>                                 & `p' == `p'[`i']
>                                 replace `generate' = r(N) in `i'
>                         }
>                 }
>         }
> end
>
> What we are counting, for each observation, are
> how many observations are
>
> (c) in the same panel (whenever there is panel structure)
> -- you don't quite say this is what you want, but I guess
> it's true.
>
> (b) within 1 to -lag- (compulsory option) time units previous
>
> (a) relevant (by default all observations). This is determined
> by any -if- or -in- conditions.
>
> I assume a prior -tsset-.
>
> So, examples could be
>
> tsset ID Date
> count_recent , lag(30) generate(prev30)
> count_recent if Response == 1, lag(60) generate(pos_prev60)
>
> Nick
> n.j.cox@durham.ac.uk
>
> Naomi Levy
>
> > I am no expert here, and there is likely to be a much
> > easier way to do this than what I am suggesting, but this is what I
> > would do:
> >
> > I would -reshape- your data from long form to wide
> > form so that each row is an ID and the responses on each day
> > of contact become separate variables.
> >
> > The new form would look like this:
> >
> > ID  Response37200  Var137200  Var237200  Response37210  Var137210  Var237210
> >  1              1          1          1              0          2          1
> >
> > Before you do this I suggest dropping any variables you don't need
> > for this analysis and renaming variables so their names are shorter
> > (e.g. response to r). Also, if all you are interested in for the
> > analysis are more recent dates of contact, you can drop all the data
> > for prior dates of contact.
> >
> > The syntax for reshape is:
> > reshape wide [varlist], i(id) j(date)
> >
> > Once you've done that, you can just generate a new variable that sums
> > across the responses (once counting non-missing responses, and once
> > counting positive responses).
> >
> > After doing that, you can easily reshape the data back to long form:
> > reshape long [varlist], i(id) j(date)
>
> Andrew Stocking
>
> > I have an unbalanced panel of subjects who have been contacted very
> > irregularly over the past 5 years. Total contacts range from 40-250
> > during the 5-year period depending on the person. I'd like to create
> > two variables: one that counts the total number of contacts in the
> > last 30 or 60 days and a second that sums the number of positive
> > responses over the same 30 or 60 days. For each contact there could
> > be anywhere from 0-15 contacts in the last 30 days.
> >
> > My data looks like:
> >
> > ID   Date  Response  Var1  Var2
> >  1  37200         1     1     1
> >  1  37210         0     2     1
> >  1  37215         1     3     2
> >  1  37229         1     4     3
> >  1  37231         0     4     2
> >  2  37201         0     1     0
> > .....
> >
> > I can't make egen count() work for me (or really anything else).

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
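Both symptoms Andy describes follow from the program as posted: the single `touse` marker built from the -if- qualifier decides both which observations receive a result and which observations are counted, and -inrange(..., 1, `lag')- starts one time unit back, so the current observation is never counted. The Python sketch below is not Stata and not a fix posted in this thread; it only contrasts the posted logic with the counting that Andy's expected table implies. The function names are hypothetical, and the inclusive window (0 to lag units back, which also counts other contacts at the same time in the same panel) is an assumption read off his lag = 5 example.

```python
from typing import List, Optional, Sequence


def count_recent_as_posted(panel: Sequence[int], time: Sequence[float],
                           cond: Sequence[bool], lag: float) -> List[Optional[int]]:
    """Mirror of the posted -count_recent- logic (a sketch, not the ado-file).

    Only observations meeting `cond` get a result (others stay missing,
    shown as None); the count covers observations in the same panel that
    also meet `cond` and lie 1 to `lag` time units earlier.
    """
    result: List[Optional[int]] = []
    for i in range(len(time)):
        if not cond[i]:
            result.append(None)  # Stata would leave "." here
            continue
        result.append(sum(
            1 for j in range(len(time))
            if cond[j] and panel[j] == panel[i]
            and 1 <= time[i] - time[j] <= lag))
    return result


def count_recent_inclusive(panel: Sequence[int], time: Sequence[float],
                           cond: Sequence[bool], lag: float) -> List[int]:
    """Hypothetical variant: every observation gets a count.

    `cond` only decides which observations are counted, and the window
    runs from 0 to `lag` units back, so the current observation counts
    whenever it meets `cond`.
    """
    return [sum(1 for j in range(len(time))
                if cond[j] and panel[j] == panel[i]
                and 0 <= time[i] - time[j] <= lag)
            for i in range(len(time))]


# Andy's example, with lag = 5 standing in for 43200 minutes.
ids = [1, 1, 1, 1, 1, 1, 1]
dates = [1, 5, 6, 9, 11, 16, 20]
response = [0, 1, 0, 1, 0, 0, 1]
positive = [r == 1 for r in response]

print(count_recent_as_posted(ids, dates, positive, 5))
# [None, 0, None, 1, None, None, 0]
print(count_recent_inclusive(ids, dates, positive, 5))
# [0, 1, 1, 2, 1, 0, 1]
```

The first function reproduces Andy's "what I do see" table and the second his "what I should see" table. In Stata terms, the corresponding change would presumably be to loop over every observation rather than only those where `touse' is true, keep `touse' inside the -count- condition, and change the lower bound of -inrange()- from 1 to 0; that is an untested suggestion, not a revision of Nick's program. Note that with a lower bound of 0 an unconditional call would also count the current contact in the total.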

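Nick's remark about base dates can be checked with a little arithmetic. A minimal Python sketch, assuming the serial comes from Excel's default 1900 date system: Excel counts a nonexistent 29 February 1900, so for serials from 61 (1 March 1900) onward, counting days from 30 December 1899 gives the correct calendar date, and a Stata daily date is the number of days since 1 January 1960. The function names are hypothetical; this is not how Stata or -ntimeofday- performs the conversion.

```python
from datetime import datetime, timedelta

# Assumption: serials come from Excel's default 1900 date system.
# Excel wrongly counts 29 Feb 1900, so for serials >= 61 (1 Mar 1900)
# counting days from 30 Dec 1899 gives the correct calendar date.
EXCEL_EPOCH = datetime(1899, 12, 30)
STATA_EPOCH = datetime(1960, 1, 1)  # Stata daily date 0


def excel_serial_to_datetime(serial: float) -> datetime:
    """Whole part = days, fractional part = fraction of a day."""
    return EXCEL_EPOCH + timedelta(days=serial)


def excel_serial_to_stata_daily(serial: float) -> int:
    """Days since 1 Jan 1960, i.e. the value of a Stata daily date."""
    return (excel_serial_to_datetime(serial) - STATA_EPOCH).days


def excel_serial_to_minutes_since_1960(serial: float) -> int:
    """Minutes since 1 Jan 1960 00:00, rounded to the nearest minute."""
    delta = excel_serial_to_datetime(serial) - STATA_EPOCH
    return round(delta.total_seconds() / 60)


serial = 37200.6569  # Andy's 11/5/2001 15:46 as converted in Excel
print(excel_serial_to_datetime(serial).strftime("%d %b %Y %H:%M"))  # 05 Nov 2001 15:45
print(excel_serial_to_stata_daily(serial))                          # 15284
print(excel_serial_to_minutes_since_1960(serial))                   # 22009906
```

The 15:45 rather than 15:46 is an artifact of the four-decimal serial: 0.0001 day is 8.64 seconds, so 0.6569 only pins the time to within about 4 seconds, and rounding to the nearest minute, as the last function does, recovers 15:46. Equivalently, for dates after February 1900, Stata daily date = Excel serial - 21916.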