
From: n j cox <n.j.cox@durham.ac.uk>
To: n j cox <n.j.cox@durham.ac.uk>
Subject: Re: st: Unbalanced panel, count number of incidents
Date: Wed, 25 Jul 2007 17:49:53 +0100

A program is meant to be empowering; you have a new tool to use. However, it can be enfeebling, especially if it does not (seem to) do exactly what you want. Programmers employ various commands and constructs that non-programmers do not usually use, and the creation of a program sometimes seems to produce a strange mixture of fear and respect from those who cannot yet do it themselves. As a program is just more Stata, that is often an exaggerated response.

Thus it may help to strip away the program aspect and reduce this to a bare minimum. You want to count certain kinds of observations. The key insight is that a rather good way to do this is to use the -count- command. This is done less often than it should be. For an article making that point ad nauseam you could see Cox, N.J. 2007. Making it count. Stata Journal 7(1), but nothing here depends on reading that.

I'm not going to explain everything. In particular, I am going to assume at least a rough sense of what -forval- does and what local macros are. Other accounts exist introducing those.

You can initialise a count variable like this:

    gen count = .

For a problem as messy as yours, brute force is needed. The messiness comes from the irregularity of times in the data, so we need to loop over the observations, looking at each one in turn. The basic count is then

    count if <some condition is true> & ///
        <observation is in the same panel as this one> & ///
        <time is within interval of interest relative to this one>

This is part Stata, part pseudocode: the parts in < > are pseudocode.

The -count- will produce a number in your Results window, but that's less important than the fact that -count- leaves the result in -r(N)-. We must grab that before it is stomped on by something else, or just disappears. We grab it and use it:

    replace count = r(N) in <this observation>

We want to do this for each observation. Even the thought of doing that manually is painful, but there is a procedure for automating it easily.
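A minimal sketch of -r(N)- being set and then stomped on, using the auto data shipped with Stata rather than your data (the local macro name -nforeign- is made up for illustration). -summarize- also leaves a result in -r(N)-, so the count from -count- survives only because it was copied into a local macro first:

```stata
sysuse auto, clear
* count the foreign cars; the result is left in r(N)
count if foreign == 1
* grab it before anything else overwrites it
local nforeign = r(N)
* summarize also leaves r(N): here, the number of non-missing prices
summarize price, meanonly
* shows 22 for the saved count, but r(N) is now 74
display "foreign cars: `nforeign'   r(N) now: " r(N)
```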
Suppose you have 4567 observations. Then you can go

    forval i = 1/4567 {
        count if <all that stuff>
        replace count = r(N) in `i'
    }

Now you probably don't have 4567 observations. So you could just substitute the right number for 4567, or you could think more generally. _N is the number of observations.

    local N = _N
    forval i = 1/`N' {
        count if <all that stuff>
        replace count = r(N) in `i'
    }

-forval- is a little fussy in its feeding, so we can't use _N directly. The -local- statement defines a local macro -N-. Once it exists, we can use its value by referring to `N'. `i', which we haven't explained yet, is another local macro, which is brought into being by the -forval- loop.

Filling in the pseudocode, there are three components. Here are three examples to match:

    <some condition is true>
        PosResp == 1
    <observation is in the same panel as this one>
        ID == ID[`i']
    <time is within interval of interest relative to this one>
        inrange(Date[`i'] - Date, 1, 43200)

Date[`i'] - Date is how far before observation `i' each observation lies, so the interval from 1 upwards excludes observation `i' itself and anything later. 43200 is 30 * 24 * 60, the number of minutes in 30 days; with a daily date variable the upper limit would be 30.

Now we can put it all together.

    gen count = .
    local N = _N
    quietly forval i = 1/`N' {
        count if PosResp == 1 & ///
            ID == ID[`i'] & ///
            inrange(Date[`i'] - Date, 1, 43200)
        replace count = r(N) in `i'
    }

A new detail here is the -quietly- slapped on the loop to stop a long list of results being shown. That is not essential; indeed, at a debugging stage, seeing a stream of output is useful and reassuring.

Experienced programmers usually reduce that by one line:

    gen count = .
    quietly forval i = 1/`= _N' {
        count if PosResp == 1 & ///
            ID == ID[`i'] & ///
            inrange(Date[`i'] - Date, 1, 43200)
        replace count = r(N) in `i'
    }

Why can't we write this?

    quietly forval i = 1/`= _N' {
        count if PosResp == 1 & ///
            ID == ID[`i'] & ///
            inrange(Date[`i'] - Date, 1, 43200)
        gen count = r(N) in `i'
    }

Because this will fail the second time round the loop. The first time round, all will be fine, but the second time the -count- variable already exists, and you can't -generate- it again.
This is why we use -replace- in the loop, and why we need to initialise the variable outside and before the loop (because, conversely, you can't -replace- something that doesn't yet exist). What we initialise it to is less important, but setting it to missing is good practice.

Two more comments:

1. This is going to be a bit slow, as every pass through the loop makes -count- look at every observation in the dataset.

2. However, the structure can be copied. Sometimes we want something else calculated: the mean systolic blood pressure over measurements in the last 30 days, or whatever. Then we just need -summarize-, not -count-.

       gen meansysbp = .
       quietly forval i = 1/`= _N' {
           summarize sysbp if ///
               ID == ID[`i'] & ///
               inrange(Date[`i'] - Date, 1, 30), meanonly
           replace meansysbp = r(mean) in `i'
       }

Nick
n.j.cox@durham.ac.uk

<various exchanges>

Andrew Stocking

> > I have an unbalanced panel of subjects who have been contacted very
> > irregularly over the past 5 years. Total contacts range from 40-250
> > during the 5 year period depending on the person. I'd like to create
> > two variables: one that counts the total number of contacts in the
> > last 30 or 60 days and a second that sums the number of positive
> > responses over the same 30 or 60 days. For each contact there could
> > be anywhere from 0-15 contacts in the last 30 days.
> >
> > My data looks like:
> >
> > ID   Date   Response  Var1  Var2
> >  1  37200          1     1     1
> >  1  37210          0     2     1
> >  1  37215          1     3     2
> >  1  37229          1     4     3
> >  1  37231          0     4     2
> >  2  37201          0     1     0
> > .....
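To tie the loop back to the data quoted above, here is a minimal sketch that builds both requested variables for 30- and 60-day windows on the first rows of that listing. It assumes -Date- is a daily date and -Response- is 1 for a positive response (used here in place of -PosResp-); the names -ncontact30-, -npos30-, -ncontact60- and -npos60- are made up for illustration, and Var1 and Var2 are left out. As in the code above, the interval starts at 1, so the current contact is not counted.

```stata
clear
input ID Date Response
1 37200 1
1 37210 0
1 37215 1
1 37229 1
1 37231 0
2 37201 0
end

* one pass per window width, in days
foreach w in 30 60 {
    * initialise outside the inner loop, so -replace- works inside it
    gen ncontact`w' = .
    gen npos`w' = .
    quietly forval i = 1/`= _N' {
        * contacts for the same ID from 1 to w days before this one
        count if ID == ID[`i'] & inrange(Date[`i'] - Date, 1, `w')
        replace ncontact`w' = r(N) in `i'
        * the subset of those with a positive response
        count if Response == 1 & ID == ID[`i'] & ///
            inrange(Date[`i'] - Date, 1, `w')
        replace npos`w' = r(N) in `i'
    }
}
list, sepby(ID) noobs
```

For example, the ID 1 contact at 37231 gets 3 earlier contacts (2 positive) in the 30-day window and 4 (3 positive) in the 60-day window. The condition ID == ID[`i'] matters: without it, the ID 1 contact at 37200 would be counted for the ID 2 contact one day later.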

