Statalist


Re: st: Unbalanced panel, count number of incidents


From   n j cox <n.j.cox@durham.ac.uk>
To   n j cox <n.j.cox@durham.ac.uk>
Subject   Re: st: Unbalanced panel, count number of incidents
Date   Wed, 25 Jul 2007 17:49:53 +0100

A program is meant to be empowering; you have a new tool
to use. However, it can be enfeebling, especially if it
does not (seem to) do exactly what you want. Programmers
employ various commands and constructs that non-programmers
do not usually use, and the creation of a program sometimes
seems to produce a strange mixture of fear and respect from
those who cannot yet do it themselves. As a program is just
more Stata, that is often an exaggerated response.

Thus it may help to strip away the program aspect, and
reduce this to a bare minimum.

You want to count certain kinds of observations. The key
insight is that a rather good way to do this is using
the -count- command. This is less often done than it should
be. For an article making that point ad nauseam you could see

Cox, N.J. 2007. Making it count. Stata Journal 7(1)

but nothing here depends on reading that. I'm not
going to explain everything. In particular, I am going
to assume at least a rough sense of what -forval- does
and what local macros are. Other accounts exist introducing
those.

You can initialise a count variable like this:

gen count = .

For a problem as messy as yours, brute force is needed.
The messiness comes from the irregularity of times in the
data, so we need to loop over the observations, looking
at each one in turn. The basic count is then

count if <some condition is true> &
	 <observation is in the same panel as this one> &
	 <time is within interval of interest relative to
		this one>

This is part Stata, part pseudocode. The parts in < >
are pseudocode. The -count- will produce a number in
your Results window, but that's less important than
the fact that the -count- leaves the result in -r(N)-.
We must grab that before it is stomped on by something
else, or just disappears. We grab it and use it:

replace count = r(N) in <this observation>
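To see -r(N)- at work on its own, here is a minimal sketch. The six rows are typed in from the data quoted at the end of this post, with the Response column renamed PosResp to match the examples here; the local macro -npos- is just an illustrative name.

```stata
clear
input ID Date PosResp
1 37200 1
1 37210 0
1 37215 1
1 37229 1
1 37231 0
2 37201 0
end

* -count- shows the number and also leaves it in r(N)
count if PosResp == 1
local npos = r(N)

* the next r-class command overwrites r(N) ...
count
display r(N)

* ... but the copy saved in the local macro is safe
display `npos'
```

Here the first -count- finds 3 positive responses and the second finds all 6 observations, which is why the result should be grabbed straight away.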

We want to do this for each observation. Even
the thought of doing that manually is painful, but
there is a procedure for automating it easily. Suppose
you have 4567 observations. Then you can go

forval i = 1/4567 {
	count if <all that stuff>
	replace count = r(N) in `i'
}

Now you probably don't have 4567 observations. So,
you could just substitute the right number for 4567,
or you could think more generally. _N is the number
of observations.

local N = _N
forval i = 1/`N' {
	count if <all that stuff>
	replace count = r(N) in `i'
}

-forval- is a little fussy in its feeding, so we
can't use _N directly.

The -local- statement defines a local macro -N-.
Once it exists, we can use its value by referring to
`N'.

`i', which we haven't explained yet, is another
local macro, which is brought into being by the -forval-
loop.
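A minimal sketch of the two local macros, using an empty dataset of 5 observations made up purely for illustration:

```stata
clear
set obs 5

* copy the number of observations into a local macro
local N = _N
display "N is `N'"

* -forval- creates the local macro i and steps it from 1 to `N'
forval i = 1/`N' {
	display "now at observation `i'"
}
```

Each time round the loop, Stata substitutes the current value of i wherever `i' appears before running the command.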

Filling in the pseudocode, there are three components.
Here are three examples to match:

<some condition is true>

	PosResp == 1

<observation is in the same panel as this one>

	ID == ID[`i']
	
<time is within interval of interest relative to
		this one>

	inrange(Date[`i'] - Date, 1, 43200)
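It may help to check the time condition by itself before it goes into the loop. This sketch assumes Date is in days, as in the quoted data, and uses a window of 30 purely for illustration; -diff- and -inwin- are hypothetical variable names.

```stata
clear
input ID Date
1 37200
1 37210
1 37215
end

* look back from observation 3
local i = 3
gen diff = Date[`i'] - Date
gen byte inwin = inrange(diff, 1, 30)
list

* inrange(z, a, b) is 1 if a <= z <= b and 0 otherwise
display inrange(0, 1, 30)
display inrange(31, 1, 30)
display inrange(., 1, 30)
```

The lower limit of 1 means that the observation itself (difference 0) and any later observations (negative differences) are not counted. A missing date also yields 0.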

Now we can put it all together.

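* initialise, then fill in one observation at a time
gen count = .
local N = _N
quietly forval i = 1/`N' {
	* positive responses in the same panel, 1 to 43200 time units earlier
	count if PosResp == 1 & ///
		ID == ID[`i'] & ///
		inrange(Date[`i'] - Date, 1, 43200)
	* save r(N) before another command overwrites it
	replace count = r(N) in `i'
}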
			
A new detail here is the -quietly- slapped on the loop
to stop a long list of results being shown. That is not
essential; indeed, at a debugging stage, seeing a stream
of output is useful and reassuring.
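Here is a small worked sketch of the whole loop that can be run as is. The rows come from the data quoted at the end of this post, with Response renamed PosResp. I am assuming Date is in days and using a 30-day window, so the upper limit is 30 rather than the 43200 above. The names -npos30- and -ncontact30- are hypothetical; the second variable simply drops the PosResp condition to count all earlier contacts.

```stata
clear
input ID Date PosResp
1 37200 1
1 37210 0
1 37215 1
1 37229 1
1 37231 0
2 37201 0
end

gen npos30 = .
gen ncontact30 = .
quietly forval i = 1/`= _N' {
	* positive responses in the previous 30 days, same ID
	count if PosResp == 1 & ID == ID[`i'] & ///
		inrange(Date[`i'] - Date, 1, 30)
	replace npos30 = r(N) in `i'

	* all contacts in the previous 30 days, same ID
	count if ID == ID[`i'] & ///
		inrange(Date[`i'] - Date, 1, 30)
	replace ncontact30 = r(N) in `i'
}
list, sepby(ID)
```

For ID 1 the positive counts come out as 0, 1, 1, 2, 2 and the contact counts as 0, 1, 2, 3, 3. The first observation drops out of the window at the fifth, being 31 days earlier.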

Experienced programmers usually reduce that by one line,
using `= _N' to evaluate _N on the spot rather than
storing it in a local macro first:

gen count = .
quietly forval i = 1/`= _N' {
	count if PosResp == 1 & ///
		ID == ID[`i'] & ///
		inrange(Date[`i'] - Date, 1, 43200)
	replace count = r(N) in `i'
}
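A minimal sketch of what `= _N' does, on a made-up dataset of 3 observations: Stata evaluates the expression after the equals sign and substitutes the result before the command is run.

```stata
clear
set obs 3

* the expression is evaluated and its result substituted
display `= _N'
display `= 2 * _N'

* so -forval- sees 1/3 and is satisfied
forval i = 1/`= _N' {
	display "observation `i' of `= _N'"
}
```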

Why can't we write

quietly forval i = 1/`= _N' {
	count if PosResp == 1 & ///
		ID == ID[`i'] & ///
		inrange(Date[`i'] - Date, 1, 43200)
	gen count = r(N) in `i'
}

Because this will fail the second time round the loop.
The first time round the loop, all will be fine, but the
second time the -count- variable already exists, and you can't
-generate- it again. This is why we use -replace-
in the loop, and why we need to initialise the
variable outside and before the loop (because, conversely,
you can't -replace- something that doesn't yet
exist). What we initialise it to is less important,
but setting it to missing is good practice.
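Both failures can be seen safely with -capture-, which traps the error and leaves the return code in _rc. This is a sketch on a made-up dataset of 3 observations; the local macro names are hypothetical.

```stata
clear
set obs 3

* -replace- before the variable exists: variable not found
capture replace count = 0
local rc_replace = _rc
display `rc_replace'

* initialise once, outside any loop
generate count = .

* -generate- a second time: variable already defined
capture generate count = 0
local rc_generate = _rc
display `rc_generate'

* -replace- is now legal, and can be repeated as often as needed
replace count = 0 in 1
replace count = 5 in 1
```

The two return codes shown should be 111 and 110 respectively.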

Two more comments:

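1. This is going to be a bit slow: each pass through the
loop scans the whole dataset, so the work grows with the
square of the number of observations.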

2. However, the structure can be copied.
Sometimes we want something else calculated, say
the mean systolic blood pressure over measurements
in the last 30 days. We then just need
-summarize- rather than -count-.

gen meansysbp = .
quietly forval i = 1/`= _N' {
	summarize sysbp if ///
		ID == ID[`i'] & ///
		inrange(Date[`i'] - Date, 1, 30), meanonly
	replace meansysbp = r(mean) in `i'
}
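A small sketch of the -summarize- version that can be run as is. The data are invented for illustration, with Date assumed to be in days.

```stata
clear
input ID Date sysbp
1 100 120
1 110 130
1 125 140
1 150 150
2 100 110
end

gen meansysbp = .
quietly forval i = 1/`= _N' {
	* mean over earlier measurements within 30 days, same ID
	summarize sysbp if ID == ID[`i'] & ///
		inrange(Date[`i'] - Date, 1, 30), meanonly
	replace meansysbp = r(mean) in `i'
}
list, sepby(ID)
```

Where no earlier measurement falls in the window, as for the first observation in each panel, r(mean) is missing and so the result stays missing. That is usually what is wanted, and differs from -count-, which returns 0.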

Nick
n.j.cox@durham.ac.uk 	

<various exchanges>

Andrew Stocking

 > > I have an unbalanced panel of subjects who have been contacted very
 > > irregularly over the past 5 years. Total contacts range from 40-250
 > > during the 5 year period depending on the person.  I'd like to create
 > > two variables: one that counts the total number of contacts in the
 > > last 30 or 60 days and a second that sums the number of positive
 > > responses over the same 30 or 60 days.  For each contact there could
 > > be anywhere from 0-15 contacts in the last 30 days.
 > >
 > > My data looks like:
 > > ID    Date     Response    Var1    Var2
 > > 1     37200    1           1       1
 > > 1     37210    0           2       1
 > > 1     37215    1           3       2
 > > 1     37229    1           4       3
 > > 1     37231    0           4       2
 > > 2     37201    0           1       0
 > > .....