# Re: st: Data management problem

 From "David W. Harless" To statalist@hsphsun2.harvard.edu Subject Re: st: Data management problem Date Wed, 07 May 2008 12:55:50 -0400

Austin,

Thanks so much for your solution -- it suits the problem perfectly. The idea of (somehow) using -joinby- had occurred to me, but I made no headway as I was stuck thinking of my data in the original structure (without the -reshape long-).

Again, my great thanks.
Dave Harless

Austin Nichols wrote:

David W. Harless:
Here is another approach (cut and paste to Command window):

clear
tempfile p1 p2
input beg1   end1   p1
15522  15674  1
15887  16617  1
end
ren beg1 date1
ren end1 date2
g id=1
g i=_n
reshape long date, i(i) j(t1)
la def t 1 "begin" 2 "end"
la val t1 t
format date %d
save `p1'
clear
input beg2   end2   p2
15522  15674  1
15979  16436  1
16557  16587  1
end
ren beg2 date1
ren end2 date2
g id=1
g i=_n
reshape long date, i(i) j(t2)
la def t 1 "begin" 2 "end"
la val t2 t
format date %d
save `p2'
joinby id date using `p1', unm(both) update
drop i _m
sort id date
by id: g inp1=(sum(t1==1)>0)-(t1==2)>0
by id: replace inp1=0 if inp1[_n-1]==0&mi(p1)
by id: g inp2=(sum(t2==1)>0)-(t2==2)>0
by id: replace inp2=0 if inp2[_n-1]==0&mi(p2)
bys id (date): g enddate=date[_n+1]
format enddate %d
order id date enddate t1 t2 p1 p2 inp1 inp2
ren date startdate
drop t1 t2 p1 p2
drop if mi(end)
g david=max(inp1,2*inp2)
li, noo clean

David W. Harless

I have two data sets containing dates of participation in two related
programs, program 1
and program 2.  But the complication is that the program 1 data set lists
dates if the
participant is enrolled in *either* program 1 or program 2.  Dates in the
program 2 data
set indicate definite participation in program 2.

The best explanation is an example.  Here is program participation dates
from the program
1 data set.  (Date variables have display format %dD_m_Y and I added the
program variable
to make this explanation clearer):

beg1        end1   program
01 Jul 02   30 Nov 02         1
01 Jul 03   30 Jun 05         1

And the same individual for the program 2 data set:

beg2        end2   program
01 Jul 02   30 Nov 02         2
01 Oct 03   31 Dec 04         2
01 May 05   31 May 05         2

I want to combine these records to obtain a data set that looks like:

beg         end   program
01 Jul 02   30 Nov 02         2
01 Dec 02   30 Jun 03         0
01 Jul 03   30 Sep 03         1
01 Oct 03   31 Dec 04         2
01 Jan 05   30 Apr 05         1
01 May 05   31 May 05         2
01 Jun 05   30 Jun 05         1

(where the 0 indicates the individual did not participate in either program
during that
period).

There are, of course, many individuals with varying dates of participation
in one or both
programs.  Any suggestions as to how one might solve this problem?
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/