# Re: st: Data management problem

 From "Austin Nichols" To statalist@hsphsun2.harvard.edu Subject Re: st: Data management problem Date Tue, 6 May 2008 16:29:55 -0400

```David W. Harless:
Here is another approach (cut and paste to Command window):

clear
tempfile p1 p2
input beg1   end1   p1
15522  15674  1
15887  16617  1
end
ren beg1 date1
ren end1 date2
g id=1
g i=_n
reshape long date, i(i) j(t1)
la def t 1 "begin" 2 "end"
la val t1 t
format date %d
save `p1'
clear
input beg2   end2   p2
15522  15674  1
15979  16436  1
16557  16587  1
end
ren beg2 date1
ren end2 date2
g id=1
g i=_n
reshape long date, i(i) j(t2)
la def t 1 "begin" 2 "end"
la val t2 t
format date %d
save `p2'
joinby id date using `p1', unm(both) update
drop i _m
sort id date
by id: g inp1=(sum(t1==1)>0)-(t1==2)>0
by id: replace inp1=0 if inp1[_n-1]==0&mi(p1)
by id: g inp2=(sum(t2==1)>0)-(t2==2)>0
by id: replace inp2=0 if inp2[_n-1]==0&mi(p2)
bys id (date): g enddate=date[_n+1]
format enddate %d
order id date enddate t1 t2 p1 p2 inp1 inp2
ren date startdate
drop t1 t2 p1 p2
drop if mi(end)
g david=max(inp1,2*inp2)
li, noo clean

> > David W. Harless
> >
> >
> >
> > I have two data sets containing dates of participation in two related
> > programs, program 1
> > and program 2.  But the complication is that the program 1 data set lists
> > dates if the
> > participant is enrolled in *either* program 1 or program 2.  Dates in the
> > program 2 data
> > set indicate definite participation in program 2.
> >
> > The best explanation is an example.  Here is program participation dates
> > from the program
> > 1 data set.  (Date variables have display format %dD_m_Y and I added the
> > program variable
> > to make this explanation clearer):
> >
> >          beg1        end1   program
> >     01 Jul 02   30 Nov 02         1
> >     01 Jul 03   30 Jun 05         1
> >
> > And the same individual for the program 2 data set:
> >
> >          beg2        end2   program
> >     01 Jul 02   30 Nov 02         2
> >     01 Oct 03   31 Dec 04         2
> >     01 May 05   31 May 05         2
> >
> >
> > I want to combine these records to obtain a data set that looks like:
> >
> >           beg         end   program
> >     01 Jul 02   30 Nov 02         2
> >     01 Dec 02   30 Jun 03         0
> >     01 Jul 03   30 Sep 03         1
> >     01 Oct 03   31 Dec 04         2
> >     01 Jan 05   30 Apr 05         1
> >     01 May 05   31 May 05         2
> >     01 Jun 05   30 Jun 05         1
> >
> > (where the 0 indicates the individual did not participate in either program
> > during that
> > period).
> >
> > There are, of course, many individuals with varying dates of participation
> > in one or both
> > programs.  Any suggestions as to how one might solve this problem?
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```