st: RE: Unique Case ID in Large Panel
I can't see what the problem is, but, as in
another thread, -duplicates- offers a way to
tackle this. You don't need to write your own
code, good exercise though that is.
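As a rough sketch of what that could look like (the toy values below are made up for illustration, with one DRVNUM/CDATE pair repeated on purpose; only the variable names come from the original post, and ndup is a hypothetical name for the new tag variable):

```stata
* Toy data: hypothetical values, with one deliberately repeated DRVNUM-CDATE pair
clear
input long DRVNUM long CDATE
10001 15344
10001 15351
10003 15344
10003 15344
end

* Report how many observations share each DRVNUM-CDATE combination
duplicates report DRVNUM CDATE

* ndup = number of OTHER observations with the same DRVNUM-CDATE pair
duplicates tag DRVNUM CDATE, generate(ndup)

* Show only the observations whose key is not unique
list DRVNUM CDATE ndup if ndup > 0

* -isid- exits with an error when the key is not unique;
* -capture noisily- shows the message without stopping a do-file
capture noisily isid DRVNUM CDATE
```

On the real data set, only the -duplicates- and -isid- lines would be needed. Unlike a comparison with [_n-1], these commands do not depend on the current sort order.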
Stephen V. Burks
> My problem is that -xtdes- says my two key variables do not together
> uniquely identify the cases in my large panel data set, but both the
> Stata code I wrote to identify the duplicates, and SPSS's automatically
> generated syntax to "find duplicates" (operating on a precursor data
> set with the same cases and variables) say there are no duplicates.
> Searching the FAQs on "xtdes" produced no hits.
> 1) I have 1,322,511 cases. The data set is about 700 MB. Memory
> allocated is 1.4 GB (and wasn't it a kick in the fatoozle to discover
> that's all WinXP will let me allocate from the 4 GB on my system!).
> 2) I first did -tsset- using DRVNUM (employee id) and CDATE (paycheck
> date).
> 3) -xtdes- message:
> DRVNUM: 10001, 10003, ..., 99051                 n = 28946
> CDATE:  15344, 15351, ..., 16072                 T = 105
> Delta(CDATE) = 7; (16072-15344)/7 + 1 = 105
> (DRVNUM*CDATE does not uniquely identify observations)
> 4) Stata Code run on data to find duplicates:
> . gsort +DRVNUM +CDATE, generate(DriverWeek)
> . generate DupDriverWeekFlag=0
> . replace DupDriverWeekFlag = 1 if ( DRVNUM == DRVNUM[_n-1] & CDATE == CDATE[_n-1] )
> (0 real changes made)
> 5) -summarize- run on DupDriverWeekFlag says min and max are both 0.