From: "Nick Cox" <firstname.lastname@example.org>
Subject: RE: st: RE: -finddup- for panel?
Date: Wed, 21 Apr 2004 10:49:43 +0100
Elsewhere Fred Wolfe and Joe J. discussed how
to use Fred's -finddup- command to look
for duplicates in panel data.
An alternative is official Stata's -duplicates- command.
A typical sequence might run
. duplicates report id year
to see whether there are duplicates w.r.t. -id year-;
. duplicates examples id year
or
. duplicates list id year
to see what they are; and
. duplicates examples id year y
to see whether there are also ties on -y- for those
duplicates; and so forth.
As in the case of -finddup-, and of other duplicates
commands I can think of, multiple variables may
be specified to -duplicates-.
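For the question quoted below, one possible sketch is to tag
duplicates on -id year- jointly, which covers every year in a
single pass rather than one year at a time. The variable name
-dup- is made up for illustration, and which of the duplicated
observations to keep remains a substantive decision that this
sketch does not make.

```stata
* dup is a hypothetical variable name, not from the original post
* dup = 0 marks observations that are unique on id and year;
* dup > 0 marks observations that share id and year with others
duplicates tag id year, generate(dup)

* see how many observations are affected
tabulate dup

* inspect the duplicated observations, grouped by id and year
sort id year
list id year y if dup > 0, sepby(id year)
```

Only once the data-entry errors have been resolved, so that
-id year- identifies observations uniquely, should -tsset id year-
and the interpolation go through.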
> Here is what I meant.
> Panel variable: id, time variable: year
> There is a variable y which has missing values and I want to use
> -cipolate- (Stata code available at SSC) to interpolate the missing
> values. I do the interpolation the following way.
> tsset id year, yearly
> by id : cipolate y year, gen(yci)
> It does not run because id has some duplicates, which resulted from
> data-entry errors. Therefore I want to remove duplicates for each
> year and do -cipolate- (the cubic interpolation code at SSC) on the
> resulting data set with unique ids.
> I remove duplicates the following way for each year.
> use "C:\data75.dta", clear
> finddup id if year==1975, nol k /* finddup is also downloadable from SSC */
> save "C:\data75a.dta", replace
> drop if dupval>=2/*removing duplicates*/
> save "C:\data75b.dta", replace/*data with unique ids*/
> by id : cipolate y year, gen(yci)/*cubic interpolation*/
> save "C:\data75c.dta", replace
> use "C:\data75a.dta", clear
> keep if dupval>=2/*collecting duplicates*/
> save "C:\data75d.dta", replace
> I repeat the above steps for other years and at the end append the
> interpolated and duplicate files for each year.
> use "C:\data75c.dta", clear
> append using "C:\data75d.dta"
> append using "C:\data76c.dta"
> append using "C:\data76d.dta"
> My question is: is there any way of detecting duplicate ids
> for all years simultaneously instead of doing it for each year
> separately? (I wish I could do it the following way:
> by year: finddup id , nol k).