The Stata listserver

RE: st: RE: -finddup- for panel?

From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: RE: -finddup- for panel?
Date   Wed, 21 Apr 2004 10:49:43 +0100

Elsewhere Fred Wolfe and Joe J. discussed how 
to use Fred's -finddup- command to look 
for duplicates. 

An alternative is official Stata's -duplicates-. 

A typical sequence might run 

. duplicates report id year 

to see whether there are duplicates w.r.t. -id year-; 

. duplicates examples id year 

or 

. duplicates list id year 

to see what they are; and  

. duplicates examples id year y 

to see whether there are also ties on -y- for those
duplicates; and so forth. 

As in the case of -finddup-, and of other duplicates
commands I can think of, multiple variables may 
be specified to -duplicates-. 
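
To handle every year in one pass, rather than one file per year, 
-duplicates tag- can flag all duplicated -id year- pairs at once. 
What follows is only a minimal sketch under stated assumptions: the 
toy data are invented, the variable name dup is arbitrary, and 
-duplicates drop- with -force- keeps the first observation of each 
group in the current sort order, which need not be the correct record 
when the duplicates come from data-entry errors. 

```stata
* Toy panel: ids, years and y values are invented for illustration
clear
input id year y
1 1975 10
1 1976 .
1 1977 14
2 1975 7
2 1975 7
2 1976 9
3 1975 5
3 1976 .
3 1976 6
end

* Flag duplicated id-year pairs in all years at once;
* dup is 0 for unique observations and positive otherwise
duplicates tag id year, generate(dup)

* Inspect the flagged observations before deciding what to discard
list if dup > 0, sepby(id year)

* Keep one observation per id-year pair
* (assumption: the first in the current sort order is acceptable)
duplicates drop id year, force

* Confirm that id and year now identify observations, then declare the panel
isid id year
tsset id year, yearly
```

Once -isid- passes, the interpolation step from the original question 
can be run on the whole data set rather than year by year. Inspecting 
the listed duplicates first matters because -force- discards 
observations that may differ on other variables, such as -y-. 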

[email protected] 

Joe J.
> Here is what I meant.
> Panel variable: id, time variable: year
> There is a variable y which has missing values, and I want to use 
> -cipolate- (the Stata code available at SSC) to interpolate the 
> missing values. I do the interpolation the following way.
> tsset id year, yearly
> by id : cipolate y   year, gen(yci)
> It does not run because id has some duplicates, which resulted from 
> data-entry errors. Therefore I want to remove duplicates for each 
> year and do -cipolate- (the cubic interpolation code at SSC) on the 
> resulting data set with unique ids.
> I remove duplicates the following way for each year.
> use "C:\data75.dta", clear
> finddup id if year==1975, nol k /* finddup is also downloadable from SSC */
> save "C:\data75a.dta", replace
> drop if dupval>=2 /* removing duplicates */
> save "C:\data75b.dta", replace /* data with unique ids */
> by id : cipolate y   year, gen(yci) /* cubic interpolation */
> save "C:\data75c.dta", replace
> use "C:\data75a.dta", clear
> keep if dupval>=2 /* collecting duplicates */
> save "C:\data75d.dta", replace
> I repeat the above steps for other years and at the end append the 
> interpolated and duplicate files for each year.
> use "C:\data75c.dta", clear
> append using "C:\data75d.dta"
> append using "C:\data76c.dta"
> append using "C:\data76d.dta"
> etc.
> My question is: is there any way of detecting duplicate ids for all 
> years simultaneously instead of doing it for each year separately? 
> (I wish I could do it the following way:
> by year: finddup id , nol k).
