
RE: st: RE: -finddup- for panel?


From   "joe J." <[email protected]>
To   [email protected]
Subject   RE: st: RE: -finddup- for panel?
Date   Wed, 21 Apr 2004 10:07:49 +0000

Stata's official -duplicates- command also helps to identify duplicate observations. But I have a feeling that -finddup- is useful when one has to decide which of the duplicates to include and which to exclude (for later use, say) while generating a duplicate-free data set.


From: "Nick Cox" <[email protected]>
Reply-To: [email protected]
To: <[email protected]>
Subject: RE: st: RE: -finddup- for panel?
Date: Wed, 21 Apr 2004 10:49:43 +0100

Elsewhere Fred Wolfe and Joe J. discussed how
to use Fred's -finddup- command to look
for duplicates.

An alternative is official Stata's -duplicates-
command.

A typical sequence might run

. duplicates report id year

to see whether there are duplicates w.r.t. -id year-;

. duplicates examples id year

or

. duplicates list id year

to see what they are; and

. duplicates examples id year y

to see whether there are also ties on -y- for those
duplicates; and so forth.

As in the case of -finddup-, and of other duplicates
commands I can think of, multiple variables may
be specified to -duplicates-.
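
As a rough sketch of how the duplicates might be flagged and removed for all years at once, rather than year by year, something like the following could be tried. It assumes the variables -id-, -year- and -y- from the question; -dup- and -copy- are hypothetical names for new variables. Which copy counts as the first within each -id year- group is arbitrary unless the data are also sorted on further variables, so the listing should be inspected before anything is dropped.

```stata
* Sketch only: id, year and y are the variables from the question;
* dup and copy are hypothetical new variable names.

* dup = number of surplus copies of each id-year pair (0 if unique)
duplicates tag id year, generate(dup)

* copy = 1 for the first observation in each id-year group, 2, 3, ... after
bysort id year: generate copy = _n

* inspect the duplicated observations before deciding what to drop
list id year y dup copy if dup > 0, sepby(id year)

* keep one observation per id-year pair, then declare the panel
drop if copy > 1
drop dup copy
tsset id year, yearly
```

If the surplus copies are wanted later, they could be saved to a separate file (after -keep if copy > 1-) before the -drop- step.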

Nick
[email protected]

joe J.
>
> Here is what I meant.
>
> Panel variable: id, time variable: year
>
> There is a variable y which has missing values and I want to
> use -cipolate-
> --the stata-code available at SSC--to interpolate the missing
> values. I do
> the interpolation the following way.
>
> tsset id year, yearly
> by id : cipolate y   year, gen(yci)
>
> It does not run because id has some duplicates, which resulted from
> data-entry errors. Therefore I want to remove duplicates for
> each year and
> do -cipolate- (the cubic interpolation code at ssc) on the
> resulting data
> set with unique ids.
>
> I remove duplicates the following way for each year.
>
> use "C:\data75.dta", clear
> finddup id if year==1975, nol k/*finddup is also downloadable
> from ssc*/
> save "C:\data75a.dta", replace
>
> drop if dupval>=2/*removing duplicates*/
> save "C:\data75b.dta", replace/*data with unique ids*/
>
> by id : cipolate y   year, gen(yci)/*cubic interpolation*/
> save "C:\data75c.dta", replace
>
>
> use "C:\data75a.dta", clear
> keep if dupval>=2/*collecting duplicates*/
> save "C:\data75d.dta", replace
>
> I repeat the above steps for other years and at the end append the
> interpolated and duplicate files for each year.
>
> use "C:\data75c.dta", clear
> append using "C:\data75d.dta"
> append using "C:\data76c.dta"
> append using "C:\data76d.dta"
> etc.
> My question is: is there any way of detecting duplicate ids
> for all years
> simultaneously instead of doing it for each year separately?
> (I wish I could
> do it the following way:
> by year: finddup id , nol k).

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/