Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


st: Identify and delete duplicate obs

From	Rongrong Zhang <[email protected]>
To	[email protected]
Subject	st: Identify and delete duplicate obs
Date	Thu, 26 Dec 2013 09:12:00 -0500

Hello,

My dataset has the following structure:

industrynumber   naics
1000             .
1001             114
100101           114
100102           114
...

Both variables are strings.
The first observation has a missing value for naics. Observations
sharing the same first four digits (e.g. 1001) have the same naics. I
would like to keep only one observation per naics value.
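A minimal sketch of keeping one observation per naics value, assuming missing naics values are stored as empty strings ("") and that every row with missing naics should be kept. Both are assumptions, and the sample data below is hypothetical. Adding (industrynumber) to bysort fixes the order within each group, so the row that is kept is reproducible:

```stata
* Hypothetical sample data with the same layout as in the post;
* Stata stores a missing string value as the empty string ""
clear
input str8 industrynumber str3 naics
"1000"   ""
"1001"   "114"
"100101" "114"
"100102" "114"
"1002"   ""
end

* Within each naics group, sorted by industrynumber, keep the first row;
* rows with missing naics are all kept (an assumption, adjust as needed)
bysort naics (industrynumber): keep if _n == 1 | missing(naics)
list, noobs
```

With this data, the two missing-naics rows and the single "1001" row for naics 114 remain.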

I used:

bysort naics: gen dup=cond(_N==1, 0, _n)

This command also treats the observations with missing naics as one
group. I have thousands of records with missing naics, so dup becomes
very large for them.

How should I replace the value of dup when naics is missing?

This did not work:

replace  dup=-1 if naics==" "
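A likely reason the replace fails is that Stata stores a missing string as the empty string "", not as a single space " ", so naics==" " matches nothing. Below is a minimal sketch on hypothetical sample data, assuming the missing values really are empty strings. If they literally contain the text "." (as the listing displays), the condition would need inlist(naics, "", ".") instead:

```stata
* Hypothetical sample data; "" is a missing string value in Stata
clear
input str8 industrynumber str3 naics
"1000"   ""
"1001"   "114"
"100101" "114"
"100102" "114"
"1002"   ""
end

* Number duplicates within each naics group (0 for singletons), as in the post;
* (industrynumber) makes the numbering within each group reproducible
bysort naics (industrynumber): gen dup = cond(_N==1, 0, _n)

* naics == " " never matches "", whereas missing() is true for ""
replace dup = -1 if missing(naics)
list, noobs
```

Here the missing-naics rows get dup = -1 and the naics 114 rows are numbered 1, 2, 3.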

Thank you,
Rochelle
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

Follow-Ups:
- st: Re: Identify and delete duplicate obs
  - From: "Joseph Coveney" <[email protected]>
