Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


st: RE: Finding duplicate values across different variables

From	Joe Canner <[email protected]>
To	"[email protected]" <[email protected]>
Subject	st: RE: Finding duplicate values across different variables
Date	Mon, 10 Mar 2014 15:24:05 +0000

Michael,

Nick Cox answered a very similar question here last week: http://www.stata.com/statalist/archive/2014-03/msg00067.html

Let us know if you can't get his solution to work or if it doesn't apply.
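
In case it helps, here is a minimal sketch of one way to approach this. It is not necessarily the method in the linked post. It builds an order-independent key for each pair, then keeps the row with the highest indicator total within each pair. The variable names first, second, and score are made up for illustration, and the data are typed in from your example. If two rows in a pair tie on score, which one survives is arbitrary.

```stata
* Example data copied from the original post
clear
input str8 source str8 target ind1 ind2 ind3
"company1" "company2" 1 0 0
"company3" "company5" 0 1 0
"company2" "company1" 1 1 0
"company5" "company3" 1 1 1
end

* Order-independent key: alphabetically smaller ID first, larger ID second
* (first and second are hypothetical helper variables)
gen first  = cond(source < target, source, target)
gen second = cond(source < target, target, source)

* Row total of the indicators (score is a hypothetical name)
egen score = rowtotal(ind1 ind2 ind3)

* Within each unordered pair, sort by score and keep the last (highest) row
bysort first second (score): keep if _n == _N

drop first second
list source target ind1 ind2 ind3 score
```

Keeping the two IDs in separate key variables, rather than concatenating them, avoids false matches when different ID pairs happen to produce the same joined string.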

Regards,
Joe Canner

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Michael Goodwin
Sent: Monday, March 10, 2014 11:04 AM
To: [email protected]
Subject: st: Finding duplicate values across different variables

I have a social network dataset consisting of two ID variables (source and
target) and a number of indicators (ind1, ind2, ind3). The data looks like
this:

source      target      ind1   ind2   ind3
company1    company2    1      0      0
company3    company5    0      1      0
company2    company1    1      1      0
company5    company3    1      1      1

My goal is to 1) consolidate any observations that share the same pair of
source and target, regardless of order (even where they aren't duplicates
in the traditional Stata sense, such as observations 1 and 3 or 2 and 4
above); and 2) make the source and target of the consolidated observation
equal to the source and target of whichever observation had the higher
rowtotal of the indicators (so observations 3 and 4 would remain).

Thus far, my approach has been to create a concatenation of source and
target and, in a loop, flag all instances where
source+target==target+source elsewhere in the dataset.

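* Concatenate the two IDs in both orders
gen orig = source + target
gen new = target + source
gen temp = .
local max = _N
* Row total of the indicators
egen count = rowtotal(ind*)
* Flag observations whose source+target equals target+source elsewhere
forv num = 1/`max' {
    replace temp = 1 if orig == new[`num']
}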

I still haven't been able to figure out how to sort the resulting dataset
in such a way that I can easily consolidate the observations based on the
count variable. Any thoughts you have would be much appreciated. Thanks in
advance.

Best,

Mike
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

