Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Joe Canner <jcanner1@jhmi.edu> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | RE: st: RE: Finding duplicate values across different variables |
Date | Mon, 10 Mar 2014 23:15:17 +0000 |
Credit to Nick for the solution; I just have a good memory. It would have taken me a while to come up with that (if at all). And, yes, it is sometimes hard to formulate a help search in such a way as to match your terminology with that used in previous solutions.

________________________________________
From: owner-statalist@hsphsun2.harvard.edu [owner-statalist@hsphsun2.harvard.edu] on behalf of Michael Goodwin [michael.goodwin@endeavor.org]
Sent: Monday, March 10, 2014 6:41 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: Finding duplicate values across different variables

Hi Joe,

Thanks, this is extremely helpful. Sometimes you just have to know how to ask the right question!

Best,
Mike

On Mon, Mar 10, 2014 at 11:24 AM, Joe Canner <jcanner1@jhmi.edu> wrote:
> Michael,
>
> Nick Cox answered a very similar question here last week: http://www.stata.com/statalist/archive/2014-03/msg00067.html
>
> Let us know if you can't get his solution to work or if it doesn't apply.
>
> Regards,
> Joe Canner
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Michael Goodwin
> Sent: Monday, March 10, 2014 11:04 AM
> To: statalist@hsphsun2.harvard.edu
> Subject: st: Finding duplicate values across different variables
>
> I have a social network dataset consisting of two ID variables (source and
> target) and a number of indicators (ind1, ind2, ind3).
> The data looks like this:
>
> source    target    ind1  ind2  ind3
> company1  company2  1     0     0
> company3  company5  0     1     0
> company2  company1  1     1     0
> company5  company3  1     1     1
>
> My goal is to 1) consolidate any observations where the combination of
> source and target is equal (even where they aren't duplicates in the
> traditional Stata sense, such as observations 1 and 3 or 2 and 4 above);
> and 2) make the source and target of the consolidated observation equal to
> the source and target of whichever observation had a higher rowtotal of the
> indicators (so observations 3 and 4 would remain).
>
> Thus far, my approach has been to create a concatenation of source and
> target and, in a loop, flag all instances where
> source+target==target+source elsewhere in the dataset.
>
> gen orig = source+target;
> gen new = target+source;
> gen temp = .;
> local max = _N;
> egen count = rowtotal(ind*);
> forv num = 1/`max' {;
>     replace temp = 1 if orig==new[`num'];
> };
>
> I still haven't been able to figure out how to sort the resulting dataset
> in such a way that I can easily consolidate the observations based on the
> count variable. Any thoughts you have would be much appreciated. Thanks in
> advance.
>
> Best,
>
> Mike
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/

--
MIKE GOODWIN
Project Leader, Endeavor Insight
900 Broadway, Suite 301
New York, NY 10003
www.endeavor.org
Tel: 646-368-6354
Skype: michael.p.goodwin
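
For readers landing on this thread, the consolidation Mike describes (one row per unordered source/target pair, keeping the row with the highest indicator total) might be sketched in Stata as below. This is only a sketch and not necessarily the approach in the linked post: the variable names first and second are made up for illustration, count is borrowed from Mike's code, and the data are the four example rows from the question. It assumes ties in count can be broken arbitrarily.

```stata
* Sketch only: first and second are illustrative helper names.
clear
input str10 source str10 target ind1 ind2 ind3
"company1" "company2" 1 0 0
"company3" "company5" 0 1 0
"company2" "company1" 1 1 0
"company5" "company3" 1 1 1
end

* Order-independent pair key: the alphabetically smaller ID goes first,
* so (company1, company2) and (company2, company1) get the same key.
gen first  = cond(source < target, source, target)
gen second = cond(source < target, target, source)

* Row total of the indicators, as in the original post.
egen count = rowtotal(ind1 ind2 ind3)

* Within each unordered pair, sort by count and keep the last (highest) row.
* If two rows tie on count, which one survives is arbitrary.
bysort first second (count): keep if _n == _N

drop first second
list source target ind1 ind2 ind3 count
```

Using two separate key variables rather than a single concatenated string avoids false matches when different ID pairs happen to concatenate to the same text. The kept row retains its own indicator values; if the indicators themselves need to be combined across the pair, that would be an extra step before the keep.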