Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: RE: Finding duplicate values across different variables


From   Michael Goodwin <[email protected]>
To   [email protected]
Subject   Re: st: RE: Finding duplicate values across different variables
Date   Mon, 10 Mar 2014 18:41:53 -0400

Hi Joe,

Thanks, this is extremely helpful. Sometimes you just have to know how
to ask the right question!

Best,

Mike

On Mon, Mar 10, 2014 at 11:24 AM, Joe Canner <[email protected]> wrote:
> Michael,
>
> Nick Cox answered a very similar question here last week: http://www.stata.com/statalist/archive/2014-03/msg00067.html
>
> Let us know if you can't get his solution to work or if it doesn't apply.
>
> Regards,
> Joe Canner
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Michael Goodwin
> Sent: Monday, March 10, 2014 11:04 AM
> To: [email protected]
> Subject: st: Finding duplicate values across different variables
>
> I have a social network dataset consisting of two ID variables (source and
> target) and a number of indicators (ind1, ind2, ind3). The data looks like
> this:
>
> source            target            ind1   ind2   ind3
> company1      company2      1       0       0
> company3      company5      0       1       0
> company2      company1      1       1       0
> company5      company3      1       1       1
>
> My goal is to 1) consolidate any observations where the combination of
> source and target is equal (even where they aren't duplicates in the
> traditional Stata sense, such as observations 1 and 3 or 2 and 4 above);
> and 2) make the source and target of the consolidated observation equal to
> the source and target of whichever observation had a higher rowtotal of the
> indicators (so observations 3 and 4 would remain).
>
> Thus far, my approach has been to create a concatenation of source and
> target and, in a loop, flag all instances where
> source+target==target+source elsewhere in the dataset.
>
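> * Semicolons dropped: they end commands only after "#delimit ;".
> * The "|" separator (assumes no ID contains it) avoids ambiguous joins.
> gen orig = source + "|" + target
> gen new = target + "|" + source
> gen temp = .
> local max = _N
> egen count = rowtotal(ind*)
> forvalues num = 1/`max' {
>     replace temp = 1 if orig == new[`num']
> }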
>
> I still haven't been able to figure out how to sort the resulting dataset
> in such a way that I can easily consolidate the observations based on the
> count variable. Any thoughts you have would be much appreciated. Thanks in
> advance.
>
> Best,
>
> Mike
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
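For readers who arrive at this thread without following the link, here is one possible way to approach the question as posed. It is a minimal sketch and not necessarily the solution in the linked post. The idea is to put each pair of IDs into a canonical (alphabetical) order, so that company1/company2 and company2/company1 land in the same group, and then keep the observation with the highest indicator total within each group. The helper variables first and second are names made up for this illustration, and "consolidate" is read here as "keep the higher-total row"; ties on count are broken arbitrarily.

```stata
* Rebuild the example data from the question
clear
input str8 source str8 target ind1 ind2 ind3
"company1" "company2" 1 0 0
"company3" "company5" 0 1 0
"company2" "company1" 1 1 0
"company5" "company3" 1 1 1
end

* Row total of the indicators, as in the original attempt
egen count = rowtotal(ind*)

* Canonical ordering of each pair (first and second are illustrative names)
gen first  = cond(source < target, source, target)
gen second = cond(source < target, target, source)

* Within each unordered pair, sort by count and keep the last (highest) row
bysort first second (count): keep if _n == _N

drop first second
list
```

On the four example rows this keeps the original observations 3 and 4, which is the outcome the question asks for, and it avoids looping over every observation.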



-- 
MIKE GOODWIN
Project Leader, Endeavor Insight



900 Broadway, Suite 301
New York, NY 10003
www.endeavor.org
Tel: 646-368-6354
Skype: michael.p.goodwin

