Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

RE: st: RE: Finding duplicate values across different variables

From: Joe Canner <[email protected]>
To: "[email protected]" <[email protected]>
Subject: RE: st: RE: Finding duplicate values across different variables
Date: Mon, 10 Mar 2014 23:15:17 +0000

Credit to Nick for the solution; I just have a good memory.  It would have taken me a while to come up with that (if at all).

And, yes, it is sometimes hard to formulate a help search in such a way that your terminology matches that used in previous solutions.
________________________________________
From: [email protected] [[email protected]] on behalf of Michael Goodwin [[email protected]]
Sent: Monday, March 10, 2014 6:41 PM
To: [email protected]
Subject: Re: st: RE: Finding duplicate values across different variables

Hi Joe,

Thanks, this is extremely helpful. Sometimes you just have to know how
to ask the right question!

Best,

Mike

On Mon, Mar 10, 2014 at 11:24 AM, Joe Canner <[email protected]> wrote:
> Michael,
>
> Nick Cox answered a very similar question here last week: http://www.stata.com/statalist/archive/2014-03/msg00067.html
>
> Let us know if you can't get his solution to work or if it doesn't apply.
>
> Regards,
> Joe Canner
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Michael Goodwin
> Sent: Monday, March 10, 2014 11:04 AM
> To: [email protected]
> Subject: st: Finding duplicate values across different variables
>
> I have a social network dataset consisting of two ID variables (source and
> target) and a number of indicators (ind1, ind2, ind3). The data looks like
> this:
>
> source      target      ind1  ind2  ind3
> company1    company2    1     0     0
> company3    company5    0     1     0
> company2    company1    1     1     0
> company5    company3    1     1     1
>
> My goal is to 1) consolidate any observations where the combination of
> source and target is equal (even where they aren't duplicates in the
> traditional Stata sense, such as observations 1 and 3 or 2 and 4 above);
> and 2) make the source and target of the consolidated observation equal to
> the source and target of whichever observation had a higher rowtotal of the
> indicators (so observations 3 and 4 would remain).
>
> Thus far, my approach has been to create a concatenation of source and
> target and, in a loop, flag all instances where
> source+target==target+source elsewhere in the dataset.
>
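> * Concatenate the IDs in both orders; a reversed pair has orig == new.
> gen orig = source + target
> gen new = target + source
> gen temp = .
> local max = _N
> egen count = rowtotal(ind*)
> * Flag each observation whose source+target matches some target+source.
> forvalues num = 1/`max' {
>     replace temp = 1 if orig == new[`num']
> }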
>
> I still haven't been able to figure out how to sort the resulting dataset
> in such a way that I can easily consolidate the observations based on the
> count variable. Any thoughts you have would be much appreciated. Thanks in
> advance.
>
> Best,
>
> Mike
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
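
For readers who cannot follow the link above, here is a minimal sketch of one common way to handle reversed pairs. It is not necessarily the solution from the linked post. It assumes source and target are string variables, that "consolidate" means keeping only the observation with the higher indicator total, and that ties may be broken arbitrarily. The variables first and second are hypothetical helper names introduced for illustration.

```stata
clear
input str10 source str10 target ind1 ind2 ind3
"company1" "company2" 1 0 0
"company3" "company5" 0 1 0
"company2" "company1" 1 1 0
"company5" "company3" 1 1 1
end

* Order-independent key: put the alphabetically smaller ID first,
* so (company1, company2) and (company2, company1) get the same key.
gen first  = cond(source < target, source, target)
gen second = cond(source < target, target, source)

* Row total of the indicators (missing values count as 0).
egen count = rowtotal(ind1 ind2 ind3)

* Within each unordered pair, sort by count and keep the last
* (highest-count) observation; ties are broken arbitrarily.
bysort first second (count): keep if _n == _N

drop first second
list
```

On the four sample rows, this keeps the company2/company1 and company5/company3 observations, which correspond to observations 3 and 4 in the original question. Building the key from two sorted variables, rather than from a concatenated string, also avoids false matches when different ID pairs happen to concatenate to the same text.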



--
MIKE GOODWIN
Project Leader, Endeavor Insight



900 Broadway, Suite 301
New York, NY 10003
www.endeavor.org
Tel: 646-368-6354
Skype: michael.p.goodwin
