



Re: st: Finding matching strings across vars


From   Sergiy Radyakin <serjradyakin@gmail.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: Finding matching strings across vars
Date   Thu, 13 Jun 2013 19:05:17 -0400

My rough estimate is about 25 minutes to create a good illustrative test dataset with

clear
input ...
end

then another 10 minutes for the solution.
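
For illustration, here is a minimal sketch of what such a test dataset and solution might look like, following the plan described in the quoted message below (drop duplicates, split on "///", reshape long, drop duplicates again) and then using -merge- to flag the matches. The toy gene names, the string widths, the variable names gene, in_var2 and in_var3, and the choice of -merge- for the matching step are all assumptions made for this example; they are not taken from the original data, and other approaches would work as well.

```stata
* Hypothetical toy data: var1 may hold several names joined by "///".
* Note: "///" is only a continuation marker when preceded by a blank,
* so it is safe inside the quoted strings used here.
clear
input str40 var1 str10 var2 str10 var3
"AAA///BBB" "BBB" "CCC"
"CCC"       "DDD" "AAA"
"AAA///BBB" "EEE" "FFF"
"GGG"       "HHH" "III"
end

* Save the unique names from var2 and var3 as lookup files keyed on gene
preserve
keep var2
rename var2 gene
drop if missing(gene)
duplicates drop
tempfile list2
save `list2'
restore

preserve
keep var3
rename var3 gene
drop if missing(gene)
duplicates drop
tempfile list3
save `list3'
restore

* Reduce var1 to one observation per unique gene name
keep var1
duplicates drop
split var1, parse("///") generate(g)
generate long id = _n
reshape long g, i(id) j(k)
drop if missing(g)
rename g gene
keep gene
duplicates drop

* Flag which var1 genes also appear in var2 and/or var3
merge 1:1 gene using `list2', keep(master match) generate(m2)
merge 1:1 gene using `list3', keep(master match) generate(m3)
generate byte in_var2 = (m2 == 3)
generate byte in_var3 = (m3 == 3)
drop m2 m3
list gene in_var2 in_var3
```

The tempfile macros are local, so this is meant to be run as a single do-file. The result has one observation per unique var1 gene name, with indicator variables showing whether that name also occurs in var2 or var3.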

On Thu, Jun 13, 2013 at 6:54 PM, Steve Nakoneshny <scnakone@ucalgary.ca> wrote:
> Dear Statalist,
>
> A colleague has provided me with an Excel file of 3 vars and 43,510 obs of gene names (all strings, all uppercase). Each var represents a different list of genes and he has asked me if there is an "easy" way in Stata to find out if any of the genes listed in var1 also appear in var2 and/or var3. To further complicate matters, the obs in var1 are non-unique and many have multiple alternate gene names like "ANKRD20A13P///ANKRD20A4///ANKRD20A2///ANKRD20A3///ANKRD20A11P///ANKRD20A9P///ANKRD20A1" embedded into the same obs.
>
> In visualising a plan of attack, I'm thinking I need to read in var1, drop duplicates, split the longer obs parsing on "///", reshape long and drop duplicates once again to arrive at a single var list of unique gene names. This next step is where my plan starts to break down. I'm leaning towards appending the Excel file again to read in var2 and var3, but then I'm not 100% sure how to search for matches across each var or how to readily identify them once I do.
>
> Any comments or suggestions would be greatly appreciated.
>
> Steve
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/


