Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: alternate data management strategies for importing Excel matrices


From   Steven Nakoneshny <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: alternate data management strategies for importing Excel matrices
Date   Tue, 17 Dec 2013 09:31:18 -0700

Hi Sergiy,

The -stack- command is new to me. Thanks for pointing it out. A direct command will obviously be faster than a loop, but I benchmarked them for a tangible metric. Your code using -stack- executed in 2.4420 seconds. My loop executed in 9.0640 seconds. A fourfold speed improvement is nothing to sneeze at.

As for the unusual quotation marks in my original post surrounding “kidney”, those seem to be an artifact created by my mail client when I copy/pasted my code. The same key was depressed for all quotes in that line of code and it is consistent in the do file.

Thanks,
Steve

On Dec 16, 2013, at 6:23 PM, Sergiy Radyakin <[email protected]> wrote:

> Dear Steve,
> 
> 1) see if the -stack- command is useful in your case (standard Stata
> command). As in e.g.:
> do http://radyakin.org/statalist/2013121601/stackvars.do
> 
> 2) I've noticed in your post 'kidney' is in different quotation marks
> then the other words. Be careful, as it can be difficult to find
> errors like the following:
> do http://radyakin.org/statalist/2013121601/diff_q.do
> 
> Best, Sergiy.
> 
> On Mon, Dec 16, 2013 at 6:05 PM, Steven Nakoneshny <[email protected]> wrote:
>> Dear Statalist,
>> 
>> A colleague provided me with an Excel file with two tabs each containing a matrix of de-identified IDs. I wish to convert these matrices into a single var of unique IDs as I will need to -merge- them with patient data shortly. My initial attempts were to use -reshape- but I couldn’t get past the r(498) error "variable _j contains all missing values”.
>> 
>> However, I was able to achieve my desired end result by looping over individual columns in the spreadsheet and appending the results together. Here is my (successful) code:
>> 
>> — code begins —
>> 
>> tempfile blank
>> g accnum=""
>> save `blank'
>> clear
>> 
>> foreach sheet in "large tumor TMA" "#2 TMA" {
>>        foreach x in B C D E F G H I J K L M N O P Q R S T U V W {
>>                import excel using “foo",sh("`sheet'") cellra(`x'4:`x'21)
>>                keep `x'
>>                duplicates drop `x',force
>>                rename `x' accnum
>>                append using `blank'
>>                save `blank',replace
>>                clear
>>        }
>> }
>> 
>> use `blank'
>> duplicates drop accnum,force
>> drop if inlist(accnum, "tonsil", "placenta", "pancreas", "no core", "liver", “kidney”, “")
>> 
>> — code ends —
>> 
>> As with anything else Stata, I thought this could be a tremendous learning opportunity if anybody could suggest other commands by which I could arrive at the same result.
>> 
>> Steve
>> 
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
> 



*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index