



Re: st: How can I get the second last non-missing value?


From   Sergiy Radyakin <serjradyakin@gmail.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: How can I get the second last non-missing value?
Date   Thu, 13 Jun 2013 11:15:46 -0400

Rebecca,

The code should be fairly simple, since it follows the task posed in the
question exactly. The idea is that we work with each row (observation)
separately; that is the loop by i. Within each row we start at the last
variable and move backwards towards the beginning until we hit the
first non-missing value; that is the loop by j. It does include the last
variable (when j=0, cols(V)-0=cols(V)). From that point on we move to
the next value on the left and keep going until we find the next
non-missing value or hit the wall (the first variable); this is the loop
by k. If we find it, we post it into the result; otherwise the result
stays missing. The purpose of the view is really not to save memory, but
to be able to loop by column number in a simple fashion, without
worrying about the actual order of the variables in the list (they can
be v3 v2 v5 in a dataset of v1 v2 v3 v4 v5, and I want to abstract from
this indexing problem).
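A minimal Stata/Mata sketch of the i/j/k scan described above, for illustration only. It assumes the variable names are passed to the function as arguments instead of through the locals varlist and result; the function name prelast_sketch, the variable prelast, and the toy data are hypothetical. Because k starts at j-1 and stops at 1, a separate "nothing before" check is not needed here.

```stata
clear
input v1 v2 v3 v4 v5
1 2 . 4 .
. 7 8 . .
3 . . . 9
. . . . 5
end

* hypothetical helper; drop any earlier definition so the do-file can be rerun
capture mata drop prelast_sketch()

mata:
// Second-last non-missing value per observation (sketch).
// vars: space-separated variable names; res: name of an existing variable
void prelast_sketch(string scalar vars, string scalar res)
{
    real matrix V, R
    real scalar i, j, k

    st_view(V, ., vars)   // columns follow the order given in vars
    st_view(R, ., res)    // res must already exist in the dataset
    for (i=1; i<=rows(V); i++) {                 // loop by i: each observation
        for (j=cols(V); j>=1; j--) {             // loop by j: find last non-missing
            if (!missing(V[i,j])) {
                for (k=j-1; k>=1; k--) {         // loop by k: next non-missing to the left
                    if (!missing(V[i,k])) {
                        R[i,1] = V[i,k]          // write through the view into res
                        break
                    }
                }
                break                            // this row is done either way
            }
        }
    }
}
end

generate double prelast = .
mata: prelast_sketch("v1 v2 v3 v4 v5", "prelast")
list, noobs
```

Note that the result variable has to be created before the call, since st_view() can only map variables that already exist.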

Robert has posted an absolutely simple and, imho, superior solution
using -replace-. I like it a lot. The only thing is that (I imagine;
no testing done) it might pay a penalty in some circumstances. In
particular, when the number of variables is large (think thousands),
that solution would need to do that many replaces (or twice that many,
but you get the idea). So consider a case where you have no missing
values at all. My solution would take only one iteration of each of the
j and k loops per observation, and you end up with your result pretty
quickly. Robert's solution would go from the beginning to the end and
would need to do those thousands of replaces. With a small number of
variables this doesn't matter, and the efficiency of the built-in
commands more than compensates.
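Robert's actual code is not reproduced in this message. As a hedged illustration of the cost argument, here is a forward -replace- loop of the general kind described, assuming two -replace- calls per variable (the "twice that many" case); the variable names last and prelast and the toy data are hypothetical.

```stata
clear
input v1 v2 v3 v4 v5
1 2 . 4 .
. 7 8 . .
3 . . . 9
. . . . 5
end

* Forward scan: every variable is visited, however early the answer is known
generate double last = .      // most recent non-missing value seen so far
generate double prelast = .   // the non-missing value seen before it
foreach v of varlist v1-v5 {
    * shift first, so prelast receives the old value of last
    quietly replace prelast = last if !missing(`v')
    quietly replace last = `v' if !missing(`v')
}
list, noobs
```

The loop always runs over all variables, which is where the penalty with thousands of variables would come from.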

However, I see that it is definitely possible to change Robert's
solution to move backwards as well, in which case it would be the
absolute winner in both performance and readability. It would also make
it compatible with pre-Mata versions of Stata, such as 5 to 8, and I
imagine earlier versions as well. As a matter of fact, -replace- was
present in Stata 1.0 :) as can be seen here:
http://www.ats.ucla.edu/stat/sca/Stata1/m-r.pdf
and I hope looping was also possible back then. It would be super
interesting to see Stata 1.0 working live somewhere, perhaps on
StataCorp's YouTube channel?
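One possible way to run the -replace- approach backwards, as suggested above; this is a sketch, not Robert's or Sergiy's code. It walks the variables from last to first and stops as soon as every observation has a second-last value. The names last and prelast and the toy data are hypothetical, and because it uses -foreach- and missing() it is written for a modern Stata, not for the older versions mentioned above.

```stata
clear
input v1 v2 v3 v4 v5
1 2 . 4 .
. 7 8 . .
3 . . . 9
. . . . 5
end

* Build the variable list in reverse order: v5 v4 v3 v2 v1
local rev
foreach v of varlist v1-v5 {
    local rev `v' `rev'
}

generate double last = .
generate double prelast = .
foreach v of local rev {
    * update prelast first, so it only sees a -last- found further right
    quietly replace prelast = `v' if missing(prelast) & !missing(last) & !missing(`v')
    quietly replace last = `v' if missing(last) & !missing(`v')
    * stop early once every observation has its answer
    quietly count if missing(prelast)
    if r(N) == 0 continue, break
}
list, noobs
```

With no missing values at all, every observation is filled after the last two variables and the loop exits. Observations with fewer than two non-missing values keep the loop running to the first variable, so in that case it does no better than the forward scan.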

Best, Sergiy Radyakin

On Thu, Jun 13, 2013 at 10:13 AM, Rebecca Pope <rebecca.a.pope@gmail.com> wrote:
> Any time I would have saved by using Mata would have been completely
> lost to time figuring out how to accomplish it in Mata.
>
> Sergiy, would you mind providing a little explanation for what your
> code is doing? I made some notes below about what I think is going on,
> but I just want to make sure I'm following you.
>
> ***
> mata
>
> // like -capture program drop-? you're just clearing out any previous
> // definition of the program?
> void prelast() {
>     // define a null matrix V
>     V=.
>     // make a view onto the data (presumably here to save memory?), all
>     // observations on the variables given in the Stata local macro
>     // varlist (supplied elsewhere)
>     st_view(V,.,st_local("varlist"))
>     R=.
>     // this bit confused me at first b/c I thought the variable had to
>     // exist already, but you handle this by generating a result variable
>     // with all values missing before you run prelast(), correct?
>     st_view(R,.,st_local("result"))
>
>     // loosely, for every observation in the dataset
>     for(i=1;i<=rows(V);i++) {
>         // loosely, for all variables given in `varlist' except the last one
>         for(j=0;j<cols(V);j++) {
>             // j is increasing so the column index here is decreasing,
>             // in effect, counting backwards
>             if (missing(V[i,cols(V)-j])==0) {
>                 // found last non-missing
>                 if (cols(V)-j-1<1) break; //nothing before
>                 // lost me here, why increment k, don't you know you want
>                 // cols(V)-j-1 since cols(V)-j is the last non-missing value?
>                 for(k=cols(V)-j-1;k>=1;k--) {
>                     if (missing(V[i,k])==0) {
>                         // replace the ith observation (row) in the R vector
>                         // with the appropriate value from V
>                         R[i,1]=V[i,k]
>                         break;
>                     }
>                 }
>                 break;
>             }
>         }
>     }
> }
>
> end
> ***
>
> Thanks,
> Rebecca
>
> < snip >
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP