Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.


Re: st: RE: RE: RE: RE: RE: looping to value of a variable


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: RE: RE: RE: RE: looping to value of a variable
Date   Fri, 24 Feb 2012 09:00:23 +0000

Let me address Richard's belief that

If I don't loop over records [... S]tata will overwrite all flags (all
rows) as 1 as soon as it finds any missing value.

This is groundless. Unless you instruct otherwise Stata only works
with the current observation [row or record in non-Stata terminology].
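
As a minimal sketch of that point, assuming toy data in which small integers stand in for the dates (the values and the three-variable layout are invented for illustration, not taken from the real study), a single -replace- inside a loop over variables flags only the observations that satisfy the condition:

```stata
clear
* toy data: integers stand in for dates; . is missing (illustrative values only)
input id DFU1 DFU2 DFU3 maxFU
1 10 .  30 3
2 11 21 .  2
3 12 22 32 3
end

gen flag = 0
* loop over variables only; the -if- condition is evaluated observation by observation
forval j = 1/3 {
    replace flag = 1 if missing(DFU`j') & `j' <= maxFU
}

* only id 1 has a missing value within its first maxFU variables
assert flag == (id == 1)
list
```

Here id 2 stays unflagged because its only missing value is in DFU3, which lies beyond its maxFU of 2; nothing in the loop touches observations that fail the condition.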

It is common that people with a lot of experience with other software
find it more difficult to adjust to Stata's ways of thinking than
people with little! This may be happening here.

Nick

On Thu, Feb 23, 2012 at 5:58 PM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> I see, I think.
>
> gen flag = 0
>
> forval j = 1/8 {
>        replace flag = 1 if missing(DFU`j') & flag == 0 & `j' <= maxFU
> }
>
> Nick
> n.j.cox@durham.ac.uk
>
> Richard Fox
>
> Sorry for the confusion.
>
> I want just one flag that tells me if each record (row) has a missing value for the DFU variables. This would be simple were it not for the fact that for certain rows I only want to assess a subset of the variables for missing values. As per the example data I only want to assess DFU1-DFU(maxFU) for missingness.
>
> If I could use the value of maxFU as above DFU1-DFU(maxFU) then I could simply use
>
> egen rowmiss(DFU1-DFU(maxFU))
>
> but I don't believe that's possible.
>
> If I use egen flag = rowmiss(DFU1-DFU8) then for the 1st row I'd get 6, whereas I want just 1. For id 3 I'd expect flag == 0.
>
> If I don't loop over records, I believe Stata will overwrite all flags (all rows) as 1 as soon as it finds any missing value.
>
> After further thought, this could be performed with a simple formula. Nonetheless, I'm still interested to see how to loop to a variable value. I see that Mata may be a solution and will explore this in more detail. This is something that's easily performed in SAS, but I appreciate that Stata thinks in the opposite direction.
>
> Not sure if it helps, but I'm cleaning data for an oncology study. So for id (patient) 1 there should be 3 follow-up (FU) forms, each having a date of completion, DFU (date of follow-up).
>
> id  DFU1        DFU2        DFU3        DFU4        DFU5        DFU6        DFU7        DFU8        maxFU
> 1   30/10/1910  .           08/02/1904  .           .           .           .           .           3
> 2   16/12/1908  24/01/1913  .           08/02/1904  .           .           .           .           4
> 3   04/09/1907  13/10/1911  21/11/1915  30/12/1919  07/02/1924  17/03/1928  25/04/1932  .           7
> 4   18/10/1914  .           08/02/1904  18/03/1908  26/04/1912  04/06/1916  13/07/1920  21/08/1924  8
>
> I managed to get my code working; perhaps this illustrates what I'm trying to do:
>
> /* identify rows with missing dates */
> gen flag = 0
> count
> local N = r(N)
> forvalues i = 1/`N' {
>     /* sp holds the max number of follow-up visits for the particular patient (row) */
>     local sp = maxFU[`i']
>     forvalues j = 1/`sp' {
>         replace flag = 1 if DFU`j' == . & _n == `i'
>     }
> }
>
> Nick Cox
>
> Sorry, but I am still unclear on what flags you want.
>
> The fact that -maxFU- exists seems to be a red herring. You can create flags by
>
> forval j = 1/8 {
>        gen ismissing`j' = missing(DFU`j')
> }
>
> Or, if you want it the other way round, negate the function call with -!missing()-
>
> But why do you need the flags at all?
>
> Even if I am misunderstanding you, which is quite likely, the small bit of Stata technique may be some help.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Richard Fox
>
> Hi Nick,
>
> Yes you're correct, sorry for the confusion over DFU and FU. I added the egen function to illustrate where the loop count values could come from. In fact the values came from reshaping long data.
>
> I want to flag missing dates, however, for each record I need to assess only to a certain point. These are missing follow-up forms in a medical scenario - if patients are only followed for a certain time then I can't record some forms as missing if the patient has reached that time-point.
>
> Take the example below; for the 1st id I only want to loop to 3 to test for missing values. In the second id I only want to loop to 4, and so on. I suppose I could just only increment a counter if `i' <= maxFU. Just to note that the code within the loops (replace flag.....) was incomplete in my previous message - it was really just the form of the loop statements that I was interested in.
>
> id  dfu1        dfu2        dfu3        dfu4        dfu5        dfu6        dfu7        dfu8        maxFU
> 1   30/10/1910  .           08/02/1904  .           .           .           .           .           3
> 2   16/12/1908  24/01/1913  .           08/02/1904  .           .           .           .           4
> 3   04/09/1907  13/10/1911  21/11/1915  30/12/1919  07/02/1924  17/03/1928  25/04/1932  .           7
> 4   18/10/1914  .           08/02/1904  18/03/1908  26/04/1912  04/06/1916  13/07/1920  21/08/1924  8
>
> I'll have a look at the reference.
>
> Nick Cox
>
> Your example is not very clear. You have FU* and by implication DFU*. Do you want to flag missings or non-missings? I can read your post either way.
>
> However, you (almost surely) do not need to loop over observations. It is sufficient to loop over variables.
>
> See a review in this territory
>
> SJ-9-1  pr0046  . . . . . . . . . . . . . . . . . . .  Speaking Stata: Rowwise
>        (help rowsort, rowranks if installed) . . . . . . . . . . .  N. J. Cox
>        Q1/09   SJ 9(1):137--157
>        shows how to exploit functions, egen functions, and Mata
>        for working rowwise; rowsort and rowranks are introduced
>
> Nick
> n.j.cox@durham.ac.uk
>
> Richard Fox
> I want to loop to the value of a variable. Let's say I have generated the number of non-missing values in a row of data (maxFU in example below). I want to loop to that value which clearly can differ between records.
>
> The following does the job but feels like cheating.
>
> egen maxFU = rownonmiss(FU1 FU2 FU3 FU4 FU5)
>
> count
> local N = r(N)
> forvalues i = 1/`N' {
>     local sp = maxFU[`i']
>     forvalues j = 1/`sp' {
>         qui replace flag`j' = 1 if DFU`j' == .
>     }
> }
>
>
>
> There must be a simpler way; any ideas?
