Re: st: Structure for making line by line changes?


From   Daniel Sabath <sabathd@u.washington.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Structure for making line by line changes?
Date   Fri, 25 Apr 2003 17:49:12 -0700 (PDT)

Hi David, 

Thank you. A lot of what you mentioned makes sense. Please see my previous reply to Fred and Nick, as I think I explained myself a little more clearly there.

Now on to the nitty gritty...

On Fri, 25 Apr 2003, David Kantor wrote:

> At 12:06 PM 4/25/2003 -0700, Dan Sabath wrote:
> >Hello,
> >
> >I am very new to Stata and am having some difficulty wrapping my brain 
> >around Stata's methods of data processing.
> [...]
> >I wrote a "program" to calculate the value based on args passed into it.
> >"checkfoo" returns a 1 or 0 depending if one of the arglist matches the 
> >first arg.
> >so r(checked) = 1 if `k' or `l' is equal to `j'
> >otherwise r(checked) = 0.
> >
> >/*******vastly simplified**********/
> >local j = 1
> >gen k = 2
> >gen l = 3
> >
> >while `j' < 6 {
> >   checkfoo `j' `k' `l' /* r(checked) returned equal to 1 when j = 2 or 3*/
> >   replace k = 4 if r(checked) == 1
> >   local j = `j' + 1
> >}
> >/***********************************/
> >I would like it to replace "k" on the 2nd and 3rd time through the loop 
> >but not at any other time.
> >
> >I would be happy if I could just do
> >/* pseudocode */
> >replace k = 4 if checkfoo `j' `k' `l' /* with checkfoo evaluating true or 
> >false */
> >[...]
> 
> Nick Cox has replied to this, but I would like to add some comments.
> 
> Your code generates variables k and l.  Then you pass macros `k' and `l' to 
> checkfoo.  Variables and macros are different kinds of entities.  If you 
> haven't defined these macros (and they are not defined in your code sample) 
> then they are empty, and you are only passing one argument (`j') to checkfoo.

Ideally I would be passing the values of k and j on that row into checkfoo. Perhaps something like k[_n] would work? I'm beginning to see that the implicit loop through the dataset sits in a different place than I thought it did. My previous email explains it better.
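
A minimal sketch of what subscripting does, assuming a made-up dataset of five observations (the variable names same, first and prev are invented for illustration). Inside an expression, a bare variable name already refers to the current observation, so k and k[_n] should give the same result:

```stata
clear
set obs 5
generate k = _n * 10

* k[_n] is the value of k in the current observation, i.e. the same as k
generate same = k[_n]

* k[1] is the value in the first observation, repeated on every row
generate first = k[1]

* k[_n-1] is the value in the previous observation (missing on row 1)
generate prev = k[_n-1]

list
```

The subscript is only needed to reach a row other than the current one.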

> 
> Note, also that
>   gen k = 2
>   gen l = 3
> 
> set the variables k and l to 2 and 3 -- for all observations in the entire 
> dataset.
> 

Yes, that was intentional. The actual data are a little more complicated and vary on a row-by-row basis, but for the example I used this.

> When your loop comes to...
>    replace k = 4 if r(checked) == 1
> 
> then k will be replaced with 4 -- again, for all observations in the entire 
> dataset, since r(checked) is a scalar quantity.

This behavior I was not expecting. I was expecting r(checked) to change with the values from each row.

> 
> (Actually, this is one place where it would be equivalent to write...
>   if r(checked) {
>    replace k = 4
>   }
> but in general, there is a big difference between the -if- statement and 
> the -if- qualifier. There is a FAQ on this subject.)

It was quite a surprise to find out that the -if- statement evaluates its condition only once and not on each row. As a result, I'm not sure when it would be useful.
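
A small sketch of the difference, assuming an invented five-row dataset (x, y and the macro flagged are illustrative names only). The -if- qualifier is tested per observation; the -if- statement is tested once, which makes it suited to branching on a single value such as a returned result:

```stata
clear
set obs 5
generate x = _n
generate y = 0

* -if- qualifier: the condition is tested for each observation
replace y = 1 if x > 3

* -if- statement: the condition is tested once; a bare variable name
* here is taken to mean its value in the first observation, x[1]
local flagged = 0
if x > 3 {
    local flagged = 1
}
display "flagged = `flagged'"

* a typical use of the -if- statement: branch on a single returned value
quietly count if y == 1
if r(N) > 0 {
    display "some rows were changed"
}
```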

> Since this replace k = 4 will affect every observation, there seems little 
> point to doing it.  Presumably there will be other code that you have 
> omitted.  But, since this -replace- affects all observations equally, it 
> might better have been a scalar or a macro.
> But if, as I might suspect, you are thinking of looping through the 
> observations, then your code is not correct.  But then, most likely, there 
> is no point to correcting it as such; what you want to do is probably 
> easily done in a few statements, once you get the idea of how Stata 
> works.  In fact, your "pseudocode" sample is almost (but not quite) a 
> correct Stata statement -- if you are thinking of replacing k in some 
> observations and not others.
> 

That is exactly what I was aiming for. 

> Your pseudocode sample will not work, because in...
>   replace k = 4 if checkfoo `j' `k' `l' /* with checkfoo evaluating true or 
> false */
> you cannot create your own function (checkfoo) that can be referenced in an 
> expression.

To my great disappointment :( 
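
For this particular check a built-in function may already cover it. A sketch, assuming the intent is "does j match k or l on this row" and assuming the loop counter j is stored as a variable rather than a macro (the data are invented; inlist() is a built-in Stata function):

```stata
clear
set obs 5
generate j = _n
generate k = 2
generate l = 3

* inlist(j, k, l) is 1 on rows where j equals k or l, and 0 otherwise
replace k = 4 if inlist(j, k, l)

list
```

Here k is replaced on rows 2 and 3 only, which is the row-level analogue of "the 2nd and 3rd time through the loop".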

> 
> You can, on the other hand, create a variable to carry the info that you 
> want.  You can also write a program to generate that variable.  It is not 
> clear whether you intended checkfoo to be such a program. (As shown in your 
> example, it would appear that it yields scalar information, but you may 
> have had something else in mind.)

checkfoo is actually an .ado file 
/*****************
Checkfoo checks arglist[i] against arglist[0]; returns 1 if match and 0 if not match.
usage: checkfoo primary_var check1_var check2_var ...
returns r(checked) = 0 || 1
******************/

program define checkfoo, rclass
	local checked = 0
	local i = `1'
	while "`2'" ~= "" {
		if `i'==`2' {
			local checked = 1
		}
		macro shift
	}

	return scalar checked = `checked'
end

> 
> Overall I would suggest these points:
> 
> 1: Understand the difference between variables, scalars and 
> macros.  (Scalars and macros are similar in that they have a single value. 
> Variables have a set of values: one for each observation. Note, also that 
> if a program returns something in r(), that returned value is a scalar or 
> macro.)
> 

At what point are scalars and macros evaluated? Can you reset the value in the middle of the run depending on other calculations? For example:
x = 0;
replace y = z if x < 10, x++
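
A hedged sketch of how this plays out, assuming an invented 20-row dataset (z, y, y2, the macro x and the scalar s are illustrative names). A macro or scalar can be reset between commands, but within a single command it has one value for every row, so a per-row counter is usually written with _n instead:

```stata
clear
set obs 20
generate z = _n
generate y = 0

* A macro is substituted into the command text before the command runs,
* so the next line is executed as: replace y = z if 0 < 10
local x = 0
replace y = z if `x' < 10

* A scalar is looked up when the expression is evaluated, but it still
* holds one value for the whole command, so no row is changed here
scalar s = 10
replace y = -1 if s < 10

* A counter that advances with each row is written with _n
generate y2 = 0
replace y2 = z if _n <= 10
```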

> 2: Most Stata statements that operate on the data do so on the whole 
> dataset at once. (Actually, there is a sequential aspect to the action that 
> processes the statement, but you usually don't need to think about it.)  It 
> may help to remember that, for example, in your code...
>   gen k = 2
>   gen l = 3
> 
> first, k is created and set to 2 for all observations; then l is created 
> and set to 3 for all observations.

I believe that this is one of the fundamental differences (and a hard one to get your head around) between Stata and other stats languages. The implicit loop through the data runs within each *line* of the do-file, not around the program as a whole. Other languages work on the data one row at a time and allow you to make as many calculations / modifications as you like before proceeding to the next row. Please correct me if I am missing something.
(see http://www.cpc.unc.edu/services/computer/presentations/sas_to_stata/basic.html for more examples of the differences)
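
A small sketch of that per-command loop and of the sequential aspect David mentions, assuming an invented five-row dataset (total and total2 are illustrative names). Each command passes over all observations in order before the next command starts, so a reference to the previous row sees the value that row was just given:

```stata
clear
set obs 5
generate k = 2

* first command: total is set on every row
generate total = k

* second command: rows are filled in order, so total[_n-1] already
* holds the updated value from the row before
replace total = total[_n-1] + k if _n > 1

* the built-in running sum gives the same result in one step
generate total2 = sum(k)

list
```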

> 
> 3: Understand the difference between the -if- statement and the -if- qualifier.
> 
> 4: Looping is useful for actions that occur at a level that is logically 
> higher than the individual observations.  You almost never need to loop 
> through the observations.  If you are attempting to write code to loop 
> through the observations, you probably are not thinking about the problem 
> correctly.  (Sometimes it is necessary, and I have done it -- *very* rarely.)

And this is exactly why I'm asking. I need to get my head adjusted to think about problems in this manner. I really have appreciated all the help you guys have been. Thank you!
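
As a sketch of looping at a level above the observations, assuming the same invented variables as before with j stored as a variable: the loop below runs over variables, and each -replace- inside it still handles all rows at once.

```stata
clear
set obs 5
generate j = _n
generate k = 2
generate l = 3

* loop over variables, not over observations
foreach v of varlist k l {
    replace `v' = 4 if `v' == j
}

list
```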

> 
> I hope this helps.
It certainly has. Thanks again!

-dan


*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/