
Re: st: Structure for making line by line changes?


From   David Kantor <dkantor@jhu.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Structure for making line by line changes?
Date   Fri, 25 Apr 2003 17:26:58 -0400

At 12:06 PM 4/25/2003 -0700, Dan Sabath wrote:
Hello,

I am very new to Stata and am having some difficulty wrapping my brain around Stata's methods of data processing.
[...]
I wrote a "program" to calculate the value based on args passed into it.
"checkfoo" returns a 1 or 0 depending if one of the arglist matches the first arg.
so r(checked) = 1 if `k' or `l' is equal to `j'
otherwise r(checked) = 0.

/*******vastly simplified**********/
local j = 1
gen k = 2
gen l = 3

while `j' < 6 {
checkfoo `j' `k' `l' /* r(checked) returned equal to 1 when j = 2 or 3*/
replace k = 4 if r(checked) == 1
local j = `j' + 1
}
/***********************************/
I would like it to replace "k" on the 2nd and 3rd time through the loop but not at any other time.

I would be happy if I could just do
/* psudocode */
replace k = 4 if checkfoo `j' `k' `l' /* with checkfoo evaluating true or false */
[...]
Nick Cox has replied to this, but I would like to add some comments.

Your code generates variables k and l. Then you pass macros `k' and `l' to checkfoo. Variables and macros are different kinds of entities. If you haven't defined these macros (and they are not defined in your code sample) then they are empty, and you are passing only one argument (`j') to checkfoo.
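A minimal sketch of that distinction, assuming a made-up dataset of five observations (the dataset and the -display- lines are for illustration only, not part of the original code):

```stata
* Toy dataset, assumed for illustration
clear
set obs 5

* k is a variable: it holds one value per observation
gen k = 2

* j is a local macro: a single piece of text, substituted before the command runs
local j = 1

* The macro `j' expands to 1
display "`j'"

* No local macro named k was ever defined, so `k' expands to nothing
display "[`k']"
```

The last line shows empty brackets, which is why a call written as checkfoo `j' `k' `l' would reach checkfoo with only the value of `j'.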

Note also that
gen k = 2
gen l = 3

set the variables k and l to 2 and 3 -- for all observations in the entire dataset.

When your loop comes to...
replace k = 4 if r(checked) == 1

then k will be replaced with 4 -- again, for all observations in the entire dataset, since r(checked) is a scalar quantity.

(Actually, this is one place where it would be equivalent to write...
if r(checked) {
replace k = 4
}
but in general, there is a big difference between the -if- statement and the -if- qualifier. There is a FAQ on this subject.)
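A small sketch of the difference, assuming a toy dataset with a hypothetical variable x that varies across observations:

```stata
* Toy dataset, assumed for illustration
clear
set obs 5
gen x = _n
gen k = 2

* -if- qualifier: the condition is checked observation by observation,
* so only observations 4 and 5 are changed
replace k = 4 if x > 3

* -if- statement: the condition is evaluated once; a variable name here
* refers to its value in the first observation (x[1], which is 1),
* so the block is skipped and no observation is changed
if x > 3 {
    replace k = 9
}
```

With a scalar condition such as r(checked), the two forms give the same result, which is the special case noted above.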
Since this -replace- sets k to 4 in every observation, there seems little point in doing it. Presumably there will be other code that you have omitted. But since this -replace- affects all observations equally, k might better have been a scalar or a macro.
If, as I suspect, you are thinking of looping through the observations, then your code is not correct. Most likely, though, there is no point in correcting it as such; what you want to do can probably be done in a few statements, once you get the idea of how Stata works. In fact, your "pseudocode" sample is almost (but not quite) a correct Stata statement -- if you are thinking of replacing k in some observations and not others.

Your pseudocode sample will not work, because in...
replace k = 4 if checkfoo `j' `k' `l' /* with checkfoo evaluating true or false */
you cannot create your own function (checkfoo) that can be referenced in an expression.

You can, on the other hand, create a variable to carry the info that you want. You can also write a program to generate that variable. It is not clear whether you intended checkfoo to be such a program. (As shown in your example, it would appear that it yields scalar information, but you may have had something else in mind.)
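One way this might look, as a sketch only. It assumes that j is held in a variable rather than a macro, and the program name mkchecked and the variable name checked are hypothetical:

```stata
* Toy dataset, assumed for illustration
clear
set obs 5
gen j = _n
gen k = 2
gen l = 3

* Hypothetical helper: mkchecked newvar target cand1 cand2
* Creates newvar = 1 where target equals cand1 or cand2, else 0
capture program drop mkchecked
program define mkchecked
    args newvar target a b
    generate byte `newvar' = (`target' == `a') | (`target' == `b')
end

* Build the indicator, then use it in an -if- qualifier
mkchecked checked j k l
replace k = 4 if checked
```

Here the indicator is computed for every observation at once, and the -replace- changes k only where it is 1 (observations 2 and 3 in this toy data).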

Overall I would suggest these points:

1: Understand the difference between variables, scalars and macros. (Scalars and macros are similar in that they have a single value. Variables have a set of values: one for each observation. Note also that if a program returns something in r(), that returned value is a scalar or macro.)

2: Most Stata statements that operate on the data do so on the whole dataset at once. (Actually, there is a sequential aspect to the action that processes the statement, but you usually don't need to think about it.) It may help to remember that, for example, in your code...
gen k = 2
gen l = 3

first, k is created and set to 2 for all observations; then l is created and set to 3 for all observations.

3: Understand the difference between the -if- statement and the -if- qualifier.

4: Looping is useful for actions that occur at a level that is logically higher than the individual observations. You almost never need to loop through the observations. If you are attempting to write code to loop through the observations, you probably are not thinking about the problem correctly. (Sometimes it is necessary, and I have done it -- *very* rarely.)
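As a sketch of point 4, here is a loop over variables rather than observations. The toy dataset and the variable j are assumptions made for illustration:

```stata
* Toy dataset, assumed for illustration
clear
set obs 5
gen j = _n
gen k = 2
gen l = 3

* The loop runs once per variable; within each pass, the -if- qualifier
* lets -replace- handle all observations in a single statement
foreach v of varlist k l {
    replace `v' = 4 if j == `v'
}
```

Each pass changes only the observations where j matches the current variable (k in observation 2, l in observation 3), with no explicit loop over observations.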

I hope this helps.
-- David

David Kantor
Institute for Policy Studies
Johns Hopkins University
dkantor@jhu.edu
410-516-5404
