The Stata listserver

Re: st: Structure for making line by line changes?

From   David Kantor <[email protected]>
To   [email protected]
Subject   Re: st: Structure for making line by line changes?
Date   Mon, 28 Apr 2003 10:33:44 -0400

At 05:49 PM 4/25/2003 -0700, Daniel Sabath wrote:
> Hi David,
>
> Thank you. A lot of what you mentioned makes sense. Please see my previous reply to Fred and Nick as I think I may have explained myself a little clearer there.
>
> Now on to the nitty gritty...

I am presently unable to analyze everything you wrote, but I will give you a few pointers.

> It was quite a surprise to find out that the if statement only evaluates its conditions once and not on each row. As a result, I'm not sure when it would be useful.
It is very useful, usually for conditions that sit above the level of individual observations. Here's an example:
capture assert ~mi(myvar)
if _rc ~=0 {
    disp as error "myvar has missings"
    exit 459
}
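
To make the contrast concrete, here is a minimal sketch of the -if- qualifier next to the -if- command. The variable names x and flag are made up for illustration and are not from the original question.

```stata
clear
set obs 5
generate x = _n
generate flag = 0

* -if- qualifier: the condition is checked for every observation
replace flag = 1 if x > 3

* -if- command: the condition is evaluated once; a bare variable
* name here refers to its value in the first observation, x[1]
if x > 3 {
    display "not shown, because x[1] is 1"
}
```

After this runs, flag is 1 in observations 4 and 5 only, and the -display- is skipped.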

> At what point are scalars and macros evaluated? Can you reset the value in the middle of the run depending on other calculations? IE
> x = 0;
> replace y = z if x < 10, x++
They are evaluated whenever you reference them. They are set when you set them. But you can only set them *between* any -generate- or -replace- operations. (-generate- and -replace- operate on variables.)

The code you wrote above is not Stata code. The "x = 0" will not work. You need to prefix it with...
"generate" or "replace" if x is a variable
"scalar" if x is a scalar
"local" if x is a local macro

(Other possibilities exist, but these are the basics.)

The second statement is fine (assuming y is a variable), up until the ", x++". The latter is not allowed. (There is a ++ operator (in Stata 8) but this is not where you use it.)
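
As a sketch of what the two statements might look like in working Stata, here are two versions, one with x as a local macro and one with x as a scalar. The toy data (y and z) are assumptions added only so the example runs on its own. The point is that the update to x is a separate command placed between the -replace- operations.

```stata
clear
set obs 5
generate z = _n
generate y = .

* x as a local macro
local x = 0
replace y = z if `x' < 10
* increment as its own command (the ++ form needs Stata 8 or later)
local ++x

* x as a scalar
scalar x = 0
replace y = z if scalar(x) < 10
scalar x = scalar(x) + 1
```

Writing scalar(x) rather than a bare x guards against a variable that happens to be named x, since a variable name would take precedence.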

> > 2: Most Stata statements that operate on the data do so on the whole
> > dataset at once. (Actually, there is a sequential aspect to the action that
> > processes the statement, but you usually don't need to think about it.) It
> > may help to remember that, for example, in your code...
> > gen k = 2
> > gen l = 3
> > first, k is created and set to 2 for all observations; then l is created
> > and set to 3 for all observations.

> I believe that this is one of the fundamental differences (and a hard one to get your head around) between Stata and other stats languages. The implicit loop through the data exists on each *line* of the do file and not around the program as a whole. Other languages work on the data a line at a time and allow you to make as many calculations / modifications as you like before proceeding. Please correct me if I am missing something.
You are correct here. And this is truly a fundamental difference between Stata and the others. Once you get this, you are on your way to using Stata effectively. It is a more holistic approach to handling the data. (Also, it may help to remember that what you said applies to commands entered interactively. A do file is just a way of preparing your commands.)
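
The implicit loop can be pictured with a small sketch. The second half spells out, observation by observation, roughly what the single -generate- does; it is there only for illustration and is not how you would normally write it. The names k2 and n are made up for the example.

```stata
clear
set obs 5

* one command, applied to every observation
generate k = 2

* roughly the same thing written as an explicit loop over observations
generate k2 = .
local n = _N
forvalues i = 1/`n' {
    quietly replace k2 = 2 in `i'
}
```

Both variables end up identical; the first form is the one Stata is built around.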

But there are a few situations where it either doesn't work as smoothly as traditional programming methods, or requires a very different way of thinking. Your task of picking the three maximal values from among several variables is one such situation. It is actually easy to pick the one maximal value:
egen ... rmax()

Picking two or more maximal values is trickier; Scott Merryman gave you one possibility. Another might be to write some "traditional-looking" code within a loop that references individual observations. That is the route of last resort. Yet another was suggested by Nick Cox -- to reshape long and then sort. After the sort, retain the three cases at the end of each group. (Then, if you want, reshape wide.) This latter method is a good example of the "different way of thinking" that is characteristic of Stata.
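
A rough sketch of the reshape-and-sort route follows. It is not Nick Cox's actual code; the identifier id, the variables v1-v5, and the helper names which and rank are all assumptions chosen for illustration.

```stata
clear
set seed 12345
set obs 4
generate id = _n
forvalues j = 1/5 {
    generate v`j' = runiform()
}

* one row per (id, original variable)
reshape long v, i(id) j(which)

* sort within id and keep the three largest values
* (missing values sort last in Stata, so drop them first if present)
bysort id (v): keep if _n > _N - 3

* optional: rank 1 = largest, then go back to one row per id
bysort id (v): generate rank = _N - _n + 1
drop which
reshape wide v, i(id) j(rank)
```

After the final -reshape-, v1, v2 and v3 hold the largest, second-largest and third-largest values for each id.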

Incidentally, I believe that none of your code examples contains a reference to an individual observation, though you might have thought that they did. But don't try. Referencing an individual observation is useful in relatively rare situations, but is generally avoided.

Good luck.

David Kantor
Institute for Policy Studies
Johns Hopkins University
[email protected]
