Statalist



Re: st: Help with a loop


From   n j cox <n.j.cox@durham.ac.uk>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Help with a loop
Date   Mon, 16 Jul 2007 19:07:24 +0100

Guillermo Villa asked (edited to remove or correct =09 and =3D,
etc., presumably a side-effect of not sending plain text
as requested):

-----------------------------------------------------------------
I am working on a dataset that looks like this:

Var11 Var12 Var13 Var14 Var15
1 2 3 . 5
2 4 . . .
3 . 4 . 6
. . 3 . .

I need to move all the values to the left, skipping the missing
values:

Var11 Var12 Var13 Var14 Var15
1 2 3 5 .
2 4 . . .
3 4 6 . .
3 . . . .


I have tried the following:

foreach i of numlist 1/4 {
if Var1`i' == . {
replace Var1`i' = Var1`i'+1
replace Var1`i'+1 == .
}
}

But I get this warning: + invalid name r(198), which refers to the
last -replace- command.
---------------------------------------------------------------------

Ada Ma suggested
----------------------------------------------------------------------
I don't think you need a loop. How about this:

gen string = string( Var1)+ string( Var2) + string( Var3) + string( Var4)
replace string = subinstr(string, ".","",.)
drop Var*
gen Var1 = substr(string,1,1)
gen Var2 = substr(string,2,1)
...

so on and so forth

This would work if all your numbers are single digits / all have
identical digits. If they aren't consider generating string versions
of the variables and make them all the same lengths (put zeros in
front of small numbers), and then put them all together in one
variable, stripe out the spaces and missing's, and cut them apart
again. Tedious but would work.
-----------------------------------------------------------------------

Let's look at Ada's constructive suggestion first. We can weaken the
assumption that the data are single-digit integers, and we can make her
approach less tedious than she fears. (I don't understand what Ada
means by "identical digits".)

The basic, and in my view good, idea is a three-step: Concatenate the variables, zap the characters indicating missings, and split up the composite.

egen all = concat(Var11-Var15), p(" ")

will do the string conversion, and the concatenation, in one swoop.
I punctuate with spaces to make the later splitting up much easier.

Zapping the missings is done as Ada did it:

replace all = subinstr(all, ".", "", .)

Splitting up is done like this:

split all, destring

I would not -drop- the original variables. Unnecessary, and
indeed dangerous.
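
Putting the three steps together on Guillermo's example data, a
minimal sketch might look like this. The -input- block is only there
to make the example self-contained, and integer data are still
assumed. Given that the longest row has four nonmissing values,
-split- should create all1-all4 here.

* example data as posted
clear
input Var11 Var12 Var13 Var14 Var15
1 2 3 . 5
2 4 . . .
3 . 4 . 6
. . 3 . .
end

* step 1: concatenate, with spaces between the values
egen all = concat(Var11-Var15), p(" ")

* step 2: zap the periods that stand for missings
replace all = subinstr(all, ".", "", .)

* step 3: split on spaces and convert back to numeric
split all, destring

list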

This is not going to work if the original data have decimal places,
as Ada hinted, for then the decimal points would get zapped along
with the periods indicating missings.

What other approaches are possible? Here are three.

1. -reshape-, -sort- and -reshape- back. Will work fine with
non-integers.

2. -rowsort- or -sortrows- from SSC. -sortrows- will work
fine with non-integers, but not -rowsort-.

3. A loop like this:

gen all = ""
foreach v of var Var11-Var15 {
replace all = all + string(`v') + " " if `v' < .
}
split all, destring

Needs some tweaking for non-integers.
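
Approach 1 could be sketched as follows. Sorting on the values
themselves would also push missings to the end, but it would reorder
the nonmissing values; the sketch instead sorts on missingness and
then on original position, which is an assumption about what is
wanted. The names id, pos, ismiss and newpos are made up for
illustration.

* one identifier per original row
gen long id = _n

* long layout: one observation per row and position
reshape long Var1, i(id) j(pos)

* within each row, nonmissing values first, original order kept
gen byte ismiss = missing(Var1)
bysort id (ismiss pos): gen newpos = _n

* -pos- and -ismiss- vary within id, so drop them before going wide
drop pos ismiss
reshape wide Var1, i(id) j(newpos)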

Lastly, what's wrong with Guillermo's code? I can see several
problems apart from that triggering his error message.

0. Not a problem, but -forval- is better style than
-foreach- if we are just cycling over successive integers.

1. -if Var1`i' == .- is not going to do what he wants.
This is a big bad bug and a FAQ. See
http://www.stata.com/support/faqs/lang/ifqualifier.html

2. A typo. I think I see == on the second -replace-; that
should be =.
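
On point 1, a small illustration of the difference, assuming the
example data as posted. The -if- command is evaluated once, and a
variable name there refers to its value in the first observation
only; the -if- qualifier is evaluated for every observation.

* -if- command: looks at Var11[1] only, which is 1 in the example,
* so nothing is displayed
if Var11 == . {
    display "Var11 is missing in observation 1"
}

* -if- qualifier: checks every observation; in the example the
* fourth observation has Var11 missing
count if Var11 == .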

Fixing 0 and 1 and 2 gives us

forval i = 1/4 {
replace Var1`i' = Var1`i'+1 if Var1`i' == .
replace Var1`i'+1 = . if Var1`i' == .
}

But this breaks the algorithm, as the second test is too
late once the first variable has already been changed.
That is probably why Guillermo wrote the code the way
he did.

3. And we still have a big problem with macro substitution.
Guillermo wants `i' + 1 to be evaluated and replaced with its
result, but his code won't do that.

What he wanted is more like this:

gen temp = .
forval i = 1/4 {
replace temp = Var1`i'
local j = `i' + 1
replace Var1`i' = Var1`j' if Var1`i' == .
replace Var1`j' = . if temp == .
}

That is probably now legal, but it's awkward, and
worst of all, it doesn't fully solve the problem.
Just swapping pairs is not enough to sort, as the
example . 3 . . 4 should make clear. Guillermo's swapping
will leave that as 3 . . 4 . , or so I get. I take
this to be a fatal objection, so all my other fixes
are futile.

G. could use `= `i' + 1' to get the macro manipulation he
wants, but as his overall approach is doomed that is a detail.
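
For what it is worth, a tiny sketch of that inline syntax, using
-display- only so that nothing in the data is changed:

forval i = 1/4 {
    * `= `i' + 1' is evaluated first and replaced by its result
    display "Var1`i' is followed by Var1`= `i' + 1'"
}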

Fortunately there are other ways to do it, as already shown.

Nick
n.j.cox@durham.ac.uk




