
From: n j cox <n.j.cox@durham.ac.uk>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: Help with a loop

Date: Mon, 16 Jul 2007 19:07:24 +0100

Guillermo Villa asked (edited to remove or correct =09 and =3D,

etc., presumably a side-effect of not sending plain text

as requested):

-----------------------------------------------------------------

I am working on a dataset that looks like this:

Var11 Var12 Var13 Var14 Var15

1 2 3 . 5

2 4 . . .

3 . 4 . 6

. . 3 . .

I need to move all the values to the left, skipping the missing

values:

Var11 Var12 Var13 Var14 Var15

1 2 3 5 .

2 4 . . .

3 4 6 . .

3 . . . .

I have tried the following:

foreach i of numlist 1/4 {

if Var1`i' == . {

replace Var1`i' = Var1`i'+1

replace Var1`i'+1 == .

}

}

But I get this warning: + invalid name r(198), which refers to the

last -replace- command.

---------------------------------------------------------------------

Ada Ma suggested

----------------------------------------------------------------------

I don't think you need a loop. How about this:

gen string = string( Var1)+ string( Var2) + string( Var3) + string( Var4)

replace string = subinstr(string, ".","",.)

drop Var*

gen Var1 = substr(string,1,1)

gen Var2 = substr(string,2,1)

...

so on and so forth

This would work if all your numbers are single digits / all have

identical digits. If they aren't consider generating string versions

of the variables and make them all the same lengths (put zeros in

front of small numbers), and then put them all together in one

variable, stripe out the spaces and missing's, and cut them apart

again. Tedious but would work.

-----------------------------------------------------------------------

Let's look at Ada's constructive suggestion first. We can weaken the assumption that the data are single-digit integers, and we can make her approach less tedious than she fears. (I don't understand what Ada means by "identical digits".)

The basic, and in my view good, idea is a three-step: Concatenate the variables, zap the characters indicating missings, and split up the composite.

egen all = concat(Var11-Var15), p(" ")

will do the string conversion, and the concatenation, in one swoop.

I punctuate with spaces to make the later splitting up much easier.

Zapping the missings is done as Ada did it:

replace all = subinstr(all, ".", "", .)

Splitting up is done like this:

split all, destring

I would not -drop- the original variables. Unnecessary, and

indeed dangerous.

This is not going to work if the original data have decimal places,

as Ada hinted, for then the decimal points would get zapped along

with the periods indicating missings.
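
Put together on Guillermo's example data, the three steps might look like the sketch below. The -input- block simply retypes the data from the question; the new variables all1, all2, ... are whatever -split- creates from the name all, and integer data are assumed throughout.

```stata
clear
input Var11 Var12 Var13 Var14 Var15
1 2 3 . 5
2 4 . . .
3 . 4 . 6
. . 3 . .
end

* 1. concatenate as strings, with a space between values
egen all = concat(Var11-Var15), p(" ")

* 2. zap the periods that stand for missings (integers only!)
replace all = subinstr(all, ".", "", .)

* 3. split on spaces and convert the pieces back to numeric
split all, destring

* originals are kept; the shifted values are in all1, all2, ...
list Var11-Var15 all1-all4
```

The results should line up with the target layout in the question, except that no fifth variable is created, because no row has five nonmissing values.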

What other approaches are possible? Here are three.

1. -reshape-, -sort- and -reshape- back. Will work fine with

non-integers.

2. -rowsort- or -sortrows- from SSC. -sortrows- will work

fine with non-integers, but not -rowsort-.

3. A loop like this:

gen all = ""

foreach v of var Var11-Var15 {

replace all = all + string(`v') + " " if `v' < .

}

split all, destring

Needs some tweaking for non-integers.
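
As a rough sketch of approach 1, and only one reading of it: the code below sorts within each row on missingness and then on original position, so that nonmissing values keep their left-to-right order. The names id, pos and miss are my own choices for illustration, not anything from the original posts.

```stata
clear
input Var11 Var12 Var13 Var14 Var15
1 2 3 . 5
2 4 . . .
3 . 4 . 6
. . 3 . .
end

gen long id = _n                     // row identifier (assumed name)
reshape long Var1, i(id) j(pos)      // pos takes the values 1 to 5

* within each row: nonmissing first, in original order; missings last
gen byte miss = missing(Var1)
bysort id (miss pos): replace pos = _n
drop miss

reshape wide Var1, i(id) j(pos)      // back to Var11-Var15
list
```

Nothing here converts values to strings, so non-integers pass through unchanged. Unlike the string-based approaches, this overwrites Var11-Var15, so work on a copy if the original layout matters.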

Lastly, what's wrong with Guillermo's code? I can see several

problems apart from that triggering his error message.

0. Not a problem, but -forval- is better style than

-foreach- if we are just cycling over successive integers.

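1. -if Var1`i' == .- is not going to do what he wants. The -if- command evaluates its condition once, for the first observation only, rather than observation by observation; the -if- qualifier on -replace- is what is needed.

This is a big bad bug and a FAQ. See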

http://www.stata.com/support/faqs/lang/ifqualifier.html

2. A typo. I think I see == on the second -replace-; that

should be =.

Fixing 0 and 1 and 2 gives us

forval i = 1/4 {

replace Var1`i' = Var1`i'+1 if Var1`i' == .

replace Var1`i'+1 = . if Var1`i' == .

}

But this breaks the algorithm, as the second test is too

late once the first variable has already been changed.

That is probably why Guillermo wrote the code the way

he did.

3. And we still have a big problem with macro substitution.

Guillermo wants `i' + 1 to be evaluated and replaced with its

result, but his code won't do that.

What he wanted is more like this:

gen temp = .

forval i = 1/4 {

replace temp = Var1`i'

local j = `i' + 1

replace Var1`i' = Var1`j' if Var1`i' == .

replace Var1`j' = . if temp == .

}

That is probably now legal, but it's awkward, and

worst of all, it doesn't fully solve the problem.

Just swapping pairs is not enough to sort, as the

example . 3 . . 4 should make clear. Guillermo's swapping

will leave that as 3 . . 4 . , or so I get. I take

this to be a fatal objection, so all my other fixes

are futile.
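
To check that counterexample, the pairwise loop can be run on a single row. This is just the code above applied to one made-up observation; nothing else is assumed.

```stata
clear
input Var11 Var12 Var13 Var14 Var15
. 3 . . 4
end

gen temp = .
forval i = 1/4 {
    replace temp = Var1`i'                      // remember the old value
    local j = `i' + 1
    replace Var1`i' = Var1`j' if Var1`i' == .   // pull the right neighbour left
    replace Var1`j' = . if temp == .            // and blank the neighbour
}
drop temp

list    // 3 . . 4 . : the 4 has moved only one place left
```

A single left-to-right pass moves each value at most one position, so a gap of two or more missings is never fully closed.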

G. could use `= `i' + 1' to get the macro manipulation he

wants, but as his overall approach is doomed that is a detail.
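
For what it is worth, a small illustration of that inline evaluation follows. The local macro name next is made up for the example; the point is only that `= exp' is evaluated before the command is parsed, whereas a bare +1 is left as text.

```stata
forval i = 1/4 {
    * Var1`i'+1 would expand to, e.g., Var11+1: an expression, not a name
    * `= `i' + 1' is evaluated first, so this yields Var12, Var13, ...
    local next Var1`= `i' + 1'
    display "Var1`i' is followed by `next'"
}
```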

Fortunately there are other ways to do it, as already shown.

Nick

n.j.cox@durham.ac.uk
