Re: st: programming loops efficiently


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: programming loops efficiently
Date   Wed, 14 Nov 2012 12:36:09 +0000

This code is very likely to bite. First off, in Elizabeth's problem it
seems highly likely that many values are missing. Presumably, she has
25 variables because that many are needed in some cases, but in many
cases several of those variables, especially the last few, will be
missing.

Thus the comparison

... if Vh`x'  > maxhr

will often just find the first missing value in the row, as numeric
missing counts as greater than any non-missing value. Each time the
condition is true, the corresponding variables will be changed by a
-replace-, and once maxhr is itself missing no later non-missing value
can displace it.
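
The comparison rule can be checked in isolation at the Stata prompt. This
is a minimal sketch; the numbers are arbitrary and only illustrate how
system missing (.) sorts against non-missing values.

```stata
* each line displays 1 if the comparison is true and 0 if it is false
display 5 > .        // 0: no non-missing value exceeds missing
display . > 5        // 1: missing counts as larger than 5
display . > 1e300    // 1: and larger than any other non-missing number
display . > .        // 0: missing is not greater than itself
```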

My earlier post linked to a discussion (including a reference) of
better code for when missings are present.

But Alex's code can be tweaked to avoid this problem.

gen max = 1 if !missing(Vh01)
gen maxhr = Vh01 if !missing(Vh01)

forvalues x = 2/25 {
              local X : di %02.0f `x'
              replace max = `x' if Vh`X'  > maxhr & !missing(Vh`X')
              replace maxhr = Vh`X'  if Vh`X'  > maxhr & !missing(Vh`X')
}
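
One case still needs care: if Vh01 is itself missing, maxhr starts out
missing, no non-missing value is ever greater than missing, and so max and
maxhr stay missing for that observation. A possible variant, offered as a
sketch and not as the code from the linked discussion, starts both
variables as missing and loops over all 25 variables. The variable name
check is hypothetical and is only there for verification; the varlist
Vh01-Vh25 assumes those variables are adjacent in the dataset, as in
Elizabeth's -egen- call.

```stata
gen max = .
gen double maxhr = .

forvalues x = 1/25 {
    local X : di %02.0f `x'
    * accept the current variable if it is non-missing and either beats
    * the running maximum or is the first non-missing value seen so far
    replace max = `x' if !missing(Vh`X') & (Vh`X' > maxhr | missing(maxhr))
    replace maxhr = Vh`X' if !missing(Vh`X') & (Vh`X' > maxhr | missing(maxhr))
}

* cross-check against rowmax(), which also ignores missing values
egen double check = rowmax(Vh01-Vh25)
assert maxhr == check
```

Because the comparison is strict, ties are resolved in favour of the
lowest person number; observations with all 25 values missing keep max and
maxhr missing.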

Note that the -rename- can be avoided by using a format that insists
on leading zeros for 1...9.
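
A short sketch of what that format does, using only the first three values
for brevity:

```stata
forvalues x = 1/3 {
    * %02.0f pads to width 2 with a leading zero: 1 -> 01, 2 -> 02, ...
    local X : di %02.0f `x'
    di "x = `x' refers to Vh`X'"

    * an equivalent way to build the same suffix with string()
    local Y = string(`x', "%02.0f")
    di "string() gives Vh`Y'"
}
```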

Nick

On Wed, Nov 14, 2012 at 10:21 AM, Alex Armand <a.armand@ucl.ac.uk> wrote:

> If you want a variable stating the person with max hours for each observation (row) then this code should produce what you need.
>
> For simplicity I would rename variable Vh01-Vh09 into Vh1-Vh9.
> ________________________________
> * This defines the variable that contains the person with maximum
>
>         gen max = 1
>         gen maxhr = Vh1
>
> * Loop control for others
>
>         forvalues x = 2/24 {
>
>                 replace max = `x' if Vh`x'  > maxhr
>                 replace maxhr = Vh`x'  if Vh`x'  > maxhr
>
>         }
> ________________________________
>
> Alex
>
>
> On 14 Nov 2012, at 10:53, Breeze, Elizabeth wrote:
>
>> I am creating some variables and I am sure that my syntax is needlessly long where there is a repeat pattern to the commands.
>>
>> I have variables  Vh01-Vh25 which give the number of hours worked by persons number 01-25 respectively.
>> I want to find which person worked maximum hours and what that maximum was
>> egen maxhr = rowmax(Vh01-Vh25)  gives me the maximum number of hours
>>
>> Is there a quick way to find the person number with the max no. hours by taking advantage of those last two digits of Vh01-Vh25?
>> I can do it with many lines of syntax but am sure there must be a neater way using some form of loop.
>> Each record in the dataset concerns one interviewee and persons 01-25 are people who work for the interviewee
>>
>> A further complication is that more than one person may work that maximum number of hours.
>>
>> Also is there an equivalent to rowmax that gives the second to largest value in the series?
>>
>> Grateful for any tips.    I am not a programmer
