
Re: st: operation on group


From   Nick Cox <[email protected]>
To   [email protected]
Subject   Re: st: operation on group
Date   Sun, 11 May 2003 15:50:58 +0100

Radu Ban 

> i have a dataset in which a variable has to be the same within groups
> created by other variables. the variable is a 0-1 binary. if, within a
> group there's at least one 0 zero value i want to set all the values to 0.
> the groups are generated by the "serial" and "month" variables.
>
> i used the following command:
>
> bysort serial month: replace var = 0 if (var ~= var[_n-1] & _n>1) |
> (var ~= var[_n+1] & _n==1)
>
> but this appears to be causing some randomness because the end results of
> the code are different with each re-run. and i don't see other commands
> that might cause randomness.

An equivalent is to generate from your binary variable 
its minimum within groups, which is canned as an -egen- function 

egen min = min(var), by(serial month) 
replace var = min 

A way to do it from first principles is in fact shorter 
and avoids the device of another variable 

bysort serial month (var) : replace var = var[1] 

On the other hand, there is some information loss in your overwriting
the original variable. 
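
If you would rather keep the original, a minimal sketch is to put the
group minimum in a new variable instead. The name -var2- and the toy
data are made up for illustration.

clear
input serial month var
1 1 1
1 1 0
1 1 1
1 2 1
1 2 1
2 1 0
2 1 .
end
* var2 holds the minimum of var within each serial-month group
bysort serial month (var) : gen var2 = var[1]
list, sepby(serial month)

Here -var2- should be 0 throughout the groups (1,1) and (2,1) and 1
throughout (1,2), while -var- itself is left as it was, apart from
being sorted.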

Anyway, there is a lot going on in that one line: let's break it into steps

1. Sort on -serial-, within that order by -month-, and within that 
by -var-. 

2. Within the categories defined by -serial- and -month- 
-replace var- by its first value, -var[1]-. Note the principle 
that the subscript, here [1], is interpreted within categories, that
is, within the groups defined by -serial- and -month-. After 
sorting, the minimum value of -var- is held in the first
observation of each group. 

3. If there are ties for minimum, you still get the right answer. 

4. Missing values won't mess this up as they are sorted to 
the high end of each group. However, this implies that getting
the maximum in this way would require more care. 
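
For the maximum, -var[_N]- would pick up a missing value whenever a
group contains one. Two possible sketches follow; the names -negvar-,
-maxvar- and -maxvar2- are made up. The first negates the variable so
that missings still sort to the end; the second uses the canned
-egen- function, which ignores missings.

* smallest value of -var is the largest value of var
gen negvar = -var
bysort serial month (negvar) : gen maxvar = -negvar[1]

* the same thing using egen
egen maxvar2 = max(var), by(serial month)

Either way, a group in which -var- is always missing ends up with
a missing maximum.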

The difference between this approach and yours is that 
you don't sort on -var- within the categories defined
by -serial- and -month-. 

Stata in this context, as in others, is quite literal. Given 
your instruction 

bysort serial month: ... 

it is satisfied with any solution satisfying that instruction. 
It pays no attention whatsoever to the order of -var- within 
the categories. In addition, as you observe, there is even 
some unpredictability about their order. So it is essential 
that you arrange the exact -sort- order you want. 
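
If what matters is the order the observations are in right now, one
way to pin that down is to record it in a variable and name that
variable in the sort. This is only a sketch; -obsno- and -first- are
made-up names.

* record the current order of observations
gen long obsno = _n
* first value of var in each group, in the recorded order
bysort serial month (obsno) : gen first = var[1]

Because -obsno- is unique, the order within each group is fully
determined, so re-running this gives the same result each time.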

There was a tutorial on -by:- in Stata Journal 2(1), 2002. 

Nick
[email protected]