st: RE: data manipulation

 From "Nick Cox" To Subject st: RE: data manipulation Date Tue, 2 Jul 2002 19:35:05 +0100

```pmitra@mindspring.com

> I have a data set like the following.
>
> zipcode	region quantity revenue flag month year
> "37075","11",1,0,"EXP","03","2002
> "37075","04",53,0,"EXP","03","2002"
> "38242","04",14,0,"EXP","03","2002"
> "38242","11",1,0,"EXP","03","2002"
> "70503","06",25,0,"EXP","03","2002"
> "70503","11",1,0,"EXP","03","2002"
>
> I would like to replace the value of region by its value corresponding to
> higher quantity by zipcode. For example, for zipcode "37075", I
> would like to
> replace the region's value of "11" by "04" since region "04" has higher
> quantity. The new data set will look like the following:
>
> "37075","04",1,0,"EXP","03","2002
> "37075","04",53,0,"EXP","03","2002"
> "38242","04",14,0,"EXP","03","2002"
> "38242","04",1,0,"EXP","03","2002"

If you -sort zipcode quantity- then the last value within each
-zipcode- will have the highest quantity, so you can assign
the corresponding -region- to all observations for that
-zipcode-. This is all achievable within one line:

bysort zipcode (quantity) : replace region = region[_N]

However, if any -quantity- is missing, this will mess
up things.

So we need to protect against that:

gen present = !missing(quantity)
bysort zipcode (present quantity): replace region = region[_N]

Taking that more slowly,

missing(quantity) is 1 if quantity is missing and 0 otherwise

!missing(quantity) flips that round: it is 1 is quantity
is non-missing and 0 otherwise.

The -sort- is hierarchical.

* First, we -sort- on -zipcode-.

* Within -zipcode- we sort on -present-. Thus all the missing
values of -quantity- go first.

* Within -present- we sort on -quantity-. Thus the last
value of -quantity- will be the highest non-missing value
of -quantity- for that -zipcode-.

We look across the observation, pick up the value of -region-
and smear it across all the observations for that -zipcode-.

Nick
n.j.cox@durham.ac.uk
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```