Stata The Stata listserver
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: RE: data manipulation

From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: data manipulation
Date   Tue, 2 Jul 2002 19:35:05 +0100

[email protected]

> I have a data set like the following.
> zipcode	region quantity revenue flag month year
> "37075","11",1,0,"EXP","03","2002
> "37075","04",53,0,"EXP","03","2002"
> "38242","04",14,0,"EXP","03","2002"
> "38242","11",1,0,"EXP","03","2002"
> "70503","06",25,0,"EXP","03","2002"
> "70503","11",1,0,"EXP","03","2002"
> I would like to replace the value of region by its value corresponding to
> higher quantity by zipcode. For example, for zipcode "37075", I 
> would like to
> replace the region's value of "11" by "04" since region "04" has higher
> quantity. The new data set will look like the following:
> "37075","04",1,0,"EXP","03","2002
> "37075","04",53,0,"EXP","03","2002"
> "38242","04",14,0,"EXP","03","2002"
> "38242","04",1,0,"EXP","03","2002"

If you -sort zipcode quantity- then the last value within each 
-zipcode- will have the highest quantity, so you can assign 
the corresponding -region- to all observations for that
-zipcode-. This is all achievable within one line: 

bysort zipcode (quantity) : replace region = region[_N] 

However, if any -quantity- is missing, this will mess 
up things.  

So we need to protect against that: 

gen present = !missing(quantity) 
bysort zipcode (present quantity): replace region = region[_N] 

Taking that more slowly, 

missing(quantity) is 1 if quantity is missing and 0 otherwise 

!missing(quantity) flips that round: it is 1 is quantity 
is non-missing and 0 otherwise. 

The -sort- is hierarchical. 

* First, we -sort- on -zipcode-. 

* Within -zipcode- we sort on -present-. Thus all the missing 
  values of -quantity- go first. 

* Within -present- we sort on -quantity-. Thus the last 
  value of -quantity- will be the highest non-missing value
 of -quantity- for that -zipcode-. 

We look across the observation, pick up the value of -region-
and smear it across all the observations for that -zipcode-. 

[email protected] 
*   For searches and help try:

© Copyright 1996–2024 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index