[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: RE: data manipulation

From	"Nick Cox" <[email protected]>
To	<[email protected]>
Subject	st: RE: data manipulation
Date	Tue, 2 Jul 2002 19:35:05 +0100

[email protected]

> I have a data set like the following.
> 
> zipcode	region quantity revenue flag month year
> "37075","11",1,0,"EXP","03","2002
> "37075","04",53,0,"EXP","03","2002"
> "38242","04",14,0,"EXP","03","2002"
> "38242","11",1,0,"EXP","03","2002"
> "70503","06",25,0,"EXP","03","2002"
> "70503","11",1,0,"EXP","03","2002"
> 
> I would like to replace the value of region by its value corresponding to
> higher quantity by zipcode. For example, for zipcode "37075", I 
> would like to
> replace the region's value of "11" by "04" since region "04" has higher
> quantity. The new data set will look like the following:
>  
> "37075","04",1,0,"EXP","03","2002
> "37075","04",53,0,"EXP","03","2002"
> "38242","04",14,0,"EXP","03","2002"
> "38242","04",1,0,"EXP","03","2002"

If you -sort zipcode quantity- then the last value within each 
-zipcode- will have the highest quantity, so you can assign 
the corresponding -region- to all observations for that
-zipcode-. This is all achievable within one line: 

bysort zipcode (quantity) : replace region = region[_N] 

However, if any -quantity- is missing, this will mess 
up things.  

So we need to protect against that: 

gen present = !missing(quantity) 
bysort zipcode (present quantity): replace region = region[_N] 

Taking that more slowly, 

missing(quantity) is 1 if quantity is missing and 0 otherwise 

!missing(quantity) flips that round: it is 1 is quantity 
is non-missing and 0 otherwise. 

The -sort- is hierarchical. 

* First, we -sort- on -zipcode-. 

* Within -zipcode- we sort on -present-. Thus all the missing 
  values of -quantity- go first. 

* Within -present- we sort on -quantity-. Thus the last 
  value of -quantity- will be the highest non-missing value
 of -quantity- for that -zipcode-. 

We look across the observation, pick up the value of -region-
and smear it across all the observations for that -zipcode-. 

Nick 
[email protected] 
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

References:
- st: data manipulation
  - From: <[email protected]>

Prev by Date: st: data manipulation
Next by Date: st: RE: Multiple graphs
Previous by thread: st: data manipulation
Next by thread: st: RE: Multiple graphs
Index(es):
- Date
- Thread