Stata The Stata listserver
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

RE: st: Help on data manipulation


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Help on data manipulation
Date   Tue, 28 Sep 2004 11:03:16 +0100

It is not what was asked for but you 
could hold the same information in another 
way: 

collapse (sum) revenue, by(customer)
gen mult999 = floor(revenue/999) 
gen surplus = mod(revenue,999) 

There doesn't seem any special virtue 
in holding lots of 999s with arbitrary 
identifiers. 

Nick 
n.j.cox@durham.ac.uk 

Phil Schumm
> 
> At 5:29 PM -0400 9/27/04, MITRA PINAKI  (MAR1PXM) wrote:
> >     I have a dataset on multiple customer's revenue 
> information like the
> >following:
> >
> >Id    customer    revenue
> >Z11    xx1         y1
> >Z21    xx1         y2
> >Z31    xx1         y3
> >Z12    xx2         k1
> >Z22    xx2         k2
> >Z32    xx2         k3  and so on.
> >
> >I need to create a new data based on the following criteria. 
> If the sum of
> >revenue by customer is <=999, I need to write
> >Customer    revenue
> >xx1         sum of y
> >xx2         sum of k
> >
> >But if the sum of revenue by customer is >999 and <=1998, I 
> need to write
> >Id     Customer    revenue
> >001    xx1         999
> >002    xx1         sum of y-999
> >001    xx2         999
> >002    xx2         sum of k-999
> >
> >and if the sum of revenue by customer is >1998 and <=2997, I 
> need to write
> >Id     Customer    revenue
> >001    xx1         999
> >002    xx1         999
> >003    xx1         sum of y-1998
> >001    xx2         999
> >002    xx2         999
> >003    xx2         sum of k-1998
> >
> >Can anyone please provide me any help on this?
> 
> 
> Here's one alternative:
> 
> 
> collapse (sum) revenue, by(customer)
> expand ceil(revenue/999)
> bys customer: replace revenue = ((_n==_N)*mod(revenue,999)) + 
> ((_n!=_N)*999) if 999<revenue
> 
> 
> Briefly, -collapse- computes the total revenue for each customer, 
> resulting in a dataset with one record per customer.  -expand- then 
> creates the additional records (e.g., if a customer's total is 2000, 
> this will create two additional records for a total of 3).  Finally, 
> the last line replaces revenue either with the value 999 (all records 
> but the last one for each customer) or with the remainder once the 
> multiples of 999 are accounted for (last record for each customer). 
> If you wish, you could then generate your ids for each record 
> (consecutive integers within customer in leading-zero format) using 
> the following:
> 
> 
> gsort + customer - revenue
> by customer: gen id = string(_n,"%03.0f")
> 
> 
> Note that with this solution, records with missing values for revenue 
> in the original dataset will essentially be ignored.  However, if a 
> customer has nothing but missing values in the original dataset, the 
> resulting dataset will contain a single record with a revenue of zero 
> (this is because -collapse- treats missing values as zeros).  If this 
> is not what you want, you'll need to modify the code accordingly.

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index