[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

From |
"Nick Cox" <n.j.cox@durham.ac.uk> |

To |
<statalist@hsphsun2.harvard.edu> |

Subject |
RE: st: Help on data manipulation |

Date |
Tue, 28 Sep 2004 11:03:16 +0100 |

It is not what was asked for but you could hold the same information in another way: collapse (sum) revenue, by(customer) gen mult999 = floor(revenue/999) gen surplus = mod(revenue,999) There doesn't seem any special virtue in holding lots of 999s with arbitrary identifiers. Nick n.j.cox@durham.ac.uk Phil Schumm > > At 5:29 PM -0400 9/27/04, MITRA PINAKI (MAR1PXM) wrote: > > I have a dataset on multiple customer's revenue > information like the > >following: > > > >Id customer revenue > >Z11 xx1 y1 > >Z21 xx1 y2 > >Z31 xx1 y3 > >Z12 xx2 k1 > >Z22 xx2 k2 > >Z32 xx2 k3 and so on. > > > >I need to create a new data based on the following criteria. > If the sum of > >revenue by customer is <=999, I need to write > >Customer revenue > >xx1 sum of y > >xx2 sum of k > > > >But if the sum of revenue by customer is >999 and <=1998, I > need to write > >Id Customer revenue > >001 xx1 999 > >002 xx1 sum of y-999 > >001 xx2 999 > >002 xx2 sum of k-999 > > > >and if the sum of revenue by customer is >1998 and <=2997, I > need to write > >Id Customer revenue > >001 xx1 999 > >002 xx1 999 > >003 xx1 sum of y-1998 > >001 xx2 999 > >002 xx2 999 > >003 xx2 sum of k-1998 > > > >Can anyone please provide me any help on this? > > > Here's one alternative: > > > collapse (sum) revenue, by(customer) > expand ceil(revenue/999) > bys customer: replace revenue = ((_n==_N)*mod(revenue,999)) + > ((_n!=_N)*999) if 999<revenue > > > Briefly, -collapse- computes the total revenue for each customer, > resulting in a dataset with one record per customer. -expand- then > creates the additional records (e.g., if a customer's total is 2000, > this will create two additional records for a total of 3). Finally, > the last line replaces revenue either with the value 999 (all records > but the last one for each customer) or with the remainder once the > multiples of 999 are accounted for (last record for each customer). > If you wish, you could then generate your ids for each record > (consecutive integers within customer in leading-zero format) using > the following: > > > gsort + customer - revenue > by customer: gen id = string(_n,"%03.0f") > > > Note that with this solution, records with missing values for revenue > in the original dataset will essentially be ignored. However, if a > customer has nothing but missing values in the original dataset, the > resulting dataset will contain a single record with a revenue of zero > (this is because -collapse- treats missing values as zeros). If this > is not what you want, you'll need to modify the code accordingly. * * For searches and help try: * http://www.stata.com/support/faqs/res/findit.html * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

- Prev by Date:
**st: -moments- available from SSC** - Next by Date:
**st: mcl, rc2 at SSC** - Previous by thread:
**Re: st: Help on data manipulation** - Next by thread:
**[no subject]** - Index(es):

© Copyright 1996–2014 StataCorp LP | Terms of use | Privacy | Contact us | What's new | Site index |