[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

From |
Phil Schumm <pschumm@uchicago.edu> |

To |
statalist@hsphsun2.harvard.edu |

Subject |
Re: st: Help on data manipulation |

Date |
Mon, 27 Sep 2004 19:39:08 -0500 |

At 5:29 PM -0400 9/27/04, MITRA PINAKI (MAR1PXM) wrote:

I have a dataset on multiple customer's revenue information like the following: Id customer revenue Z11 xx1 y1 Z21 xx1 y2 Z31 xx1 y3 Z12 xx2 k1 Z22 xx2 k2 Z32 xx2 k3 and so on. I need to create a new data based on the following criteria. If the sum of revenue by customer is <=999, I need to write Customer revenue xx1 sum of y xx2 sum of k But if the sum of revenue by customer is >999 and <=1998, I need to write Id Customer revenue 001 xx1 999 002 xx1 sum of y-999 001 xx2 999 002 xx2 sum of k-999 and if the sum of revenue by customer is >1998 and <=2997, I need to write Id Customer revenue 001 xx1 999 002 xx1 999 003 xx1 sum of y-1998 001 xx2 999 002 xx2 999 003 xx2 sum of k-1998 Can anyone please provide me any help on this?

Here's one alternative:

collapse (sum) revenue, by(customer)

expand ceil(revenue/999)

bys customer: replace revenue = ((_n==_N)*mod(revenue,999)) + ((_n!=_N)*999) if 999<revenue

Briefly, -collapse- computes the total revenue for each customer, resulting in a dataset with one record per customer. -expand- then creates the additional records (e.g., if a customer's total is 2000, this will create two additional records for a total of 3). Finally, the last line replaces revenue either with the value 999 (all records but the last one for each customer) or with the remainder once the multiples of 999 are accounted for (last record for each customer). If you wish, you could then generate your ids for each record (consecutive integers within customer in leading-zero format) using the following:

gsort + customer - revenue

by customer: gen id = string(_n,"%03.0f")

Note that with this solution, records with missing values for revenue in the original dataset will essentially be ignored. However, if a customer has nothing but missing values in the original dataset, the resulting dataset will contain a single record with a revenue of zero (this is because -collapse- treats missing values as zeros). If this is not what you want, you'll need to modify the code accordingly.

-- Phil

*

* For searches and help try:

* http://www.stata.com/support/faqs/res/findit.html

* http://www.stata.com/support/statalist/faq

* http://www.ats.ucla.edu/stat/stata/

**References**:**st: Help on data manipulation***From:*"MITRA PINAKI (MAR1PXM)" <mar1pxm@ups.com>

- Prev by Date:
**st: AW: Help on data manipulation** - Next by Date:
**st: re: help on data manipulation** - Previous by thread:
**st: AW: Help on data manipulation** - Next by thread:
**RE: st: Help on data manipulation** - Index(es):

© Copyright 1996–2014 StataCorp LP | Terms of use | Privacy | Contact us | What's new | Site index |