RE: st: Help on data manipulation

 From "Nick Cox" To Subject RE: st: Help on data manipulation Date Tue, 28 Sep 2004 11:03:16 +0100

```It is not what was asked for but you
could hold the same information in another
way:

collapse (sum) revenue, by(customer)
gen mult999 = floor(revenue/999)
gen surplus = mod(revenue,999)

There doesn't seem any special virtue
in holding lots of 999s with arbitrary
identifiers.

Nick
n.j.cox@durham.ac.uk

Phil Schumm
>
> At 5:29 PM -0400 9/27/04, MITRA PINAKI  (MAR1PXM) wrote:
> >     I have a dataset on multiple customer's revenue
> information like the
> >following:
> >
> >Id    customer    revenue
> >Z11    xx1         y1
> >Z21    xx1         y2
> >Z31    xx1         y3
> >Z12    xx2         k1
> >Z22    xx2         k2
> >Z32    xx2         k3  and so on.
> >
> >I need to create a new data based on the following criteria.
> If the sum of
> >revenue by customer is <=999, I need to write
> >Customer    revenue
> >xx1         sum of y
> >xx2         sum of k
> >
> >But if the sum of revenue by customer is >999 and <=1998, I
> need to write
> >Id     Customer    revenue
> >001    xx1         999
> >002    xx1         sum of y-999
> >001    xx2         999
> >002    xx2         sum of k-999
> >
> >and if the sum of revenue by customer is >1998 and <=2997, I
> need to write
> >Id     Customer    revenue
> >001    xx1         999
> >002    xx1         999
> >003    xx1         sum of y-1998
> >001    xx2         999
> >002    xx2         999
> >003    xx2         sum of k-1998
> >
> >Can anyone please provide me any help on this?
>
>
> Here's one alternative:
>
>
> collapse (sum) revenue, by(customer)
> expand ceil(revenue/999)
> bys customer: replace revenue = ((_n==_N)*mod(revenue,999)) +
> ((_n!=_N)*999) if 999<revenue
>
>
> Briefly, -collapse- computes the total revenue for each customer,
> resulting in a dataset with one record per customer.  -expand- then
> creates the additional records (e.g., if a customer's total is 2000,
> this will create two additional records for a total of 3).  Finally,
> the last line replaces revenue either with the value 999 (all records
> but the last one for each customer) or with the remainder once the
> multiples of 999 are accounted for (last record for each customer).
> If you wish, you could then generate your ids for each record
> (consecutive integers within customer in leading-zero format) using
> the following:
>
>
> gsort + customer - revenue
> by customer: gen id = string(_n,"%03.0f")
>
>
> Note that with this solution, records with missing values for revenue
> in the original dataset will essentially be ignored.  However, if a
> customer has nothing but missing values in the original dataset, the
> resulting dataset will contain a single record with a revenue of zero
> (this is because -collapse- treats missing values as zeros).  If this
> is not what you want, you'll need to modify the code accordingly.

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```