[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

# st: RE: data manipulation

 From "Nick Cox" To Subject st: RE: data manipulation Date Thu, 10 Oct 2002 19:46:00 +0100

```pmitra@mindspring.com

> I have a dataset which has customer's payment amount by
> month by year. Month
> ranges from 01 to 12 for years 2000 & 2001 and from 01 to
> 09 for the 2002. But
> all customers don't have data for each month. The dataset
> looks like the
> following.
>
> customer  month  year  amount
> x1         01    2001   50.45
> x1         03    2001   60.00
> x2         04    2001   70.00
> x2         06    2001   80.00
>
> I would like to create a data set where each customer will have 12
> observations for years 2000 & 2001 and 9 obs. for 2002, and
> amount will be
> zero for the months they don't have any original data. I
> tried with couple of
> different ways, but didn't work. Could anyone please help me?
>

First you can create

. gen time = 12 * (year - 2000 ) + month

which runs 1 ... 33 for January 2000 ... September 2002.

Then you can

. tabulate time

and see whether all values 1 ... 33 are represented
in your data. If they aren't, you should make sure
that they are. One way to do this is to add a
pseudocustomer, whose records are complete,

. local N = _N + 33
. set obs `N'
. replace customer = "Stata Corporation" in -33/l
. replace time = _N - _n + 1 in -33/l

and then

. fillin customer time
. replace amount = 0 if missing(amount)
. drop if customer == "Stata Corporation"

The pseudocustomer should of course not have
the same name as an actual customer.

Nick
n.j.cox@durham.ac.uk

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```

 © Copyright 1996–2017 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index