Stata The Stata listserver
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: RE: data manipulation


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: data manipulation
Date   Thu, 10 Oct 2002 19:46:00 +0100

[email protected]

> I have a dataset which has customer's payment amount by 
> month by year. Month
> ranges from 01 to 12 for years 2000 & 2001 and from 01 to 
> 09 for the 2002. But
> all customers don't have data for each month. The dataset 
> looks like the
> following.
> 
> customer  month  year  amount
> x1         01    2001   50.45
> x1         03    2001   60.00
> x2         04    2001   70.00
> x2         06    2001   80.00
> 
> I would like to create a data set where each customer will have 12
> observations for years 2000 & 2001 and 9 obs. for 2002, and 
> amount will be
> zero for the months they don't have any original data. I 
> tried with couple of
> different ways, but didn't work. Could anyone please help me?
> 

First you can create 

. gen time = 12 * (year - 2000 ) + month 

which runs 1 ... 33 for January 2000 ... September 2002.

Then you can 

. tabulate time 

and see whether all values 1 ... 33 are represented 
in your data. If they aren't, you should make sure
that they are. One way to do this is to add a 
pseudocustomer, whose records are complete, 

. local N = _N + 33 
. set obs `N' 
. replace customer = "Stata Corporation" in -33/l 
. replace time = _N - _n + 1 in -33/l 

and then 

. fillin customer time 
. replace amount = 0 if missing(amount) 
. drop if customer == "Stata Corporation" 

The pseudocustomer should of course not have 
the same name as an actual customer. 

Nick 
[email protected] 

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2024 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index