The Stata listserver

st: RE: Advise withdrawn -- data manipulation


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Advise withdrawn -- data manipulation
Date   Thu, 10 Oct 2002 20:22:46 +0100

The original question

> > > > I have a dataset which has customer's payment amount by month by
> > > > year. Month ranges from 01 to 12 for years 2000 & 2001 and from
> > > > 01 to 09 for the 2002. But all customers don't have data for each
> > > > month. The dataset looks like the following.
> > > >
> > > > customer  month  year  amount
> > > > x1         01    2001   50.45
> > > > x1         03    2001   60.00
> > > > x2         04    2001   70.00
> > > > x2         06    2001   80.00
> > > >
> > > > I would like to create a data set where each customer will have 12
> > > > observations for years 2000 & 2001 and 9 obs. for 2002, and amount
> > > > will be zero for the months they don't have any original data. I
> > > > tried with couple of different ways, but didn't work. Could anyone
> > > > please help me?

Nick Winter
> 
> Oops.
> 
> I misread the question.  There are clearly better ways to do this,
> than to use -reshape-.

Not so fast! I don't see it as that clear-cut. 

This is a nice problem, and there are points about 
Stata technique which make it of wider interest. 

As I write, three solutions have been 
proposed, here summarised in order of 
first posting, and with second thoughts
written in. 

1. Nick Winter
============== 

-reshape wide- followed by -reshape long-. 

This is a good general procedure. Other
applications abound. 

On its own it is not going to fill in 
all gaps: a time that is missing for 
every customer stays missing. 
Empirically, my guess is that this is 
not a problem. 

If it is, then a little preparation
will fix the problem. It is necessary
and sufficient that every time be 
present for at least one customer 
(not necessarily the same customer). 
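
A minimal sketch of this route, not the code 
that was posted. It assumes the variable names 
from the original question, a single monthly 
date variable (here called mdate, my name), and 
a placeholder customer "__" (hypothetical; it 
must not clash with a real customer) that 
carries every month from 2000m1 to 2002m9: 

```stata
clear
input str2 customer month year amount
"x1" 1 2001 50.45
"x1" 3 2001 60.00
"x2" 4 2001 70.00
"x2" 6 2001 80.00
end

* one monthly date variable; ym() counts months from 1960m1
gen mdate = ym(year, month)
drop month year

* preparation: append a placeholder customer that has every month,
* so that -reshape wide- creates one variable per month
local n0 = _N
local first = ym(2000, 1)
local nmonths = ym(2002, 9) - `first' + 1
local total = `n0' + `nmonths'
set obs `total'
replace customer = "__" if _n > `n0'
replace mdate = `first' + _n - `n0' - 1 if _n > `n0'

* wide and back to long: every customer now has every month
reshape wide amount, i(customer) j(mdate)
reshape long amount, i(customer) j(mdate)

* tidy up: drop the placeholder, zero the gaps, rebuild year and month
drop if customer == "__"
replace amount = 0 if missing(amount)
format mdate %tm
gen year = year(dofm(mdate))
gen month = month(dofm(mdate))
```

Without the placeholder, only the months that 
occur somewhere in the data would come back 
from the round trip. 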

2. Nick Cox
===========

-fillin-. 

-fillin- is optimised for this one 
problem. It does nothing else. Perhaps 
you have never heard of it. There is 
always a problem learning of and 
remembering tools you use only once in 
a while. It is better to learn about 
more general tools. 

It is not going to fill in all gaps 
either: it works only with the values 
present in the data. Empirically, my 
guess is that this is not a problem. 
The same comment applies as above 
(and one method was proposed). 
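
A minimal sketch, assuming the variable names 
from the original question and a single monthly 
date variable (mdate is my name, not the 
poster's). Filling in on customer and mdate, 
rather than on customer, year and month 
separately, avoids creating combinations such 
as 2002 with months 10 to 12: 

```stata
clear
input str2 customer month year amount
"x1" 1 2001 50.45
"x1" 3 2001 60.00
"x2" 4 2001 70.00
"x2" 6 2001 80.00
end

* one monthly date variable; ym() counts months from 1960m1
gen mdate = ym(year, month)
format mdate %tm

* add every customer x mdate combination not already present;
* -fillin- marks the new observations with _fillin == 1
fillin customer mdate

* new observations have missing amount, year and month
replace amount = 0 if _fillin
replace year = year(dofm(mdate)) if _fillin
replace month = month(dofm(mdate)) if _fillin

list customer year month amount, sepby(customer)
```

With these four observations the result has 8 
(2 customers by the 4 months that occur). 
February and May 2001 occur for neither customer, 
so they are not created, which is the gap 
mentioned above. 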

3. Tao Jiang 
============

-merge- with a complete data set. 

No code was presented, but in principle 
this sounds elegant. To build the 
complete data set you could 
-contract- on -customer- and then 
-expand- and create a time variable. 

-merge- is a very good general procedure. 
Other applications abound. 
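
Since no code was posted, what follows is only 
one possible sketch of the -contract-, -expand-, 
-merge- route. It assumes the variable names from 
the original question, a monthly date variable 
(mdate, my name), a date range of 2000m1 to 
2002m9, and the -merge 1:1- syntax of current 
Stata rather than the older syntax: 

```stata
clear
input str2 customer month year amount
"x1" 1 2001 50.45
"x1" 3 2001 60.00
"x2" 4 2001 70.00
"x2" 6 2001 80.00
end

* one monthly date variable; ym() counts months from 1960m1
gen mdate = ym(year, month)
tempfile original
save `original'

* skeleton: one observation per customer ...
contract customer
drop _freq

* ... expanded to one observation per customer per month
local first = ym(2000, 1)
local nmonths = ym(2002, 9) - `first' + 1
expand `nmonths'
bysort customer: gen mdate = `first' + _n - 1
format mdate %tm

* bring in the observed amounts; skeleton-only rows have _merge == 1
merge 1:1 customer mdate using `original'
replace amount = 0 if _merge == 1
replace year = year(dofm(mdate))
replace month = month(dofm(mdate))
drop _merge
sort customer mdate
```

Unlike the other two routes, this one fills 
every month in the stated range regardless of 
which months occur in the data. 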

Naturally, getting one solution 
that works is enough. But there is 
a lot of evidence that different 
Stata users find different tools 
intuitive, so choose whatever 
appeals. 

Nick 
n.j.cox@durham.ac.uk 
