Stata The Stata listserver
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: RE: -foreach- syntax


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: -foreach- syntax
Date   Sun, 30 Oct 2005 23:12:53 -0000

Recently Terra Curtis has been posting problems to 
do with a particular dataset. Last Thursday I repeated
a comment I made earlier: 

====================== 24 October 2005 
Also I guess wildly that your data structure is quite 
the wrong way round to do much easily in Stata. Solve 
this problem, and other problems will be just as messy. 
Consider reshaping to panel data structure, and much 
may be a lot easier. 
======================

I am now going to expand on this guess. 

Terra works for an outfit called Cambridge Finance. 
It looks as if the data in question are organised 
like this:  

The observations are companies, or something similar
of interest to people in Cambridge Finance. 

The variables are various properties for 58 quarters, 
from the first quarter of 1989 through the last of 2005. 

Thus something called "lc" is represented by 58 
variables and something called "dolq" by another 58 
variables. The last problem Terra posted was to 
calculate the ratio of lc and dolq, and so 
another 58 variables were needed. 

There may well be much, much more, each property 
being represented by another 58 variables. And
every time Terra wants some other derived property
another 58 variables will be needed. 

Digging the same hole deeper like this can only make life
more painful, even if Stata's limit on # of variables
never bites. 

The remedy here is to -reshape-. [D] reshape 
gives a detailed account. Yet more details 
and examples are in the FAQ 

http://www.stata.com/support/faqs/data/reshape3.html

In fact, what I am going to suggest is basically
just copied from that FAQ. 

We don't need to know more gory details to 
suggest a strategy. Suppose also that there is 
some identifier variable, here called -id-, so 
that -ds- will show something like 

. ds
lcq1y1989   dolq1y1991  lcq2y1993   dolq2y1995  lcq3y1997   dolq3y2001  lcq4y2003
dolq1y1989  lcq2y1991   dolq2y1993  lcq3y1995   dolq3y1997  lcq4y2001   dolq4y2003
lcq2y1989   dolq2y1991  lcq3y1993   dolq3y1995  lcq4y1997   dolq4y2001  lcq1y2004
dolq2y1989  lcq3y1991   dolq3y1993  lcq4y1995   dolq4y1997  lcq1y2002   dolq1y2004
lcq3y1989   dolq3y1991  lcq4y1993   dolq4y1995  lcq1y1998   dolq1y2002  lcq2y2004
dolq3y1989  lcq4y1991   dolq4y1993  lcq1y1996   dolq1y1998  lcq2y2002   dolq2y2004
lcq4y1989   dolq4y1991  lcq1y1994   dolq1y1996  lcq2y1998   dolq2y2002  lcq3y2004
dolq4y1989  lcq1y1992   dolq1y1994  lcq2y1996   dolq2y1998  lcq3y2002   dolq3y2004
lcq1y1990   dolq1y1992  lcq2y1994   dolq2y1996  lcq3y1999   dolq3y2002  lcq4y2004
dolq1y1990  lcq2y1992   dolq2y1994  lcq3y1996   dolq3y1999  lcq4y2002   dolq4y2004
lcq2y1990   dolq2y1992  lcq3y1994   dolq3y1996  lcq4y1999   dolq4y2002  lcq1y2005
dolq2y1990  lcq3y1992   dolq3y1994  lcq4y1996   dolq4y1999  lcq1y2003   dolq1y2005
lcq3y1990   dolq3y1992  lcq4y1994   dolq4y1996  lcq1y2000   dolq1y2003  lcq2y2005
dolq3y1990  lcq4y1992   dolq4y1994  lcq1y1997   dolq1y2000  lcq2y2003   dolq2y2005
lcq4y1990   dolq4y1992  lcq1y1995   dolq1y1997  lcq2y2001   dolq2y2003  id
dolq4y1990  lcq1y1993   dolq1y1995  lcq2y1997   dolq2y2001  lcq3y2003
lcq1y1991   dolq1y1993  lcq2y1995   dolq2y1997  lcq3y2001   dolq3y2003

Focus on one set of 58, say lc* 

. ds lc*
lcq1y1989  lcq1y1991  lcq1y1993  lcq1y1995  lcq1y1997  lcq1y2000  lcq1y2003  lcq1y2005
lcq2y1989  lcq2y1991  lcq2y1993  lcq2y1995  lcq2y1997  lcq2y2001  lcq2y2003  lcq2y2005
lcq3y1989  lcq3y1991  lcq3y1993  lcq3y1995  lcq3y1997  lcq3y2001  lcq3y2003
lcq4y1989  lcq4y1991  lcq4y1993  lcq4y1995  lcq4y1997  lcq4y2001  lcq4y2003
lcq1y1990  lcq1y1992  lcq1y1994  lcq1y1996  lcq1y1998  lcq1y2002  lcq1y2004
lcq2y1990  lcq2y1992  lcq2y1994  lcq2y1996  lcq2y1998  lcq2y2002  lcq2y2004
lcq3y1990  lcq3y1992  lcq3y1994  lcq3y1996  lcq3y1999  lcq3y2002  lcq3y2004
lcq4y1990  lcq4y1992  lcq4y1994  lcq4y1996  lcq4y1999  lcq4y2002  lcq4y2004

-ds- leaves the list of names in its wake, so we can work on that to 
get the suffixes, "q1y1989" on: 

. local lc "`r(varlist)'" 

. local suffix : subinstr local lc "lc" "", all

. di "`suffix'" 
q1y1989 q2y1989 q3y1989 q4y1989 q1y1990 q2y1990 q3y1990 q4y1990 q1y1991 q2y1991 q3y1991 q4y19
> 91 q1y1992 q2y1992 q3y1992 q4y1992 q1y1993 q2y1993 q3y1993 q4y1993 q1y1994 q2y1994 q3y1994 
> q4y1994 q1y1995 q2y1995 q3y1995 q4y1995 q1y1996 q2y1996 q3y1996 q4y1996 q1y1997 q2y1997 q3y
> 1997 q4y1997 q1y1998 q2y1998 q3y1999 q4y1999 q1y2000 q2y2001 q3y2001 q4y2001 q1y2002 q2y200
> 2 q3y2002 q4y2002 q1y2003 q2y2003 q3y2003 q4y2003 q1y2004 q2y2004 q3y2004 q4y2004 q1y2005 q
> 2y2005

Now we work the other way, to get the stubs: 

. unab stub : *q1y1989

. di "`stub'" 
lcq1y1989 dolq1y1989

In our case there are only two stubs, but the problem would be 
just the same if there were several more: 

. local stub : subinstr local stub "q1y1989" "", all

. di "`stub'" 
lc dol

Now we have enough to do the -reshape-: 

. reshape long `stub' , i(id) string
(note: j = q1y1989 q1y1990 q1y1991 q1y1992 q1y1993 q1y1994 q1y1995 q1y1996 q1y1997 q1y1998 q1
> y2000 q1y2002 q1y2003 q1y2004 q1y2005 q2y1989 q2y1990 q2y1991 q2y1992 q2y1993 q2y1994 q2y19
> 95 q2y1996 q2y1997 q2y1998 q2y2001 q2y2002 q2y2003 q2y2004 q2y2005 q3y1989 q3y1990 q3y1991 
> q3y1992 q3y1993 q3y1994 q3y1995 q3y1996 q3y1997 q3y1999 q3y2001 q3y2002 q3y2003 q3y2004 q4y
> 1989 q4y1990 q4y1991 q4y1992 q4y1993 q4y1994 q4y1995 q4y1996 q4y1997 q4y1999 q4y2001 q4y200
> 2 q4y2003 q4y2004)

Data                               wide   ->   long
-----------------------------------------------------------------------------
Number of obs.                       10   ->     580
Number of variables                 117   ->       4
j variable (58 values)                    ->   _j
xij variables:
      lcq1y1989 lcq1y1990 ... lcq4y2004   ->   lc
   dolq1y1989 dolq1y1990 ... dolq4y2004   ->   dol
-----------------------------------------------------------------------------

The dates are not yet usable: 

. levelsof _j, clean
q1y1989 q1y1990 q1y1991 q1y1992 q1y1993 q1y1994 q1y1995 q1y1996 q1y1997 q1y1998 q1y2000 q1y20
> 02 q1y2003 q1y2004 q1y2005 q2y1989 q2y1990 q2y1991 q2y1992 q2y1993 q2y1994 q2y1995 q2y1996 
> q2y1997 q2y1998 q2y2001 q2y2002 q2y2003 q2y2004 q2y2005 q3y1989 q3y1990 q3y1991 q3y1992 q3y
> 1993 q3y1994 q3y1995 q3y1996 q3y1997 q3y1999 q3y2001 q3y2002 q3y2003 q3y2004 q4y1989 q4y199
> 0 q4y1991 q4y1992 q4y1993 q4y1994 q4y1995 q4y1996 q4y1997 q4y1999 q4y2001 q4y2002 q4y2003 q
> 4y2004

But we just extract the relevant bits, convert from characters to numbers, and we 
then have a Stata date variable. 

. gen time = yq(real(substr(_j,-4,4)), real(substr(_j,2,1))) 

. levelsof time
116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 1
> 39 140 141 142 143 144 145 146 147 148 149 150 151 152 153 158 159 160 165 166 167 168 169 
> 170 171 172 173 174 175 176 177 178 179 180 181

. format time %tqCy_q

. l time in 1/10

     +--------+
     |   time |
     |--------|
  1. | 1989 1 |
  2. | 1990 1 |
  3. | 1991 1 |
  4. | 1992 1 |
  5. | 1993 1 |
     |--------|
  6. | 1994 1 |
  7. | 1995 1 |
  8. | 1996 1 |
  9. | 1997 1 |
 10. | 1998 1 |
     +--------+

. tsset id time 

Now the ratio calculation is laughably simple: 

. gen ratio = lc/dolq 

Nick 
n.j.cox@durham.ac.uk 

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index