
"Martin Weiss" <martin.weiss1@gmx.de>

st: AW: AW: RE: AW: RE: Transposing datasets

Mon, 2 Aug 2010 15:28:58 +0200

Nick himself advocated "first principles" in http://www.stata.com/statalist/archive/2010-05/msg01165.html, btw...

HTH
Martin

"( Also, I learned something about using subinstr() in the rename command from your post, thanks )"

Cheers! I love to work from first principles whenever possible, so my use of -subinstr()- was not intended to detract from the appeal of NJC's -findit renvars-...
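For readers without the earlier post at hand, the -subinstr()- idea can be sketched roughly as follows. This is a minimal sketch and not the code from that post: the "m_" prefix and the three-row example are assumptions made for illustration, and the new names must still start with a letter or underscore.

```stata
* Sketch only: the "m_" prefix and the cut-down data are assumptions
clear
input long gvkey str9 datadate double mcap_sum
212782 30jun2005 4946.9
212782 31jul2005 5042.1
212782 31aug2005 5145
end

* one row per firm; string j values become name suffixes
reshape wide mcap_sum, i(gvkey) j(datadate) string

* shorten each name from first principles: mcap_sum30jun2005 -> m_30jun2005
foreach v of varlist mcap_sum* {
    rename `v' `=subinstr("`v'", "mcap_sum", "m_", 1)'
}
describe
```

The `=exp' construct evaluates -subinstr()- on the fly, so no helper local is needed inside the loop.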

HTH
Martin

I misread in the OP what gvkey was... you're right, there's no need for "id". My post was delayed (I had sent my email before yours came through), so I hadn't intended for mine to be some kind of comment on or alternative to your post, as yours was clearly better. (Also, I learned something about using subinstr() in the rename command from your post, thanks.)

~ Eric

Why do we need another "id" variable? In Eric's code, it is created via
*************
g id = _n
*************

Is "gvkey" not supposed to be the "id" variable?
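A minimal sketch of the reshape with gvkey itself as the i() variable, assuming each gvkey/datadate pair occurs only once and datadate is held as a string. The three-row example is cut down from the data in the thread; -isid- is added here only as a safety check and is not part of the code quoted in this thread.

```stata
* Sketch only: assumes one row per gvkey/datadate and a string datadate
clear
input long gvkey str9 datadate double mcap_sum
212782 30jun2005 4946.9
212782 31jul2005 5042.1
212782 31aug2005 5145
end

* stop with an error if gvkey and datadate do not uniquely identify rows
isid gvkey datadate

* gvkey is the identifier, so no extra id variable is created
reshape wide mcap_sum, i(gvkey) j(datadate) string
list
```

Because gvkey already identifies the firm, the result has one row per gvkey straight away and no second pass to merge rows is needed.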

HTH
Martin

You can't name a new variable with a number as the first character (e.g., "31jun1980"). So, -tostring- your datadate var first:

*********!
clear
inp gvkey str20(datadate) mcap_sum
212782 30jun2005 4946.9
212782 31jul2005 5042.1
212782 31aug2005 5145
212782 30sep2005 5302.5
212782 31oct2005 5253.5
212782 30nov2005 5642.7
212782 31dec2005 6230
end

**set up data**
g datadate2 = date(datadate, "DMY")
format datadate2 %td
drop datadate
rename datadate2 datadate

//1. make date a string var//
tostring datadate, force replace u

//2. reshape wide using datadate//
g id = _n
reshape wide mcap_sum, i(id) j(datadate) string
ds mcap_*

//3. move all obs for a gvkey to one line//
foreach v in `r(varlist)' {
    bys gvkey: egen `v'2 = max(`v')
    drop `v'
    rename `v'2 `v'
}
by gvkey: g o = 1 ==_n
keep if o==1
drop o
*********!

~ Eric

Hi guys,

I have a dataset with about 32000 observations in long format (see structure below). gvkey is the identifier for a firm (about 600 different firms), datadate is the month-end date between 2002 and 2010, which of course repeats in the dataset (again, long format), and mcap_sum is my observation, which is different for each month and gvkey.

gvkey datadate mcap_sum
212782 30jun2005 4946.9
212782 31jul2005 5042.1
212782 31aug2005 5145
212782 30sep2005 5302.5
212782 31oct2005 5253.5
212782 30nov2005 5642.7
212782 31dec2005 6230
etc...

Well, I would like to transpose my dataset so that each month becomes a variable and the observations are the mcap_sum values. My tries with reshape failed miserably. (xpose won't work because I still want to keep mcap_sum as an observation.) Does anybody have a suggestion to solve this quickly?

gvkey 31dec2005 30nov2005 31oct2005
212782 6230 5642.7 5253.5
...........

Best,
Kaspar 