Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: questions about duplicate observations


From   n j cox <n.j.cox@durham.ac.uk>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: questions about duplicate observations
Date   Tue, 27 May 2008 15:21:30 +0100

.
Use -by:- for your calculations. -collapse- only at the end. Recall that the mean of a constant is that constant.

e.g.

bysort yeara cnum (offering_amt) : gen max = offering_amt[_N]

by yeara cnum : gen num = sum(bond_yield * offering_amt)
by yeara cnum : gem den = sum(offering_amt)
by yeara cnum : gen avewt_bind_yield = num[_N] / den[_N]

Nick
n.j.cox@durham.ac.uk

Wen Xia Ge <wenxia.ge@mail.mcgill.ca>

I have a dataset that contains thousands of firms with bond issues from
1980-2004. Some firms have single bond issue in a given year, while some
firms have multiple bond issues in a given year. For example, the
dataset looks like this (I have more variables than listed here):

issue_id issuer_id issuer_cusip offering_year offering_amt
bond_yield maturity
1 3 000361 1989 ###
### ###
2 3 000361 1993 ###
### ###
3 4 00077D 1994 ###
### ###
4 4 00077D 1994 ###
### ###
5 4 00077T 1993 ###
### ###
6 4 00077T 1993 ###
### ###
7 6 000800 1982 700000
### ###
8 6 000800 1982 680000
### ###
9 6 000800 1982 800000
### ###
10 6 000800 1984 ###
### ###


For firms with single bond issue in a year, they will be kept in the
dataset as it is. I want to use two approaches to handle firms with
multiple bond issues in a given year:

First, I want to keep the issue with the largest offering_amt in the
dataset (for example, for company 000800, drop issue 7 and 8, keep issue
9, for year 1982, and keep issue 10 for year 1984), so that in the
resulting panel data, in a given year, a company can only show up at
most once (with the largest offering amount);

Second, I want to calculate weighted average bond_yield (using
offering_amt as the weights), average offering_amt and average maturity,
and save it as a new observation while dropping original multiple bond
issues. For example, in the resulting panel data, there will be one
observation for 000800 in 1982, the offering_amt will take the average
of 700000, 680000, and 800000, and the bond_yiled will take the weighted
value of the three bond_yield on issue 7, 8 and 9.

I tried the following command:
sort yeara cnum offering_amt
collapse (last) offering_amt, by(yeara cnum)

And:

sort yeara cnum
collapse (max) offering_amt, by(yeara cnum)

The resulting dataset contains only three variables: yeara cnum and
offering_amt, but I need to keep other variables.


*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/




© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index