Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: questions about duplicate observations


From   "Wen Xia Ge" <wenxia.ge@mail.mcgill.ca>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: questions about duplicate observations
Date   Mon, 26 May 2008 15:21:44 -0400

Dear all,
 
I have a dataset that contains thousands of firms with bond issues from
1980-2004. Some firms have single bond issue in a given year, while some
firms have multiple bond issues in a given year. For example, the
dataset looks like this (I have more variables than listed here):
 
issue_id  issuer_id  issuer_cusip  offering_year  offering_amt
bond_yield  maturity	
1	           3	000361	    1989	     ###
###	       ###	
2	           3      000361	    1993	     ###
###	       ###	
3	           4	00077D	    1994	     ###
###	       ###	
4	           4	00077D	    1994	     ###
###	       ###	
5	           4	00077T	    1993	     ###
###	       ###	
6	           4	00077T	    1993	     ###
###	       ###	
7	           6	000800	    1982	   700000
###	       ###	
8	           6	000800	    1982	   680000
###	       ###	
9	           6	000800	    1982	   800000
###	       ###	
10	           6	000800	    1984	     ###
###	       ###	
 
 
For firms with single bond issue in a year, they will be kept in the
dataset as it is. I want to use two approaches to handle firms with
multiple bond issues in a given year: 
 
First, I want to keep the issue with the largest offering_amt in the
dataset (for example, for company 000800, drop issue 7 and 8, keep issue
9, for year 1982, and keep issue 10 for year 1984), so that in the
resulting panel data, in a given year, a company can only show up at
most once (with the largest offering amount);

Second, I want to calculate weighted average bond_yield (using
offering_amt as the weights), average offering_amt and average maturity,
and save it as a new observation while dropping original multiple bond
issues. For example, in the resulting panel data, there will be one
observation for 000800 in 1982, the offering_amt will take the average
of 700000, 680000, and 800000, and the bond_yiled will take the weighted
value of the three bond_yield on issue 7, 8 and 9. 

I tried the following command:
sort yeara cnum offering_amt
collapse (last) offering_amt, by(yeara cnum)

And:

sort yeara cnum 
collapse (max) offering_amt, by(yeara cnum)

The resulting dataset contains only three variables: yeara cnum and
offering_amt, but I need to keep other variables. 
 
Could anyone please show me how to get this done in Stata?
 
Your help would be greatly appreciated!


Wenxia

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index