Statalist



Re: RE: st: questions about duplicate observations


From   n j cox <[email protected]>
To   [email protected]
Subject   Re: RE: st: questions about duplicate observations
Date   Tue, 27 May 2008 17:12:46 +0100

You may not need -collapse- at all.

keep if max == offering_amt

may be sufficient once you have calculated all your new variables.

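But watch out for ties: if two or more issues in a firm-year share the
largest offering_amt, -keep if- retains all of them.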
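A minimal sketch of the -keep if- approach, using made-up toy data with the variable names from this thread (yeara, cnum, offering_amt, bond_yield). The values and the tie-counting variable nmax are hypothetical illustrations. It assumes offering_amt has no missing values: Stata sorts missing values last, so offering_amt[_N] would then be missing.

```stata
* hypothetical toy data; the 2002 firm-year contains a tie
clear
input yeara cnum offering_amt bond_yield
2001 1 100 5.1
2001 1 250 4.8
2001 1 400 4.5
2001 2 300 6.0
2002 2 300 6.2
2002 2 300 6.4
end

* largest offering_amt within each firm-year (assumes no missing values)
bysort yeara cnum (offering_amt) : gen max = offering_amt[_N]
* hypothetical helper: how many issues reach the maximum (2 or more = tie)
by yeara cnum : egen nmax = total(offering_amt == max)
* keeps every issue equal to the maximum, so ties survive
keep if offering_amt == max
list, sepby(yeara cnum)
```

Checking nmax (or simply counting observations per firm-year afterwards) shows where ties remain and need a rule of your own.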

You can always sort in descending order. Just negate the variable in
question first.

gen negfoo = -foo
sort negfoo
bysort frog negfoo : ...
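Applied to the variables in this thread, the negation trick might look like the sketch below. The toy values and the name negamt are hypothetical. Sorting ascending on the negated amount puts the largest issue first within each firm-year, so keeping _n == 1 retains it together with its own bond_yield.

```stata
* hypothetical toy data
clear
input yeara cnum offering_amt bond_yield
2001 1 100 5.1
2001 1 250 4.8
2001 1 400 4.5
2001 2 300 6.0
end

* ascending order on negamt is descending order on offering_amt
gen negamt = -offering_amt
* within each firm-year the largest issue now comes first
bysort yeara cnum (negamt) : keep if _n == 1
list
```

With ties, this keeps just one of the tied issues, and which one is not determined by the data.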

Nick
[email protected]

"Wen Xia Ge" <[email protected]>

Thanks for your suggestion. It works well for the second approach, but I
still cannot figure out how to use -collapse- to get the dataset
described in the first approach. That is, for firms with multiple bond
issues in a year, I want to keep only the issue with the largest
offering_amt (firms with a single bond issue remain in the dataset). I
tried the following:

 bysort yeara cnum (offering_amt) : gen max = offering_amt[_N]
 collapse max bond_yield maturity /* and some other variables not listed here */, by(yeara cnum)

This gives the means of the listed variables. In this case, max is OK
(it is the largest offering_amt), but I want to keep the original
bond_yield, maturity, etc. associated with the issue with the largest
offering_amt. For example, among issues 7, 8 and 9, issues 7 and 8
should be removed and only issue 9, with its associated variables,
should remain in the dataset.

I tried to use -duplicates drop-, but I cannot sort the data in
descending order, because the error message says that -gsort- cannot be
combined with -by-.
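A descending sort is not strictly needed for this. A minimal sketch, assuming the same variable names with hypothetical toy values: sort ascending on offering_amt within each firm-year and keep the last observation, which carries its own bond_yield and maturity rather than group means.

```stata
* hypothetical toy data
clear
input yeara cnum offering_amt bond_yield maturity
2001 1 100 5.1 10
2001 1 250 4.8 5
2001 1 400 4.5 7
2001 2 300 6.0 20
end

* ascending sort within firm-year; the largest issue is the last observation
bysort yeara cnum (offering_amt) : keep if _n == _N
list
```

As with the negation trick, tied maxima leave one arbitrary survivor, and a missing offering_amt would sort last and be kept instead.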




