# Re: st: Collapsing Over Limited Set

 From Nick Winter <[email protected]> To [email protected] Subject Re: st: Collapsing Over Limited Set Date Thu, 15 Sep 2005 15:47:58 -0400

I believe this will do it:

bysort product: egen n_p = count( !mi(markup) & !mi(company) )
by product: egen sum_p = sum( markup * !mi(company) )

bysort company product : gen n_cp = _N
by company product: egen sum_cp = sum(markup)

gen averagemarkup = (sum_p-sum_cp) / (n_p-n_cp)

bysort product: gen n_p = _N
by product: egen sum_p = sum(markup)

bysort company product : gen n_cp = _N
by company product: egen sum_cp = sum(markup)

gen averagemarkup = (sum_p-sum_cp) / (n_p-n_cp)

That is, get the sum and number of observations for each product overall, then get the sum and number of observations for each company-product pair...then subtract off these latter quantities from the former to calculate the average you want.

This will trip over missing values on company, I believe, because then the n_p and sum_p will be wrong.

--Nick Winter

At 03:11 PM 9/15/2005, you wrote:

```Hi there,

I would really appreciate someone's help with this question.

I'm trying to generate a dataset of statistics by collapsing another
dataset, but each of the id's I'm collapsing by, I want to use every
observation in the dataset except the one for the id under consideration.
Would there be a way to do this?

For example, I have data of the form:

company product markup
100     31      .3
100     55      .2
111     31      0
111     55      .1
120     31      .1
120     55      .1

Now I want to ask the question: for each company, calculate the average
markup of each product it produces, where the average is taken over all
companies that sell the product except the company itself. So I want to
end up with

company product averagemarkup
100     31      .05
100     55      .1
111     31      .2
111     55      .15
120     31      .15
120     55      .15

Obviously collapsing the data the standard way is not going to do this. I
need to do this for hundreds of thousands of observations (hundreds of
companies and thousands of products) so am looking for a way to do this
that would be relatively quick. I would be grateful for any suggestions.

Thanks very much.

Jason
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```
________________________________________________________
Nicholas J. G. Winter 607.255.8819 t
Assistant Professor 607.255.4530 f
Department of Government [email protected] e
Cornell University falcon.arts.cornell.edu/nw53 w
308 White Hall
Ithaca, NY 14853-4601

*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/