Stata The Stata listserver
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: RE: Collapsing Over Limited Set


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Collapsing Over Limited Set
Date   Thu, 15 Sep 2005 21:26:04 +0100

As I understand it, this is not a collapsing problem 
at all, as you want as many observations at the 
end as you start with. 

One way to approach this is to notice that 

sum of all others = sum of all - value for this 

which suggests the following line of attack:  

bysort product: gen ave_others = sum(markup) 
by product: replace ave_others = (ave_others[_N] - markup) / (_N - 1) 

This would need some tweaking if markup was ever missing: 

gen byte touse = markup < . 
bysort touse product: gen ave_others = sum(markup) if touse 
by touse product: replace ave_others = (ave_others[_N] - markup) / (_N - 1) 

Some searching would have pointed to a ragbag of tricks in 

How do I create variables summarizing for each individual properties of the 
other members of a group? 

http://www.stata.com/support/faqs/data/members.html

Nick
n.j.cox@durham.ac.uk 

Jason Hwang
 
> I'm trying to generate a dataset of statistics by collapsing another
> dataset, but each of the id's I'm collapsing by, I want to use every
> observation in the dataset except the one for the id under 
> consideration.
> Would there be a way to do this?
> 
> For example, I have data of the form:
> 
> company	product markup
> 100	31	.3
> 100	55	.2
> 111	31	0
> 111	55	.1
> 120	31	.1
> 120	55	.1
> 
> Now I want to ask the question: for each company, calculate 
> the average
> markup of each product it produces, where the average is 
> taken over all
> companies that sell the product except the company itself. So 
> I want to
> end up with
> 
> company	product	averagemarkup
> 100	31	.05
> 100	55	.1
> 111	31	.2
> 111	55	.15
> 120	31	.15
> 120	55	.15
> 
> Obviously collapsing the data the standard way is not going 
> to do this. I
> need to do this for hundreds of thousands of observations (hundreds of
> companies and thousands of products) so am looking for a way 
> to do this that would be relatively quick. 

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index