Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down at the end of May, and its replacement, statalist.org is already up and running.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: st: RE: correct egen command to calculate means - please help


From   Nick Cox <n.j.cox@durham.ac.uk>
To   "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject   RE: st: RE: correct egen command to calculate means - please help
Date   Wed, 18 May 2011 19:16:44 +0100

There are various ways to do it. Here's one:  

egen group = group(ID)
su group, meanonly 

gen mean = . 

qui forval i = 1/`r(max)' { 
	su ID if group == `i', meanonly 
	local ID = r(mean) 
	su Value if group == `i', meanonly
	replace mean = r(mean) if ParentID == `ID' 
}
	
Another would to -collapse- to a set of means on ID and then -merge- back matching with ParentID. 

Nick 
n.j.cox@durham.ac.uk 

Ophelie Desmarais

Thanks a lot for the reply. Sorry for not providing more information:

Let's say I have this dataset:

UniqueID          ID                        ParentID             Value
1                     1                                                   10
2                     1                                                    20
3                     2                          1                        5
4                     2                          2                        5

Basically, I want to add another variable (say "mean_value") that
calculates the mean for (10+20)/2 and assigns the value to UniqueID=3
(since UniqueID=3's ParentID is 1 and this links to UniqueID=1 and
UniqueID=2's ID variable). For UniqueID=4, similarly, the mean_value
would be = 5 (the average value of the observations where
ID=ParentID=3).

egen mean_value=mean(Value), by(ParentID) would give me 5 and 5 for
UniqueID=3 and UniqueID=4, which is not what I try to get.

Again, thanks for your kind answer.

o.


On 18 May 2011 19:44, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> I don't understand your pseudocode. It looks like
>
> Y = mean(X) where A = B
>
> to me.
>
> Nor is it "obvious" that
>
> egen MeanRevenueParent=mean(Revenue), by(ParentIndustry)
>
> is wrong. From your word description it sounds exactly right to me!
>
> Please give a worked example of what you want for a minimal dataset.
>
> Note that explanations that assume proficiency in Excel _and_ Stata catch a much smaller set of people than explanations that assume proficiency in Stata.
>
> (Can we assume that income == revenue?)
>
> Nick
> n.j.cox@durham.ac.uk
>
> Ophelie Desmarais
>
> I have tried to find an answer for this on the web but to no avail.
>
> Simplified, I have following dataset where the unit of observation is
> the industry (IndustryID) with the respective revenue.
>
> IndustryID, ParentIndustryID, Revenue
>
>
> The ParentIndustryID links to IndustryID and reflects the hierachical
> structure of the industries. For each industry, I want to add a
> variable reflecting the mean income of the ParentIndustryID. How do I
> do this in Stata? In Excel, this would be simply using AVERAGEIFS.
> egen MeanRevenueParent=mean(Revenue), by(ParentIndustry) obviously
> does not give the right mean. In terms of pseudo code, this should be
> something like:
>
> MeanRevenueParent = Mean(Revenue) WHERE ParentIndustry=IndustryID
>
> Could anyone help me out? I don't want to switch around with Excel all the time.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index