Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, is already up and running.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: st: RE: correct egen command to calculate means - please help

From   Nick Cox <>
To   "''" <>
Subject   RE: st: RE: correct egen command to calculate means - please help
Date   Wed, 18 May 2011 19:16:44 +0100

There are various ways to do it. Here's one:  

egen group = group(ID)
su group, meanonly 

gen mean = . 

qui forval i = 1/`r(max)' { 
	su ID if group == `i', meanonly 
	local ID = r(mean) 
	su Value if group == `i', meanonly
	replace mean = r(mean) if ParentID == `ID' 
Another would to -collapse- to a set of means on ID and then -merge- back matching with ParentID. 


Ophelie Desmarais

Thanks a lot for the reply. Sorry for not providing more information:

Let's say I have this dataset:

UniqueID          ID                        ParentID             Value
1                     1                                                   10
2                     1                                                    20
3                     2                          1                        5
4                     2                          2                        5

Basically, I want to add another variable (say "mean_value") that
calculates the mean for (10+20)/2 and assigns the value to UniqueID=3
(since UniqueID=3's ParentID is 1 and this links to UniqueID=1 and
UniqueID=2's ID variable). For UniqueID=4, similarly, the mean_value
would be = 5 (the average value of the observations where

egen mean_value=mean(Value), by(ParentID) would give me 5 and 5 for
UniqueID=3 and UniqueID=4, which is not what I try to get.

Again, thanks for your kind answer.


On 18 May 2011 19:44, Nick Cox <> wrote:
> I don't understand your pseudocode. It looks like
> Y = mean(X) where A = B
> to me.
> Nor is it "obvious" that
> egen MeanRevenueParent=mean(Revenue), by(ParentIndustry)
> is wrong. From your word description it sounds exactly right to me!
> Please give a worked example of what you want for a minimal dataset.
> Note that explanations that assume proficiency in Excel _and_ Stata catch a much smaller set of people than explanations that assume proficiency in Stata.
> (Can we assume that income == revenue?)
> Nick
> Ophelie Desmarais
> I have tried to find an answer for this on the web but to no avail.
> Simplified, I have following dataset where the unit of observation is
> the industry (IndustryID) with the respective revenue.
> IndustryID, ParentIndustryID, Revenue
> The ParentIndustryID links to IndustryID and reflects the hierachical
> structure of the industries. For each industry, I want to add a
> variable reflecting the mean income of the ParentIndustryID. How do I
> do this in Stata? In Excel, this would be simply using AVERAGEIFS.
> egen MeanRevenueParent=mean(Revenue), by(ParentIndustry) obviously
> does not give the right mean. In terms of pseudo code, this should be
> something like:
> MeanRevenueParent = Mean(Revenue) WHERE ParentIndustry=IndustryID
> Could anyone help me out? I don't want to switch around with Excel all the time.

*   For searches and help try:

© Copyright 1996–2015 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index