Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

 From Nick Cox To "'statalist@hsphsun2.harvard.edu'" Subject RE: st: RE: correct egen command to calculate means - please help Date Wed, 18 May 2011 19:30:13 +0100

```Here's #2.

Set up dopey example

clear
set obs 4
egen id = seq(), to(2)
gen parent = 3 - id
gen y = 10 * _n

. l

+------------------+
| id   parent    y |
|------------------|
1. |  1        2   10 |
2. |  2        1   20 |
3. |  1        2   30 |
4. |  2        1   40 |
+------------------+

. save data
file data.dta saved

Get the means and rename

. collapse y, by(id)

. rename y mean

. rename id parent

Merge back

. merge 1:m parent using data
parent was byte now float

Result                           # of obs.
-----------------------------------------
not matched                             0
matched                                 4  (_merge==3)
-----------------------------------------

. list

+---------------------------------------+
| parent   mean   id    y        _merge |
|---------------------------------------|
1. |      1     20    2   40   matched (3) |
2. |      2     30    1   30   matched (3) |
3. |      1     20    2   20   matched (3) |
4. |      2     30    1   10   matched (3) |
+---------------------------------------+

Nick
n.j.cox@durham.ac.uk

Nick Cox

There are various ways to do it. Here's one:

egen group = group(ID)
su group, meanonly

gen mean = .

qui forval i = 1/`r(max)' {
su ID if group == `i', meanonly
local ID = r(mean)
su Value if group == `i', meanonly
replace mean = r(mean) if ParentID == `ID'
}

Another would to -collapse- to a set of means on ID and then -merge- back matching with ParentID.

Nick
n.j.cox@durham.ac.uk

Ophelie Desmarais

Let's say I have this dataset:

UniqueID          ID                        ParentID             Value
1                     1                                                   10
2                     1                                                    20
3                     2                          1                        5
4                     2                          2                        5

Basically, I want to add another variable (say "mean_value") that
calculates the mean for (10+20)/2 and assigns the value to UniqueID=3
(since UniqueID=3's ParentID is 1 and this links to UniqueID=1 and
UniqueID=2's ID variable). For UniqueID=4, similarly, the mean_value
would be = 5 (the average value of the observations where
ID=ParentID=3).

egen mean_value=mean(Value), by(ParentID) would give me 5 and 5 for
UniqueID=3 and UniqueID=4, which is not what I try to get.

o.

On 18 May 2011 19:44, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> I don't understand your pseudocode. It looks like
>
> Y = mean(X) where A = B
>
> to me.
>
> Nor is it "obvious" that
>
> egen MeanRevenueParent=mean(Revenue), by(ParentIndustry)
>
> is wrong. From your word description it sounds exactly right to me!
>
> Please give a worked example of what you want for a minimal dataset.
>
> Note that explanations that assume proficiency in Excel _and_ Stata catch a much smaller set of people than explanations that assume proficiency in Stata.
>
> (Can we assume that income == revenue?)
>
> Nick
> n.j.cox@durham.ac.uk
>
> Ophelie Desmarais
>
> I have tried to find an answer for this on the web but to no avail.
>
> Simplified, I have following dataset where the unit of observation is
> the industry (IndustryID) with the respective revenue.
>
> IndustryID, ParentIndustryID, Revenue
>
>
> The ParentIndustryID links to IndustryID and reflects the hierachical
> structure of the industries. For each industry, I want to add a
> variable reflecting the mean income of the ParentIndustryID. How do I
> do this in Stata? In Excel, this would be simply using AVERAGEIFS.
> egen MeanRevenueParent=mean(Revenue), by(ParentIndustry) obviously
> does not give the right mean. In terms of pseudo code, this should be
> something like:
>
> MeanRevenueParent = Mean(Revenue) WHERE ParentIndustry=IndustryID
>
> Could anyone help me out? I don't want to switch around with Excel all the time.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```