Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at

Re: st: RE: summarize by different levels/groups with -egen- ?

From   Nick Cox
Subject   Re: st: RE: summarize by different levels/groups with -egen- ?
Date   Fri, 11 Jan 2013 12:46:17 +0000

You don't need a dummy or indicator variable. Assuming that -pathogen-
is a string variable,

... mean(pathogen == "H")

will work fine as the -mean()- function of -egen- takes expressions.
If it's a numeric variable, the same principle applies, but you need a
different expression.
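
A minimal sketch of how this could be applied to the example data from the original question, assuming the variables are named -school-, -class- and -pathogen- (those names, and the new variable names -anyH-, -classtag-, -schooltag- and -nclassH-, are illustrative assumptions, not from the thread). It uses -max()- rather than -mean()- so that the result is already a 0/1 indicator; -mean()- of the same expression would be greater than zero in exactly the same classes.

```stata
* Sketch only: variable names are assumed from the example data.
clear
input str1 school str2 class str1 pathogen
A A1 H
A A1 T
A A1 H
A A2 S
A A2 H
A A3 K
A A3 I
B B1 S
B B1 T
B B2 H
end

* 1 if pathogen H was found at least once in the class, 0 otherwise.
* For a numeric pathogen variable the expression would differ,
* e.g. pathogen == 1 if H were coded as 1 (a hypothetical coding).
egen byte anyH = max(pathogen == "H"), by(school class)

* Tag exactly one observation per class so each class is counted once.
egen byte classtag = tag(school class)

* Number of classes in each school in which H was found.
egen nclassH = total(classtag * anyH), by(school)

* Show one line per school: A has 2 such classes, B has 1.
egen byte schooltag = tag(school)
list school nclassH if schooltag, noobs
```

Multiplying by the tag is one way to avoid counting a class once per child; collapsing to one observation per class first would be an alternative.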


On Fri, Jan 11, 2013 at 12:01 PM, Lovisa Persson wrote:

> First create a dummy variable for each pathogen, pathogeni.
> Then generate the mean for each class and each pathogen(i) by writing:
> egen meanpathogeni=mean(pathogeni), by(class)
> Every class that has a given pathogen in it will now have a value of
> meanpathogeni greater than zero, and every class that does not have that
> pathogen in it will have a value of zero.
> The value will be the same for all observations within a class: it is the
> proportion of observations in that class with the pathogen.
> So now you generate a new dummy variable that equals 1 if the value of
> meanpathogeni is greater than zero.
> Now every observation in a class will have the same value, 1 or 0,
> depending on whether that class had at least one observation of this
> particular pathogen in it.

Patricia Biedermann

> I want to summarize the following:
> School  Class  Pathogen
> A       A1     H
> A       A1     T
> A       A1     H
> A       A2     S
> A       A2     H
> A       A3     K
> A       A3     I
> B       B1     S
> B       B1     T
> B       B2     H
> I've visited different classes in different schools. In each class I checked
> whether the children were infected with some kind of pathogen.
> -       I found, e.g., that in class A1 two children were infected with
> pathogen H.
> -       Now I want to record only that pathogen H was found in class A1,
> WITHOUT the number of times it was found (2 in this case).
> Basically: "Was pathogen H found in class A1?" = yes or no. Finally, the
> information should be presented at school level ("In how many classes in
> school A was pathogen H found?").
> So far I tried egen, bysort / =_n==N and commands. I also created dummy
> variables for each pathogen.  It never worked out the right way.