Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: RE: summarize by different levels/groups with -egen- ?

- From: Nick Cox
- To: statalist@hsphsun2.harvard.edu
- Subject: Re: st: RE: summarize by different levels/groups with -egen- ?
- Date: Fri, 11 Jan 2013 12:46:17 +0000

```
You don't need a dummy or indicator variable. Assuming that -pathogen-
is a string variable,

... mean(pathogen == "H")

will work fine as the -mean()- function of -egen- takes expressions.
If it's a numeric variable, the same principle applies, but you need a
different expression.

Nick

On Fri, Jan 11, 2013 at 12:01 PM, Lovisa Persson wrote:

> First create a dummy variable for each pathogen, pathogeni.
> Then generate the mean for each class and each pathogen(i) by writing:
>
> egen meanpathogeni=mean(pathogeni), by(class)
>
> Every class that has a given pathogen in it will now have a value of
> meanpathogeni greater than zero, and every class that does not have that
> pathogen in it will have a value of zero.
> The value will be the same for all observations within a class: it is the
> proportion of observations in that class with the pathogen.
>
> So now you generate a new dummy variable that equals 1 if the value of
> meanpathogeni is greater than zero.
> Each class will then have the same value, 1 or 0,
> depending on whether the class had at least one observation of this
> particular pathogen in it.

Patricia Biedermann wrote:

> I want to summarize the following:
>
> School   Class   Pathogen
> A        A1      H
> A        A1      T
> A        A1      H
> A        A2      S
> A        A2      H
> A        A3      K
> A        A3      I
> B        B1      S
> B        B1      T
> B        B2      H
>
> I've visited different classes in different schools. In each class I checked
> whether the children were infected with some kind of pathogen.
> -       I found, e.g., that in class A1 two children were infected with
> pathogen H.
> -       Now I want to record only that pathogen H was found in class A1,
> WITHOUT the number of times it was found (2 in this case);
> basically, "Was pathogen H found in class A1?" = yes or no. Finally, the
> information should be presented at school level ("In how many classes in
> school A was pathogen H found?").
>
> So far I tried egen, bysort / =_n==N and commands. I also created dummy
> variables for each pathogen.  It never worked out the right way.
```
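
The approach discussed in the thread can be sketched end to end as follows. This is only an illustration, not code posted by any of the participants. It assumes -school-, -class- and -pathogen- are string variables holding the example data from the original question; the new variable names -anyH-, -oneclass- and -nclassH- are made up for the sketch. It uses the -max()- function of -egen- on the expression, which yields the 0/1 "found at least once" indicator directly; this is equivalent to checking that the -mean()- of the same expression is greater than zero.

```stata
clear
input str1 school str2 class str1 pathogen
A A1 H
A A1 T
A A1 H
A A2 S
A A2 H
A A3 K
A A3 I
B B1 S
B B1 T
B B2 H
end

* 1 if pathogen H was found at least once in the class, 0 otherwise
* (the expression pathogen == "H" evaluates to 1 or 0 per observation)
egen byte anyH = max(pathogen == "H"), by(school class)

* tag exactly one observation per class, so each class is counted once
egen byte oneclass = tag(school class)

* number of classes in each school in which H was found
egen nclassH = total(oneclass * anyH), by(school)

* one row per school
tabdisp school, cell(nclassH)
```

With the example data, H appears in classes A1, A2 and B2, so -nclassH- should be 2 for school A and 1 for school B. For a numeric -pathogen- variable the same steps apply with a different expression inside -max()-.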