Re: st: RE: summarize by different levels/groups with -egen- ?

From   Patricia Biedermann <>
Subject   Re: st: RE: summarize by different levels/groups with -egen- ?
Date   Fri, 11 Jan 2013 17:39:59 +0100

Thank you, Lovisa and Nick.
I've tried your commands, but they don't give me quite what I want
(pathogen is a string variable).

The issue is that when I create the dummy variable at the end (as
described by Lovisa), I get a "1" for each H within a class. When I
then sum it, I get the total number of H observations. But I want the
total number of classes that are affected by H (regardless of how
many children in each class were infected with the pathogen).

Class         Pathogen
A1                H
A1                S
A1                T
A2                S
A2                K
A3                H
A3                D
B1                H
B1                S

In the end, 3 (out of 4) classes are affected by "H". (I don't care
how many individuals in a class are affected!)

Maybe I have to think about it and approach it differently.
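
One possible way to get this count is sketched below. It is only an illustration under stated assumptions: the variable names anyH, oneperclass, school, nclassH and oneperschool are made up for the example, and the school is assumed to be the first letter of the class code because the small table above has no school column. It uses the table above as input, marks each class as affected or not with the -max()- function of -egen-, and then counts each class once with -tag()-.

```stata
clear
input str2 class str1 pathogen
A1 H
A1 S
A1 T
A2 S
A2 K
A3 H
A3 D
B1 H
B1 S
end

* 1 on every observation of a class in which H occurs at least once, else 0
egen byte anyH = max(pathogen == "H"), by(class)

* flag exactly one observation per class so that each class is counted once
egen byte oneperclass = tag(class)

* number of classes affected by H, then number of classes overall
count if oneperclass & anyH
count if oneperclass

* assumption: the school is the first letter of the class code
gen str1 school = substr(class, 1, 1)

* number of affected classes within each school
egen nclassH = total(oneperclass * anyH), by(school)

* show one line per school
egen byte oneperschool = tag(school)
list school nclassH if oneperschool, noobs
```

With the table above, the first -count- should report 3 and the second 4. The number of H observations inside a class never enters the result, because -max()- only records whether the condition is true at least once in the class.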

On Fri, Jan 11, 2013 at 1:46 PM, Nick Cox <> wrote:
> You don't need a dummy or indicator variable. Assuming that -pathogen-
> is a string variable,
> ... mean(pathogen == "H")
> will work fine as the -mean()- function of -egen- takes expressions.
> If it's a numeric variable, the same principle applies, but you need a
> different expression.
> Nick
> On Fri, Jan 11, 2013 at 12:01 PM, Lovisa Persson
> <> wrote:
>> First create a dummy variable for each pathogen, pathogeni.
>> Then generate the mean for each class and each pathogen(i) by writing:
>> egen meanpathogeni=mean(pathogeni), by(class)
>> Every class that has a certain pathogen in it will now have a value of
>> meanpathogeni greater than zero, and every class that does not have that
>> pathogen in it will have a value of zero.
>> The value will be the same for all observations within a class; it is the
>> proportion of observations in that class with the pathogen.
>> So now you generate a new dummy variable that equals 1 if the value of
>> meanpathogeni is greater than zero.
>> Now each class will have the same value on every observation, 1 or 0,
>> depending on whether the class had at least one observation of this
>> particular pathogen in it.
> Patricia Biedermann
>> I want to summarize the following:
>> School          Class           Pathogen
>> A                       A1                      H
>> A                       A1                      T
>> A                       A1                      H
>> A                       A2                      S
>> A                       A2                      H
>> A                       A3                      K
>> A                       A3                      I
>> B                       B1                      S
>> B                       B1                      T
>> B                       B2                      H
>> I've visited different classes in different schools. In each class I checked
>> whether the children were infected with some kind of pathogen.
>> -       I found, e.g., that in class A1 two children were infected with
>> pathogen H.
>> -       Now I want to record only that pathogen H was found in class A1,
>> WITHOUT the number of times it was found (2 in this case).
>> Basically, "Was pathogen H found in class A1?" = yes or no. Finally, the
>> information should be presented at school level ("In how many classes in
>> school A was pathogen H found?").
>> So far I have tried egen, bysort with _n==_N, and other commands. I also
>> created dummy variables for each pathogen. It never worked out the right way.