# Re: st: RE: creating variables using 'by' for subsets of records

- From: John Westbury
- To: statalist@hsphsun2.harvard.edu
- Subject: Re: st: RE: creating variables using 'by' for subsets of records
- Date: Tue, 23 Feb 2010 20:45:15 -0600

```Thanks much for the feedback.  Here is an example of what the data I am
using look like:

Individual  region  Indicator
A           1       0
B           1       1
C           2       1
D           2       1

I have encoded the regions and the ratio I am attempting to create would be
intuitively expressed as:
by region: count of indicator==1/count of individual.

I am trying to create a variable for the numerator by region (call it y) and
denominator by region (call it x) and then use gen ratio=y/x.
I can create a variable (x) for the denominator using:
bys region: egen x=count(Indicator)
I am having trouble creating a variable for the numerator.  I have attempted
to use bys region: egen y=count if Indicator==1 but receive an invalid
syntax error.  If someone has a suggestion on how to specify a variable for
a count of indicator==1 by region I would be very appreciative.
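
A minimal sketch of one way to build the numerator, assuming Indicator is
coded 0/1 as in the example rows above.  egen functions take their argument
in parentheses, which is the likely source of the syntax error, and an -if-
qualifier would leave y missing in the rows where Indicator is not 1.
Summing the logical expression Indicator==1 within region avoids both
problems.  The variable names x, y and ratio follow the message; the four
data rows are the example rows, typed in only for illustration.

*******
clear
// example rows from the message above
input str1 Individual region Indicator
"A" 1 0
"B" 1 1
"C" 2 1
"D" 2 1
end

// denominator: number of nonmissing Indicator values per region
bysort region: egen x = count(Indicator)

// numerator: Indicator==1 evaluates to 1 or 0, so its total is the count
bysort region: egen y = total(Indicator == 1)

gen ratio = y/x
list, sepby(region)
*******

With these four rows, region 1 gets y=1, x=2 and ratio=.5, and region 2
gets y=2, x=2 and ratio=1.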

As an aside, is there a way to specify the variable y/x without specifying y
and x?
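
One possible way to get the ratio in a single step, again only a sketch
assuming Indicator is coded 0/1: the mean of a 0/1 variable within a region
is the share of ones, so egen's mean() gives y/x directly.  The name ratio
is taken from the message; the data rows are the example rows.

*******
clear
// example rows from the message above
input str1 Individual region Indicator
"A" 1 0
"B" 1 1
"C" 2 1
"D" 2 1
end

// share of Indicator==1 per region; missing values of Indicator are
// ignored by mean(), matching the count(Indicator) denominator
bysort region: egen ratio = mean(Indicator)
list, sepby(region)

// to keep one record per region instead of one per individual
collapse (mean) ratio = Indicator, by(region)
list
*******

Note that collapse replaces the data in memory with the region-level
records, so it should be run on a copy or under preserve/restore if the
individual records are still needed.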

thanks

John

On Tue, Feb 23, 2010 at 2:29 PM, Martin Weiss <martin.weiss1@gmx.de> wrote:

>
> In the absence of example data, it is hard to give you advice. Look at this
> calculation of regional unemployment rates:
>
>
> *******
> clear*
>
> //10 regions
> set obs 10
> gen byte region=_n
>
> //50 indiv per region
> expand 50
> bys region: gen byte id=_n
> gen byte unemployed=runiform()>.9
>
> bys region: gen number=_N
> by region: egen numofunempl=total(unemployed)
>
> gen unemprate=numofunempl/number
> *******
>
> HTH
> Martin
>
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of John Westbury
> Sent: Tuesday, 23 February 2010 20:55
> To: statalist@hsphsun2.harvard.edu
> Subject: st: creating variables using 'by' for subsets of records
>
> Hello,
>
> I have records for individuals by geographic region and wish to aggregate
> the records for individuals to records for geographic regions.  I believe I
> should create variables for those regions using 'by'.  Ex: by Region gen x =
> argument for variable.  I am having difficulty with arguments for variable
> x.  For example I wish to create a region variable that expresses a ratio of
> count of indicator values for individuals in a region to a count of
> individuals in the region and am unsure how to code this.
>
> thanks
>
> John
>
