Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


AW: st: RE: creating variables using 'by' for subsets of records

From	"Martin Weiss" <[email protected]>
To	<[email protected]>
Subject	AW: st: RE: creating variables using 'by' for subsets of records
Date	Wed, 24 Feb 2010 14:13:54 +0100

<> 

Several issues arise here. A solution could look something like
this:

*************
clear*

inp str10 Individual region Indicator 
A 1 0
B 1 1 
C 2 1 
D 2 1
end

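* numeric id from the string identifier (not needed for the ratio itself)
encode Individual, gen(id)
compress

* numerator: sum of the 0/1 Indicator, i.e. the number of 1s per region
bys region: egen countofindicator =total(Indicator)
* denominator: _N is the number of observations in the current by-group
by region: gen ratio=countofindicator/_N
* list without observation numbers
li, noo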
*************
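
Your original aim was to turn individual records into one record per region, so -collapse- is also worth a look. Below is a minimal sketch. It assumes the example data above and that a region-level dataset is what you want. The names share and n are only illustrative. Note that -collapse- replaces the data in memory, so wrap it in -preserve-/-restore- if you need to keep the individual-level data.

```stata
clear
input str10 Individual region Indicator
A 1 0
B 1 1
C 2 1
D 2 1
end

* One record per region (share and n are illustrative names):
*   (mean) of a 0/1 indicator = share of 1s
*   (count) = number of nonmissing values of Indicator
collapse (mean) share=Indicator (count) n=Indicator, by(region)
list, noobs
```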

" I have attempted
to use bys region: egen y=count if Indicator==1 but receive an invalid
syntax error.  "

The -egen- function count() requires an argument, an expression whose
nonmissing values it counts. That is why you get an error. Furthermore,
count() merely counts nonmissing values, whatever those values are, so
count(Indicator) counts the zeros as well as the ones. The appropriate
function for summing values is -total()-, as in the example. To get the
number of observations in each -by- group, you can also use _N; see
[U] 13.7.2.
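
To make the difference concrete, here is a small sketch on the example data. The names y_if, y, x and ratio are illustrative. The sketch shows three things:

- Restricting count() with -if- fills in only the observations that satisfy the condition and leaves the rest missing.
- Totalling the logical expression Indicator==1 gives the numerator in every observation.
- _N gives the denominator.

```stata
clear
input str10 Individual region Indicator
A 1 0
B 1 1
C 2 1
D 2 1
end

* count() with an -if- qualifier: missing wherever Indicator != 1
bysort region: egen y_if = count(Indicator) if Indicator == 1

* total() of the 0/1 expression counts Indicator==1 in every observation
by region: egen y = total(Indicator == 1)

* _N is the number of observations in the current by-group
by region: gen x = _N
gen ratio = y/x
list, noobs
```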

(And do not even get started on -sum()-, see
http://www.stata.com/statalist/archive/2009-04/msg00699.html.)
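
As a reminder of why sum() is confusing: in a -generate- expression, sum() is a running (cumulative) sum, not a group total. The names runsum, lastrun and tot below are illustrative. The running sum depends on the sort order within each region, which is why the sketch sorts explicitly first.

```stata
clear
input str10 Individual region Indicator
A 1 0
B 1 1
C 2 1
D 2 1
end

* Fix the order within region so the running sum is reproducible
sort region Individual

* sum() in -generate- is a running sum within each by-group
by region: gen runsum = sum(Indicator)

* The last running sum in each group equals the group total
by region: gen lastrun = runsum[_N]

* egen's total() puts the group total on every observation
by region: egen tot = total(Indicator)
list, noobs
```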

" As an aside, is there a way to specify the variable y/x without specifying
y and x?"

Finally, getting all of this in one fell swoop is difficult in general, or
rather impossible once an -egen- function is involved, because -egen-
functions cannot be nested inside a -generate- expression. Doing this step
by step also helps when debugging.
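
One special case is relevant to the y/x question. When Indicator is coded 0/1 with no missing values, as in the example, the count of ones divided by the count of individuals is simply the group mean. A single -egen- call therefore yields the ratio without intermediate variables. The sketch below assumes exactly that coding, and ratio1, ones and ratio2 are illustrative names. If Indicator has missing values, mean() averages only the nonmissing ones, so its denominator differs from _N.

```stata
clear
input str10 Individual region Indicator
A 1 0
B 1 1
C 2 1
D 2 1
end

* For a 0/1 indicator without missings, the share of 1s is the group mean
bysort region: egen ratio1 = mean(Indicator)

* Step-by-step version for comparison: number of 1s over group size
by region: egen ones = total(Indicator)
by region: gen ratio2 = ones/_N
list, noobs
```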

HTH
Martin

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of John Westbury
Sent: Wednesday, 24 February 2010 03:45
To: [email protected]
Subject: Re: st: RE: creating variables using 'by' for subsets of records

Thanks very much for the feedback.  Here is an example of the data I am
using:

  Individual region Indicator
  A          1      0
  B          1      1
  C          2      1
  D          2      1

I have encoded the regions, and the ratio I am attempting to create would
be expressed intuitively as:
by region: (count of Indicator==1) / (count of individuals).

I am trying to create a variable for the numerator by region (call it y) and
denominator by region (call it x) and then use gen ratio=y/x.
I can create a variable (x) for the denominator using: bys region: egen
x=count(Indicator).
I am having trouble creating a variable for the numerator.  I have attempted
to use bys region: egen y=count if Indicator==1 but receive an invalid
syntax error.  If someone has a suggestion on how to specify a variable for
a count of indicator==1 by region I would be very appreciative.

As an aside, is there a way to specify the variable y/x without specifying y
and x?

thanks

John

On Tue, Feb 23, 2010 at 2:29 PM, Martin Weiss <[email protected]> wrote:

>
> <>
>
> In the absence of example data, it is hard to give you advice. Look at this
> calculation of regional unemployment rates:
>
>
> *******
> clear*
>
> //10 regions
> set obs 10
> gen byte region=_n
>
> //50 indiv per region
> expand 50
> bys region: gen byte id=_n
> gen byte unemployed=runiform()>.9
>
> bys region: gen number=_N
> by region: egen numofunempl=total(unemployed)
>
> gen unemprate=numofunempl/number
> *******
>
> HTH
> Martin
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of John Westbury
> Sent: Tuesday, 23 February 2010 20:55
> To: [email protected]
> Subject: st: creating variables using 'by' for subsets of records
>
> Hello,
>
> I have records for individuals by geographic region and wish to aggregate
> the records for individuals to records for geographic regions.  I believe I
> should create variables for those regions using 'by'.  Ex: by Region gen x =
> argument for variable.  I am having difficulty with arguments for variable
> x.  For example I wish to create a region variable that expresses a ratio of
> count of indicator values for individuals in a region to a count of
> individuals in the region and am unsure how to code this.
>
> thanks
>
> John
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>

Follow-Ups:
- RE: st: RE: creating variables using 'by' for subsets of records
  - From: "Nick Cox" <[email protected]>

References:
- st: creating variables using 'by' for subsets of records
  - From: John Westbury <[email protected]>
- Re: st: RE: creating variables using 'by' for subsets of records
  - From: John Westbury <[email protected]>
