



AW: st: RE: creating variables using 'by' for subsets of records


From   "Martin Weiss" <martin.weiss1@gmx.de>
To   <statalist@hsphsun2.harvard.edu>
Subject   AW: st: RE: creating variables using 'by' for subsets of records
Date   Wed, 24 Feb 2010 14:13:54 +0100

<> 

Several issues arise here. One solution could look something like this:

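*************
clear*

* Example data from the question
input str10 Individual region Indicator
A 1 0
B 1 1
C 2 1
D 2 1
end

* Numeric id with value labels from the string identifier
encode Individual, gen(id)
compress

* Numerator: number of individuals with Indicator==1 in each region
bys region: egen countofindicator = total(Indicator)
* Ratio: numerator over the number of observations in the region (_N)
by region: gen ratio = countofindicator/_N
list, noobs
*************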


" I have attempted
to use bys region: egen y=count if Indicator==1 but receive an invalid
syntax error.  "


The -egen- function "count()" requires an argument: the expression whose
values it is supposed to count. That is why you get an error. Furthermore,
"count()" merely counts nonmissing values of its argument, no matter what
those values are. The appropriate function to sum up values is -total()-, as
seen in the example. To get the number of observations in each -by- group,
you can also use "_N", see [U], 13.7.2.
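To make the difference concrete, here is a small sketch that puts the three
quantities side by side. The data are hypothetical: the four rows from the
question plus one extra row with a missing Indicator, added only so that
count(), total() and _N all give different answers in region 1. The variable
names n_nonmiss, n_ones and n_obs are made up for this illustration.

```stata
clear
* Hypothetical data: the thread's rows plus one row with a missing Indicator
input region Indicator
1 0
1 1
1 .
2 1
2 1
end

* count(): number of nonmissing values, zeros included
bysort region: egen n_nonmiss = count(Indicator)
* total(): sum of the 0/1 indicator, i.e. the number of ones
by region: egen n_ones = total(Indicator)
* _N: number of observations in the group, missing or not
by region: gen n_obs = _N
list, noobs sepby(region)
```

In region 1 this should give n_nonmiss = 2, n_ones = 1 and n_obs = 3; in
region 2 all three equal 2.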

(And do not even get started on -sum()-, see
http://www.stata.com/statalist/archive/2009-04/msg00699.html.)


" As an aside, is there a way to specify the variable y/x without specifying
y
and x?"


Finally, getting all of this in one fell swoop is difficult, or rather
impossible once an -egen- function is involved, because an -egen- function
call cannot be embedded in a larger expression. Doing this step by step
also helps when debugging...
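For the particular ratio in this thread there may be a shortcut, sketched
here under the assumption that Indicator is coded strictly 0/1. The mean of
a 0/1 variable is the share of ones, so -egen-'s mean() function yields the
ratio directly, without separate y and x. Note that mean() ignores missing
values, so the denominator is the number of nonmissing Indicator values
rather than _N; the two agree only when Indicator is never missing.

```stata
clear
input str10 Individual region Indicator
A 1 0
B 1 1
C 2 1
D 2 1
end

* Assumes Indicator is coded 0/1; mean() skips missing values
bysort region: egen ratio = mean(Indicator)
list, noobs sepby(region)
```

With the example data this should give ratio = .5 in region 1 and ratio = 1
in region 2, matching the step-by-step version.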


HTH
Martin


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of John Westbury
Sent: Wednesday, 24 February 2010 03:45
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: creating variables using 'by' for subsets of records

Thanks much for the feedback.  Here is an example of what the data looks
like that I am using:

  Individual  region  Indicator
  A           1       0
  B           1       1
  C           2       1
  D           2       1

I have encoded the regions and the ratio I am attempting to create would be
intuitively expressed as:
by region: count of indicator==1/count of individual.

I am trying to create a variable for the numerator by region (call it y) and
denominator by region (call it x) and then use gen ratio=y/x.
I can create a variable (x) for the denominator using: bys region: egen
x=count(Indicator).
I am having trouble creating a variable for the numerator.  I have attempted
to use bys region: egen y=count if Indicator==1 but receive an invalid
syntax error.  If someone has a suggestion on how to specify a variable for
a count of indicator==1 by region I would be very appreciative.

As an aside, is there a way to specify the variable y/x without specifying y
and x?

thanks

John


On Tue, Feb 23, 2010 at 2:29 PM, Martin Weiss <martin.weiss1@gmx.de> wrote:

>
> <>
>
> In the absence of example data, it is hard to give you advice. Look at this
> calculation of regional unemployment rates:
>
>
> *******
> clear*
>
> //10 regions
> set obs 10
> gen byte region=_n
>
> //50 indiv per region
> expand 50
> bys region: gen byte id=_n
> gen byte unemployed=runiform()>.9
>
> bys region: gen number=_N
> by region: egen numofunempl=total(unemployed)
>
> gen unemprate=numofunempl/number
> *******
>
> HTH
> Martin
>
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of John Westbury
> Sent: Tuesday, 23 February 2010 20:55
> To: statalist@hsphsun2.harvard.edu
> Subject: st: creating variables using 'by' for subsets of records
>
> Hello,
>
> I have records for individuals by geographic region and wish to aggregate
> the records for individuals to records for geographic regions.  I believe I
> should create variables for those regions using 'by'.  Ex: by Region gen x =
> argument for variable.  I am having difficulty with arguments for variable
> x.  For example I wish to create a region variable that expresses a ratio of
> count of indicator values for individuals in a region to a count of
> individuals in the region and am unsure how to code this.
>
> thanks
>
> John
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/