

From: "Martin Weiss" <martin.weiss1@gmx.de>

To: <statalist@hsphsun2.harvard.edu>

Subject: AW: st: RE: creating variables using 'by' for subsets of records

Date: Wed, 24 Feb 2010 14:13:54 +0100

<>

Several issues pose themselves here. A solution could be something like this:

*************
clear*
inp str10 Individual region Indicator
A 1 0
B 1 1
C 2 1
D 2 1
end
encode Individual, gen(id)
compress
//numerator: sum of the 0/1 indicator within each region
bys region: egen countofindicator =total(Indicator)
//denominator: _N is the number of observations in the by-group
by region: gen ratio=countofindicator/_N
li, noo
*************

"I have attempted to use bys region: egen y=count if Indicator==1 but receive an invalid syntax error."

The -egen- function "count()" demands an argument, namely the expression it is supposed to count. That is why you get an error. Furthermore, the "count" function merely counts nonmissing values of its argument, no matter what those values may be. The appropriate function to sum up values is -total()-, as seen in the example. To get the number of observations in each group defined by -by-, you can also use "_N", see [U], 13.7.2. (And do not even get started on -sum()-, see http://www.stata.com/statalist/archive/2009-04/msg00699.html.)

"As an aside, is there a way to specify the variable y/x without specifying y and x?"

Finally, getting all of this in one fell swoop is difficult, or rather impossible once an -egen- function is involved. Doing this step by step helps when debugging...

HTH
Martin

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of John Westbury
Sent: Wednesday, 24 February 2010 03:45
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: creating variables using 'by' for subsets of records

Thanks much for the feedback. Here is an example of what the data I am using look like:

Individual region Indicator
A 1 0
B 1 1
C 2 1
D 2 1

I have encoded the regions, and the ratio I am attempting to create would be intuitively expressed as: by region: count of indicator==1/count of individual. I am trying to create a variable for the numerator by region (call it y) and one for the denominator by region (call it x) and then use gen ratio=y/x.

I can create a variable (x) for the denominator using: bys region: egen x=count(Indicator). I am having trouble creating a variable for the numerator. I have attempted to use bys region: egen y=count if Indicator==1 but receive an invalid syntax error. If someone has a suggestion on how to specify a variable for a count of indicator==1 by region I would be very appreciative.

As an aside, is there a way to specify the variable y/x without specifying y and x?

thanks

John

On Tue, Feb 23, 2010 at 2:29 PM, Martin Weiss <martin.weiss1@gmx.de> wrote:
>
> <>
>
> In the absence of example data, it is hard to give you advice. Look at this
> calculation of regional unemployment rates:
>
> *******
> clear*
>
> //10 regions
> set obs 10
> gen byte region=_n
>
> //50 indiv per region
> expand 50
> bys region: gen byte id=_n
> gen byte unemployed=runiform()>.9
>
> bys region: gen number=_N
> by region: egen numofunempl=total(unemployed)
>
> gen unemprate=numofunempl/number
> *******
>
> HTH
> Martin
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of John Westbury
> Sent: Tuesday, 23 February 2010 20:55
> To: statalist@hsphsun2.harvard.edu
> Subject: st: creating variables using 'by' for subsets of records
>
> Hello,
>
> I have records for individuals by geographic region and wish to aggregate
> the records for individuals to records for geographic regions. I believe I
> should create variables for those regions using 'by'. Ex: by Region gen x =
> argument for variable. I am having difficulty with arguments for variable
> x. For example I wish to create a region variable that expresses a ratio of
> count of indicator values for individuals in a region to a count of
> individuals in the region and am unsure how to code this.
>
> thanks
>
> John
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
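
The ratio discussed in this thread is the share of records in each region with Indicator equal to 1. The sketch below is not part of the original posts. It uses the four example records from John's message and shows one possible way to build the numerator and denominator with -egen-, followed by a one-step variant that relies on the mean of a 0/1 expression being that share. The variable name share is made up for illustration, and the one-step variant assumes Indicator has no missing values (otherwise adding `if !missing(Indicator)` would be needed to match the two-step result).

```stata
clear
input str10 Individual region Indicator
A 1 0
B 1 1
C 2 1
D 2 1
end

// numerator: (Indicator == 1) evaluates to 1 or 0, so total() counts the ones
bysort region: egen y = total(Indicator == 1)

// denominator: number of nonmissing values of Indicator in each region
bysort region: egen x = count(Indicator)

generate ratio = y / x

// one-step variant (assumes no missing values in Indicator):
// the mean of a 0/1 expression is the share of ones
bysort region: egen share = mean(Indicator == 1)

list, noobs sepby(region)
```

For the example records, ratio and share should both be 0.5 in region 1 and 1 in region 2. The point of passing an expression to total() is that -egen- functions take their condition inside the parentheses, which avoids the syntax error from writing count without an argument.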

**Follow-Ups**:
- **RE: st: RE: creating variables using 'by' for subsets of records** *From:* "Nick Cox" <n.j.cox@durham.ac.uk>

**References**:
- **st: creating variables using 'by' for subsets of records** *From:* John Westbury <jrwestbury@gmail.com>
- **Re: st: RE: creating variables using 'by' for subsets of records** *From:* John Westbury <jrwestbury@gmail.com>
