
Re: st: RE: Re: return list and by


From   <cthompson@dfpm.utah.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: Re: return list and by
Date   Tue, 13 Apr 2004 10:14:26 -0600

Many thanks to both Nick Cox and Michael Blasnik for their
suggestions!


On 13 Apr 2004 at 10:43, Nick Cox wrote:

> To expand on a few points: 
> 
> 1. The -egen- route is naturally fine whenever it 
> offers the statistic you want to keep. Note that 
> when you go 
> 
> by person, sort : egen mean = mean(whatever)
> 
> that mean will be repeated for every observation 
> for each person. For many purposes you will want 
> to use each mean just once. Tagging is one way to
> do that 
> 
> egen tag = tag(person) 
> 
> tags just one observation for each person. Then 
> you can follow up with 
> 
> ... if tag 
> 
> If you look inside the -mean()- function you 
> will see that it does not use -summarize-. 
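
A minimal sketch of the -egen- and tag route from point 1, using Stata's shipped auto dataset as a stand-in. The variables foreign and mpg, and the new name mean_mpg, are illustrative assumptions and not part of the original question.

```stata
sysuse auto, clear

// group mean, repeated on every observation within each group
bysort foreign: egen mean_mpg = mean(mpg)

// flag exactly one observation per group
egen tag = tag(foreign)

// show each group mean just once
list foreign mean_mpg if tag
```

The tagged listing has one row per group, while mean_mpg itself stays available on every observation for later calculations.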
> 
> 2. -statsby- is what I call a reduction command. 
> You will get one observation for each person, in your 
> problem, i.e. you will lose the existing dataset. 
> Roger Newson's set of commands is another 
> example of a reduction approach. 
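
A sketch of the reduction route from point 2, again on the auto dataset as an assumed stand-in. It uses the prefix syntax of -statsby- found in later Stata releases, which differs from the Stata 8 syntax current when this thread was written. Wrapping the call in -preserve- and -restore- is one way to get the original data back afterwards.

```stata
sysuse auto, clear

preserve
// replace the data in memory by one observation per group
statsby mean_mpg=r(mean), by(foreign) clear: summarize mpg
list
// bring back the original 74 observations
restore
```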
> 
> 3. Doing it yourself with a loop is not too 
> difficult, but you must loop over groups at 
> a minimum. One FAQ which may be of help is 
> http://www.stata.com/support/faqs/data/foreach.html
> 
> Persons will have integer or string identifiers, 
> so using -levels- is often effective. Taking 
> your program 
> 
> > program mean_get
> > syntax [varlist]
> > foreach i of local varlist {
> > generate overall2 = 0
> > by name, sort:  summarize `i', meanonly
> > local 1 = r(mean)
> > by name, sort:  replace overall2 = `1'
> > }
> > end
> 
> let's forget that -egen- exists and write a program 
> that puts means of a varlist into variables -mean_*-
> for a division by some required variable. 
> 
> program mean_get
>  * NJC after Clint Thompson 13 April 2004 
>  version 8
>  syntax varlist(numeric), by(varname) 
> 
>  // check new names will be OK 
>  foreach v of local varlist { 
>   confirm new var mean_`v' 
>  } 
> 
>  // this will fail if `by' is not categorical 
>  qui levels `by', local(B)
>  capture confirm numeric var `by'
>  local isstr = _rc != 0 
> 
>  qui foreach v of local varlist { 
>   gen mean_`v' = . 
>   if `isstr' { 
>    foreach b of local B { 
>     su `v' if `by' == `"`b'"', meanonly 
>     replace mean_`v' = r(mean) if `by' == `"`b'"' 
>    }
>   } 
>   else { 
>    foreach b of local B { 
>     su `v' if `by' == `b', meanonly 
>     replace mean_`v' = r(mean) if `by' == `b'
>    }
>   } 
>  }	
> end 
> 
> I added a bit more checking. There is still much that
> could be added, e.g. support for -if- and -in-.
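
One way the -if- and -in- support mentioned above might be added is sketched here. This is not Nick Cox's code: the name mean_get2 is hypothetical, -levelsof- (the later name for -levels-, available from Stata 9) is swapped in, and the means are stored as doubles. Observations excluded by -if- or -in- are left missing.

```stata
capture program drop mean_get2
program mean_get2
    version 9
    syntax varlist(numeric) [if] [in], by(varname)

    // mark the estimation sample; novarlist keeps observations
    // with missing values in varlist (summarize skips them anyway)
    marksample touse, novarlist

    // check new names will be OK
    foreach v of local varlist {
        confirm new variable mean_`v'
    }

    // distinct values of the grouping variable within the sample
    quietly levelsof `by' if `touse', local(B)
    capture confirm numeric variable `by'
    local isstr = _rc != 0

    foreach v of local varlist {
        quietly generate double mean_`v' = .
        foreach b of local B {
            if `isstr' {
                quietly summarize `v' if `touse' & `by' == `"`b'"', meanonly
                quietly replace mean_`v' = r(mean) if `touse' & `by' == `"`b'"'
            }
            else {
                quietly summarize `v' if `touse' & `by' == `b', meanonly
                quietly replace mean_`v' = r(mean) if `touse' & `by' == `b'
            }
        }
    }
end

// usage on the shipped auto data (an illustrative assumption)
sysuse auto, clear
mean_get2 mpg weight if price < 10000, by(foreign)
```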
> 
> Nick 
> n.j.cox@durham.ac.uk 
> 
> Michael Blasnik
> 
> > most directly:
> > 
> > by name: egen overall2=mean(`i')
> > 
> > will put the mean of var `i', calculated for each value of name, in
> > overall2.  Of course, this value will still be overwritten by the next
> > var in the varlist if your loop is left as shown.
> > 
> > More specifically, when you use constructs such as -by x: summ y-,
> > you can't get at the individual results -- it's quite frustrating
> > when you first discover this.  Instead you either have to write
> > your own loop across groups, or else you may be able to use an
> > -egen- function or -statsby- (or perhaps -parmby-).
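
The behaviour described above can be checked with a short sketch on the auto dataset (an assumed stand-in for the poster's data). After a -by:- run, r() holds only the results for the last group, which matches what the original program picked up.

```stata
sysuse auto, clear

// r() is overwritten group by group; only the last group survives
bysort foreign: summarize mpg, meanonly
scalar last_by = r(mean)

// mean for the last group (foreign == 1), computed directly
summarize mpg if foreign == 1, meanonly
display "after by: " scalar(last_by) "   last group: " r(mean)
```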
> 
> cthompson@dfpm.utah.edu
> 
> > > Is there a way to save the mean in r() for each value of a
> > > variable?  More specifically, I want the mean for each
> > > different person contained within the variable 'name'.  My
> > > programming experience in Stata is quite limited so I'd
> > > appreciate any advice; the shrapnel from my attempt at coding
> > > is pasted below:
> > >
> > > program mean_get
> > > syntax [varlist]
> > > foreach i of local varlist {
> > > generate overall2 = 0
> > > by name, sort:  summarize `i', meanonly
> > > local 1 = r(mean)
> > > by name, sort:  replace overall2 = `1'
> > > }
> > > end
> > >
> > > When I run this, it returns the mean from the last value in
> > > 'name' and uses it to replace all values of 'overall2'.  Any
> > > suggestions?
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

