Statalist



st: AW: RE: Programming stata using egen functions


From   "Martin Weiss" <martin.weiss1@gmx.de>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: AW: RE: Programming stata using egen functions
Date   Tue, 28 Apr 2009 14:12:03 +0200

<> 

" is the -egen- function -sum()-, but it isn't. It is the Stata function -sum"

That is the weird thing: just one letter ("e") separates the running sum from the total. I think that -egen, sum()- should be retired and remain available only under version control. It calls -egen, total()- internally anyway, so nobody would lose anything, and a lot of clarity would be gained...

*************
clear*
set obs 30
gen x= rnormal()

//total
egen y =sum(x)
//running sum
gen z=sum(x)

list, noobs
*************



HTH
Martin


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Tuesday, 28 April 2009 13:42
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: Programming stata using egen functions

0. Martin is not quite right. After -syntax varname- the local macro varname is undefined. The single variable name you specified is contained in local macro varlist. 
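A minimal sketch of point 0, assuming Stata's shipped auto dataset; the program name -showname- is hypothetical and exists only for illustration. After -syntax varname-, the name the user typed (unabbreviated) is found in local macro varlist, while a reference to local macro varname simply expands to nothing.

```stata
* Hypothetical helper -showname-, for illustration only
capture program drop showname
program showname, rclass
    syntax varname
    * syntax stores the single variable name in local macro varlist;
    * no local macro called varname is created, so it expands to nothing
    display as text "local varlist holds: `varlist'"
    display as text "local varname holds: `varname'"
    return local name "`varlist'"
end

sysuse auto, clear
showname mpg
```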

1. The bigger deal is that Anne's program is problematic in yet other ways. As Kit implied, it's not clear what her precise problem is, but the occurrence of wired-in constants such as 8, 9, 20 indicates to me that Anne would be better off just writing a do-file with arguments for what she wants to do. 

2. A small inefficiency is that Anne is working with statement pairs like 

qui summarize ...
local ... = r(mean)

whereas -summarize, meanonly- is faster. 
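A sketch of the faster idiom, using the auto dataset as an assumed example: -summarize, meanonly- prints nothing and skips the variance calculation, but still leaves r(mean) (along with r(N), r(sum), r(min) and r(max)) behind.

```stata
sysuse auto, clear
* meanonly: no output, no variance calculation, r(mean) still saved
summarize price, meanonly
local mean_price = r(mean)
display `mean_price'
```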

3. A much bigger inefficiency is that Anne is working with code like 

tempvar ...
qui egen ... = max(...)
return scalar ... = ...

This is a very roundabout way to get a maximum: -egen, max()- is fired up to produce a variable, all of whose values are the maximum in question, and then its (first) value is read into a scalar. (It's even more roundabout than it may seem, as -egen- and its function -max()- each amount to several lines of code to be interpreted.) 

Anne should note that -summarize, meanonly- [sic] is enough to get a maximum. After that the maximum is accessible in r(max). 
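A sketch of point 3 as an r-class program; -getmax- is a hypothetical name and the auto dataset is just a convenient example. No temporary variable is created: the maximum is read straight from r(max).

```stata
* Hypothetical program -getmax-, for illustration only
capture program drop getmax
program getmax, rclass
    syntax varname
    * summarize, meanonly leaves the maximum in r(max); no new variable needed
    summarize `varlist', meanonly
    return scalar max = r(max)
end

sysuse auto, clear
getmax weight
display r(max)
```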

4. Manifestly, these details do not stop Anne's first program working. But there are now illegal statements in the second program which would be enough to cause a major problem.

Kit has alluded to the first. Anne should re-read the help for -egen- to see that whereas -egen, max()- is happy to feed on expressions, which could be scalar names, -egen, rowmax()- expects a varlist, and the scalar name she feeds it does not qualify: hence the error message. 
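To illustrate the distinction (a sketch with made-up values): -egen, rowmax()- works across the variables in a varlist, giving one result per observation, whereas the maximum of several scalars is a job for Stata's own max() function, which accepts any number of expressions. The scalar names m_1 to m_3 merely echo Anne's naming and are assumptions.

```stata
* Row-wise maximum across variables: egen rowmax() takes a varlist
clear
set obs 3
gen v1 = _n        // 1 2 3
gen v2 = 4 - _n    // 3 2 1
egen rmax = rowmax(v1 v2)
list

* Maximum of several scalars: use the Stata function max() instead
scalar m_1 = 3
scalar m_2 = 7
scalar m_3 = 5
scalar biggest = max(m_1, m_2, m_3)
display biggest    // 7
```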

5. In addition, a line just below 

return scalar eq29=rowtotal(ep_`s')

won't work as -rowtotal()- is an -egen- function and as such can _only_ be used within -egen- commands. 
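A sketch of point 5 with made-up data: -rowtotal()- appears only inside an -egen- command, and a total destined for a returned scalar is computed first, here with -summarize, meanonly-, which also saves r(sum). Inside an r-class program the last step would be something like -return scalar eq29 = r(sum)-, though whether that is the total Anne actually wants is an assumption.

```stata
clear
set obs 4
gen p1 = _n          // 1 2 3 4
gen p2 = 10 * _n     // 10 20 30 40
* rowtotal() is legal only on the right-hand side of egen;
* it treats missing values as 0
egen rowsum = rowtotal(p1 p2)   // 11 22 33 44

* To get a single total into a scalar, compute it first
summarize rowsum, meanonly
scalar grand = r(sum)           // 110
display grand
```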

6. My last point is that Anne appears to be thinking that the -sum()- in 

qui gen ... =sum(P) ...

is the -egen- function -sum()-, but it isn't. It is the Stata function -sum()-. (Otherwise it wouldn't work; see point 5 above.) 
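A small sketch of the difference, with made-up data: the Stata function sum() used with -generate- yields a running sum, while -egen, total()- puts the overall total in every observation, so the last running sum equals the total.

```stata
clear
set obs 5
gen x = _n
gen runsum = sum(x)     // Stata function sum(): running sum 1 3 6 10 15
egen tot = total(x)     // egen function total(): 15 in every observation
list
```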
 
Nick 
n.j.cox@durham.ac.uk 

Martin Weiss
============

You allow the [if] qualifier, but if the user did indeed specify it, the -program- would not respect it.
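One standard way to make such a program respect [if] (and [in]) is -marksample-, sketched here under the assumption of a single numeric variable; -mymeanif- is a hypothetical name and the auto dataset an assumed example. The marker variable is 1 for observations selected by the qualifiers and, by default, with nonmissing values of the variable.

```stata
* Hypothetical program -mymeanif-, for illustration only
capture program drop mymeanif
program mymeanif, rclass
    syntax varname [if] [in]
    * marksample creates a temporary 0/1 marker variable from if/in
    * and excludes observations with missing values of the variable
    marksample touse
    summarize `varlist' if `touse', meanonly
    return scalar mean = r(mean)
end

sysuse auto, clear
mymeanif price if foreign == 1
display r(mean)
```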

Seems redundant to say -syntax [varlist]- and to -tokenize- it as you merely process one variable in there. So it could be -syntax varname-, to which you would refer as `varname' later on. Do you really want to make the varlist/varname optional? (What would your -program- do w/o a variable?)

Kit Baum
========

egen rowmax() and rowtotal() are meant to be used with more than one variable (rather than a scalar) as an argument.

     qui egen `exem_`s''=rowmax(m_`s'`sp')

if m_`s'`sp' is a scalar, its rowmax or rowtotal is itself--a constant value for each row (observation). What are you trying to do?

Anne Resende
============

I am having some problems with my ado-file whenever I try to include 
some egen functions like rowmax and rowtotal. I am currently using 
Stata 10.0
 
My ado program is:
 
 program mymean, rclass
  1.   syntax [varlist] [if]
  2.   tokenize "`varlist'"
  3.   forvalues s=9(1)20 {
  4.   forvalues sp=9(1)`s' {
  5.         qui summarize `1' if `sp'==u
  6.         local mean1_`s'`sp' =r(mean)
  7.         qui sum `1' if u==8
  8.         local mean2_`s'`sp' =r(mean)
  9.           scalar m_`s'`sp'=`mean1_`s'`sp'' + ((`mean1_`s'`sp'' - `mean2_`s'`sp'')/(`sp'-8))*(9-`sp')
 10.         tempvar exem_`s' P1_`s' P_`s' ep_`s' ep2_`s'
 11.         qui egen `exem_`s''=max(m_`s'`sp')
 12.           return scalar exem2_`s'=`exem_`s''*1
 13.           qui gen `P1_`s''=sum(P) if `s'==u & id==1
 14.           qui egen `P_`s'' =max(`P1_`s'') 
 15.           scalar ep_`s'=exem2_`s'*`P_`s'' 
 16.         qui gen `ep_`s''=exem2_`s'*`P_`s''
 17.         return scalar eq29=sum(ep_`s')
 18. }
 19. }
 20. end
This program is running well. But in lines 11 and 17, I in fact need to 
use the commands rowmax rather than max and rowtotal rather than sum. 
When I use max and sum the program runs well, but when I substitute 
these commands with rowmax and rowtotal, Stata gives me the following 
error message after I type mymean loghw: 
 
 
 program mymean, rclass
  1.   syntax [varlist] [if]
  2.   tokenize "`varlist'"
  3.   forvalues s=9(1)20 {
  4.   forvalues sp=9(1)`s' {
  5.         qui summarize `1' if `sp'==u
  6.         local mean1_`s'`sp' =r(mean)
  7.         qui sum `1' if u==8
  8.         local mean2_`s'`sp' =r(mean)
  9.           scalar m_`s'`sp'=`mean1_`s'`sp'' + ((`mean1_`s'`sp'' - `mean2_`s'`sp'')/(`sp'-8))*(9-`sp')
 10.         tempvar exem_`s' P1_`s' P_`s' ep_`s' ep2_`s'
 11.         qui egen `exem_`s''=rowmax(m_`s'`sp')
 12.           return scalar exem2_`s'=`exem_`s''*1
 13.           qui gen `P1_`s''=sum(P) if `s'==u & id==1
 14.           qui egen `P_`s'' =max(`P1_`s'') 
 15.           scalar ep_`s'=exem2_`s'*`P_`s'' 
 16.         qui gen `ep_`s''=exem2_`s'*`P_`s''
 17.         return scalar eq29=rowtotal(ep_`s')
 18. }
 19. }
 20. end
. 
end of do-file
. mymean loghw
variable m_99 not found
r(111);

So I would like to know why the program recognizes the max and sum egen 
functions but does not recognize them (or does not find my variable) when I use 
the rowmax and rowtotal egen functions. Are there any programming 
differences between these two kinds of egen functions?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/





© Copyright 1996–2014 StataCorp LP