Statalist



AW: st: RE: RE: How to adjust the content of a local macro?


From   "Martin Weiss" <martin.weiss1@gmx.de>
To   <statalist@hsphsun2.harvard.edu>
Subject   AW: st: RE: RE: How to adjust the content of a local macro?
Date   Thu, 10 Dec 2009 11:29:56 +0100




Let me guess: You forgot a hard return after the call to -egen-, so the line
continues and Stata thinks that the following -drop- is part of the -egen-
call. At least that is what the -trace- suggests...
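
The program itself is not shown, so the following is only a sketch of one
way this can happen in a do-file or program: a trailing /// continuation
(or a missing ; under -#delimit ;-) joins the next line onto -egen-. The
variable names and the threshold 156 are taken from the session quoted below.

```stata
* Broken: the trailing /// makes Stata read both lines as one command,
*   egen nvalid = count(totalReturn), by(id) drop if nvalid < 156
* so -drop- is parsed as an option of -egen-
egen nvalid = count(totalReturn), by(id) ///
drop if nvalid < 156

* Fixed: each command ends with its own line break
egen nvalid = count(totalReturn), by(id)
drop if nvalid < 156
```

Typed interactively, each command is submitted separately, which would be
consistent with the code working line by line but failing inside the program.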



HTH
Martin

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Joachim
Landström
Sent: Thursday, 10 December 2009 11:27
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: RE: How to adjust the content of a local macro?

I have been trying to implement Nick's proposal of using -egen- to
remove unwanted ids. When I do it interactively in Stata it works
nicely. But when I run exactly the same code from within the program I
end up with error messages. See below. What may be the problem?

. use us_data_ret, clear

. desc

Contains data from us_data_ret.dta
   obs:    17,516,468
  vars:             5                          10 Nov 2009 07:02
  size:   525,494,040 (60.5% of memory free)
-------------------------------------------
               storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------
id              int    %8.0g                  ID
dscd            str6   %9s                    DSCD
date            float  %td
year            int    %8.0g
totalReturn     double %10.0g
-----------------------------------------------------
Sorted by:  id  date

. egen nvalid = count(totalReturn), by(id)

. drop if nvalid < 156
(2257110 observations deleted)

When run through a program, Stata issues the following error:
option drop not allowed
r(198);

-set trace on- refuses to work when placed directly after -egen- but
works when placed before -egen-, and then it shows (just the final part
of the output):


     - capture noisily `vv' _g`fcn' `type' `dummy' = (`args') `if' `in' `cma' `byopt' `options'
     = capture noisily  _gcount float __000009 = (totalReturn)   , by(id) drop if nvalid < 156
       ------------------------------------------------------------ begin _gcount ---
       - version 6, missing
       - syntax newvarname =/exp [if] [in] [, BY(varlist)]
option drop not allowed
       -------------------------------------------------------------- end _gcount ---
     - global EGEN_SVarname
     - global EGEN_Varname
     - if _rc { exit _rc }
     ------------------------------------------------------------------- end egen ---
r(198);

Again: What is going on? Why can I issue the code lines separately but
not as part of a program? Does it have something to do with the size of
the dataset (-egen- takes a few seconds to finish)?

By the way, I have now implemented -meanonly- and it improves the
performance of my program by approximately 20 percent when run on a
small sample. That is great.

/Joachim






Quoting Nick Cox <n.j.cox@durham.ac.uk>:

> Having looked again at the code, the problem appears to be
> identifying panels for which the number of non-missing values of
> -totalReturn- is at least a predefined value stored in a local macro
> -requiredEstimationPeriod-.
>
> That is
>
> egen nvalid = count(totalReturn), by(id)
> drop if nvalid < `requiredEstimationPeriod'
>
> Nick
> n.j.cox@durham.ac.uk
>
> Nick Cox
>
> Martin answered the question here, but various secondary points   
> arise from looking at the code. Most are on style and most are of   
> some wider interest.
>
> 1. The loop consists of repeated -drop-ping of observations not   
> desired, working with the remaining subset and then a -restore- of   
> the original. It is difficult to say in general what is most   
> efficient and what most elegant but for a situation like that below   
> I'd normally just add an extra condition excluding the observations   
> not wanted, rather than repeatedly doing major surgery on the   
> dataset. However, others could equally point out that applying -if-   
> on a very large dataset can be time-consuming.
>
> 2. If only the minimum and maximum are needed from a -summarize- it   
> is best just to use a -meanonly- option. (The name -meanonly- is   
> misleading, as I've had occasion to remark before.)
>
> 3. Code like
>
> 	local `minDate' = r(min)
> 	<stuff> if <stuff> date >= ``minDate''
>
> looks legal but odd. You are probably using more levels of macros
> than you need. It's hard to tell because the code isn't completely
> self-contained (that's not a criticism; it wasn't necessary for your
> question).
>
> 4. Code in which you loop over the contents of a local macro and   
> change that macro within the loop can be tricky. Watch out!
>
> 5. The -if- condition in
>
> 	summarize totalReturn if totalReturn != .
>
> is unnecessary as -summarize- always ignores missings.
>
> 6. To get minimum and maximum dates in a panel, no looping is necessary as
>
> egen mindate = min(date), by(id)
> egen maxdate = max(date), by(id)
>
> will do it. Similarly it looks as if your main problem does not need
> any looping either, as it should yield to -egen- operations. Look
> at -egen, count()- in particular.
>
> 7. More generally, it is not always positive to know too many other   
> languages if they lead you to seek a Stata equivalent of other code   
> when there's a Stataish way to do it without any real programming.
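>
> If a loop over ids is kept after all, points 1 to 5 can be combined in a
> minimal sketch like the one below. It assumes the variables id and
> totalReturn from the example further down; keepIds and dropId are
> illustrative names, and the threshold uses the estimationPeriod of 3 and
> tDelta of 7 given there.
>
> ```stata
> local requiredEstimationPeriod = 3 * floor(365 / 7)
>
> levelsof id, local(panelVar)
> * Work on a copy, so the macro being looped over is never changed
> local keepIds `panelVar'
>
> foreach i of local panelVar {
>     * Restrict with -if- instead of -drop- and -restore-;
>     * -summarize- ignores missing values, so r(N) counts valid returns
>     quietly summarize totalReturn if id == `i', meanonly
>     if r(N) < `requiredEstimationPeriod' {
>         * Macro list subtraction removes this id from the copy
>         local dropId `i'
>         local keepIds : list keepIds - dropId
>     }
> }
>
> display "`keepIds'"
> ```
>
> Afterwards keepIds should hold only the ids with enough valid observations.
> On a very large dataset the -egen- route in point 6 avoids the loop
> altogether and is likely to be the simpler choice.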
>
> Nick
> n.j.cox@durham.ac.uk
>
> Joachim Landström
>
> I have what I hope to be a minor problem that I nevertheless fail to find a
> solution to. Suppose that I have a local macro panelVar that contains panel
> ids. Based on a selection criterion I wish to remove some panel ids from
> panelVar. How do I do that? I use Stata/MP 10.1 in Windows XP 32-bit.
>
> More specifically see example below. Suppose the panel id is called id and
> the time series variable is date. Per id & date I have the actual content in
> the form of totalReturn (tDelta is 7):
>
> **** Begin Example ****
> local estimationPeriod = 3
>
> local requiredEstimationPeriod = `estimationPeriod' * floor( 365 / ``tDelta'' )
>
> levelsof id, local(panelVar)
>
> preserve
>
> quietly foreach i of local panelVar ///
> 	{
> 		restore, preserve
> 		drop if id != `i'
>
> 		summarize date if totalReturn != .
> 		local `minDate' = r(min)
> 		local `maxDate' = r(max)
>
> 		summarize totalReturn if totalReturn != . ///
> 					& date >= ``minDate'' & date <= ``maxDate''
>
> 		if  `r(N)' < `requiredEstimationPeriod' ///
> 			{
> 			***** Here I wish to update the local macro panelVar such `i' is removed *********
> 			}
> 		else ///
> 			{
> 			}
> 	}
> **** End Example ****
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>



-- 
Joachim Landström





© Copyright 1996–2014 StataCorp LP