Statalist



AW: st: RE: RE: How to adjust the content of a local macro?


From   "Martin Weiss" <martin.weiss1@gmx.de>
To   <statalist@hsphsun2.harvard.edu>
Subject   AW: st: RE: RE: How to adjust the content of a local macro?
Date   Thu, 10 Dec 2009 11:58:05 +0100

<> 



"By the way, I have now implemented -meanonly- and it improves the
performance of my program by approximately 20 percent when run on a small
sample. That is great."




As Nick says in http://www.stata-journal.com/article.html?article=st0135:

"The difference between summarize, meanonly and summarize with no options is
that the latter also calculates the variance and its square root, the
standard deviation. The reason for the meanonly option is that this last
calculation can be fairly time consuming in large datasets."



HTH
Martin


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Joachim Landström
Sent: Thursday, 10 December 2009 11:27
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: RE: How to adjust the content of a local macro?

I have been trying to implement Nick's suggestion of using -egen- to
remove unwanted ids. When I do it interactively in Stata it works
nicely, but when I run exactly the same code from within the program I
get error messages. See below. What may be the problem?

. use us_data_ret, clear

. desc

Contains data from us_data_ret.dta
   obs:    17,516,468
  vars:             5                          10 Nov 2009 07:02
  size:   525,494,040 (60.5% of memory free)
-------------------------------------------
               storage  display     value
variable name   type   format      label      variable label
------------------------------------------------------------
id              int    %8.0g                  ID
dscd            str6   %9s                    DSCD
date            float  %td
year            int    %8.0g
totalReturn     double %10.0g
-----------------------------------------------------
Sorted by:  id  date

. egen nvalid = count(totalReturn), by(id)

. drop if nvalid < 156
(2257110 observations deleted)

When run through a program, Stata issues the following error:
option drop not allowed
r(198);

-set trace on- refuses to work when placed directly after -egen-, but it
works when placed before -egen-, and then it shows (only the final part of
the output):


     - capture noisily `vv' _g`fcn' `type' `dummy' = (`args') `if' `in' `cma' `byopt' `options'
     = capture noisily  _gcount float __000009 = (totalReturn)   ,   by(id) drop if nvalid < 156
       ---------------------------------------------------------------------------------------------- begin _gcount ---
       - version 6, missing
       - syntax newvarname =/exp [if] [in] [, BY(varlist)]
option drop not allowed
       ------------------------------------------------------------------------------------------------ end _gcount ---
     - global EGEN_SVarname
     - global EGEN_Varname
     - if _rc { exit _rc }
     ----------------------------------------------------------------------------------------------------- end egen ---
r(198);

Again: what is going on? Why can I issue the code lines separately but
not as part of a program? Does it have something to do with the size of the
dataset (-egen- takes a few seconds to finish)?
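A hedged note on the error: the second line of the trace shows -egen- receiving "by(id) drop if nvalid < 156" as its option list, so the -drop- command was read as part of the -egen- line. The thread does not say why. One way to produce exactly this symptom inside a do-file or program is a trailing /// continuation on the -egen- line; a -#delimit ;- setting with a missing semicolon would likely have the same effect. The sketch below reproduces the symptom on a hypothetical toy panel (not the poster's data), with the threshold lowered from 156 to an illustrative 3 so that some observations survive:

```stata
* Hypothetical toy panel: id 1 has 2 non-missing returns, id 2 has 4
clear
input int id float date double totalReturn
1  1  0.01
1  8  0.02
1 15  .
2  1  0.03
2  8 -0.01
2 15  0.00
2 22  0.02
end

* Reproduction (assumed cause): the trailing /// joins the two lines, so
* -egen- is handed "by(id) drop if nvalid < 3" as options and stops
capture noisily {
    egen nvalid = count(totalReturn), by(id) ///
    drop if nvalid < 3
}
local rc = _rc
display "return code: `rc'"

* Fix: end the -egen- line normally so -drop- runs as its own command
egen nvalid = count(totalReturn), by(id)
drop if nvalid < 3
```

If the real program shows no /// or #delimit on that line, the same trace output still says that something joined the two commands, which is where to look.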

By the way, I have now implemented -meanonly- and it improves the
performance of my program by approximately 20 percent when run on a small
sample. That is great.

/Joachim






Quoting Nick Cox <n.j.cox@durham.ac.uk>:

> Having looked again at the code, the problem appears to be
> identifying panels for which the number of non-missing values of
> -totalReturn- is at least a predefined value stored in a local macro
> -requiredEstimationPeriod-.
>
> That is:
>
> egen nvalid = count(totalReturn), by(id)
> drop if nvalid < `requiredEstimationPeriod'
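A minimal sketch of these two lines on a hypothetical toy panel; the data and the threshold of 3 are illustrative, not from the thread. Running -levelsof- afterwards gives the ids that survive, which is the list the original question wanted to maintain in -panelVar-:

```stata
* Hypothetical toy panel: id 1 has 2 non-missing returns, id 2 has 4
clear
input int id float date double totalReturn
1  1  0.01
1  8  0.02
1 15  .
2  1  0.03
2  8 -0.01
2 15  0.00
2 22  0.02
end

* Threshold kept in a local macro (3 is an illustrative value)
local requiredEstimationPeriod = 3

* -count()- counts non-missing values of totalReturn within each id
egen nvalid = count(totalReturn), by(id)
drop if nvalid < `requiredEstimationPeriod'

* The surviving panel ids, if a list is still wanted
levelsof id, local(panelVar)
display "`panelVar'"
```

No loop and no macro surgery are needed: the panels that fail the criterion are simply gone from the data.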
>
> Nick
> n.j.cox@durham.ac.uk
>
> Nick Cox
>
> Martin answered the question here, but various secondary points   
> arise from looking at the code. Most are on style and most are of   
> some wider interest.
>
> 1. The loop consists of repeated -drop-ping of observations not   
> desired, working with the remaining subset and then a -restore- of   
> the original. It is difficult to say in general what is most   
> efficient and what most elegant but for a situation like that below   
> I'd normally just add an extra condition excluding the observations   
> not wanted, rather than repeatedly doing major surgery on the   
> dataset. However, others could equally point out that applying -if-   
> on a very large dataset can be time-consuming.
>
> 2. If only the minimum and maximum are needed from a -summarize- it   
> is best just to use a -meanonly- option. (The name -meanonly- is   
> misleading, as I've had occasion to remark before.)
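For example, on a hypothetical five-observation variable, -summarize, meanonly- still returns r(min), r(max) and r(N) (along with r(mean) and r(sum)); it skips the variance calculation and displays nothing:

```stata
clear
set obs 5
generate double x = 10 * _n      // 10, 20, 30, 40, 50
replace x = . in 3               // make one value missing

summarize x, meanonly
display "min = " r(min) "  max = " r(max) "  N = " r(N)
```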
>
> 3. Code like
>
> 	local `minDate' = r(min)
> 	<stuff> if <stuff> date >= ``minDate''
>
> looks legal but odd. You are probably using more levels of macros   
> than you need. It's hard to tell because the code isn't completely   
> self-contained (that's not a criticism; it wasn't necessary for your  
>  question).
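With a single level of macro the same steps read more plainly. A sketch on hypothetical toy data, where minDate and maxDate are ordinary local names that hold the numbers themselves:

```stata
* Hypothetical toy panel
clear
input int id float date double totalReturn
1  1  0.01
1  8  0.02
1 15  .
2  1  0.03
2  8 -0.01
2 15  0.00
2 22  0.02
end

quietly summarize date if id == 2, meanonly
local minDate = r(min)      // local named minDate holds the value itself
local maxDate = r(max)

* One level of dereferencing is enough
count if id == 2 & date >= `minDate' & date <= `maxDate'
```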
>
> 4. Code in which you loop over the contents of a local macro and   
> change that macro within the loop can be tricky. Watch out!
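One way to sidestep the problem, sketched with hypothetical ids and a hypothetical list of ids that fail the criterion: leave the macro being looped over untouched and collect the wanted ids in a second local, or remove the unwanted ones after the loop with the -: list- macro extended function:

```stata
* Hypothetical ids; tooShort stands in for ids failing the criterion
local panelVar 101 102 103 104
local tooShort 102 104

* Pattern 1: never modify panelVar inside its own loop; build keepIds
local keepIds
foreach i of local panelVar {
    local isShort : list i in tooShort     // 1 if `i' is in tooShort
    if !`isShort' {
        local keepIds `keepIds' `i'
    }
}
local keepIds : list retokenize keepIds    // tidy the spacing
display "`keepIds'"

* Pattern 2: after the loop, remove the unwanted ids in one step
local panelVar : list panelVar - tooShort
display "`panelVar'"
```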
>
> 5. The -if- condition in
>
> 	summarize totalReturn if totalReturn != .
>
> is unnecessary as -summarize- always ignores missings.
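A quick check on hypothetical data that the -if- qualifier changes nothing here. If such a condition were ever needed, -!missing(totalReturn)- is the safer form, because -totalReturn != .- does not exclude extended missing values such as .a:

```stata
clear
set obs 4
generate double totalReturn = cond(_n == 2, ., _n / 100)   // .01, ., .03, .04

quietly summarize totalReturn if totalReturn != .
local n_if = r(N)
quietly summarize totalReturn
display "N with -if-: `n_if'   N without -if-: " r(N)
```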
>
> 6. To get minimum and maximum dates in a panel, no looping is necessary as
>
> egen mindate = min(date), by(id)
> egen maxdate = max(date), by(id)
>
> will do it. Similarly it looks as if your main problem does not need  
>  any looping either, as it should yield to -egen- operations. Look  
> at  -egen, count()- in particular.
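Applied to a hypothetical toy panel, the two -egen- calls give every observation its panel's first and last date. The extra maxvalid line is an illustrative variant, not from the thread: it mirrors the -if totalReturn != .- restriction in the original loop by taking the maximum only over dates with a non-missing return (-egen, max()- ignores the missing values produced by -cond()-):

```stata
* Hypothetical toy panel
clear
input int id float date double totalReturn
1  1  0.01
1  8  0.02
1 15  .
2  1  0.03
2  8 -0.01
2 15  0.00
2 22  0.02
end

egen mindate = min(date), by(id)
egen maxdate = max(date), by(id)

* Variant: last date on which totalReturn is non-missing
egen maxvalid = max(cond(missing(totalReturn), ., date)), by(id)

list id date totalReturn mindate maxdate maxvalid, sepby(id) noobs
```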
>
> 7. More generally, it is not always positive to know too many other   
> languages if they lead you to seek a Stata equivalent of other code   
> when there's a Stataish way to do it without any real programming.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Joachim Landström
>
> I have what I hope is a minor problem that I nevertheless cannot find a
> solution to. Suppose that I have a local macro panelVar that contains panel
> ids. Based on a selection criterion I wish to remove some panel ids from
> panelVar. How do I do that? I use Stata/MP 10.1 on Windows XP 32-bit.
>
> More specifically, see the example below. Suppose the panel id is called id
> and the time-series variable is date. For each id and date I have the actual
> content in the form of totalReturn (tDelta is 7):
>
> **** Begin Example ****
> local estimationPeriod = 3
>
> local requiredEstimationPeriod = `estimationPeriod' * floor( 365 / ``tDelta'' )
>
> levelsof id, local(panelVar)
>
> preserve
>
> quietly foreach i of local panelVar ///
> 	{
> 		restore, preserve
> 		drop if id != `i'
>
> 		summarize date if totalReturn != .
> 		local `minDate' = r(min)
> 		local `maxDate' = r(max)
>
> 		summarize totalReturn if totalReturn != . ///
> 					& date >= ``minDate'' & date <= ``maxDate''
>
> 		if  `r(N)' < `requiredEstimationPeriod' ///
> 			{
> 			***** Here I wish to update the local macro panelVar such that `i' is removed *********
> 			}
> 		else ///
> 			{
> 			}
> 	}
> **** End Example ****
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>



-- 
Joachim Landström





© Copyright 1996–2014 StataCorp LP