Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Overriding a loop if 0 observations using tabstat

From	Robert Picard <[email protected]>
To	[email protected]
Subject	Re: st: Overriding a loop if 0 observations using tabstat
Date	Tue, 27 Apr 2010 16:07:37 -0400

I don't understand. Under both scenarios (-set memory 1g- or -set
memory 10m-), the dataset size and everything else are the same. On my
computer with 12GB of RAM, a 1g allocation should not make a
difference, and none of it should be paged out to virtual memory. In
fact, Stata does not even allocate itself 1GB of real or virtual
memory (when I look at the Activity Monitor) unless I actually create
or load a dataset that requires 1GB of RAM.

The reason I ask is that the lesson appears to be this: when running
Stata, you should always aim for the smallest memory allocation
possible for maximum efficiency, at the price of finding out hours
later, when you hit an insufficient memory error, that you should
have used a larger -set memory-.
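
A minimal sketch of how the comparison could be rerun under both allocations in a single do-file, reusing the -count- loop from Nick's test script quoted below. It assumes Stata 11 or earlier, where -set memory- still applies (newer releases manage memory automatically and ignore the setting), and a machine that can grant the 1g request. The local macro names (m, reps) and the use of -timer- in place of -set rmsg on- are illustrative choices, not part of the original script.

```stata
* Sketch: time the same -count- loop under two memory allocations.
* Assumes Stata 11 or earlier; -set memory- is ignored in later releases.
local reps 10000                      // hypothetical name; same count as the script

foreach m in 10m 1g {
    clear                             // -set memory- requires no data in memory
    set memory `m'
    set obs 100000
    set seed 2803
    gen y = runiform()

    timer clear 1
    timer on 1
    quietly forvalues i = 1/`reps' {
        count if y > 0.5
    }
    timer off 1

    quietly timer list 1              // leaves elapsed seconds in r(t1)
    display "set memory `m': t = " r(t1)
}
```

Running both allocations in the same session keeps the data, seed, and everything else identical, so any difference in the reported times can be attributed to the allocation alone.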

Robert

On Tue, Apr 27, 2010 at 3:42 PM, Martin Weiss <[email protected]> wrote:
>
> The additional 990m for the 1g allocation decreases the amount available for
> computations, so this is what I would expect to happen.
>
>
> HTH
> Martin
>
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Robert Picard
> Sent: Tuesday, 27 April 2010 21:39
> To: [email protected]
> Subject: Re: st: Overriding a loop if 0 observations using tabstat
>
> Do you guys see a difference if you try under a different memory
> allocation? I'm running Stata/MP 11 (4 cores) on a Mac Pro 2.93GHz
> Quad-Core with 12GB of RAM and get:
>
> with 1g allocation: t=17.49; t=64.09; t=71.18
> with 10m allocation: t=10.93; t=43.35; t=47.68
>
> Just curious,
>
> Robert
>
> On Tue, Apr 27, 2010 at 2:59 PM, Jeph Herrin <[email protected]> wrote:
>> This is 64bit MP 2 on Windows 7 with 8G ram.
>> The processor is an AMD Phenom II with 3.20GHz clock speed.
>>
>> cheers,
>> J
>>
>>
>> Martin Weiss wrote:
>>>
>>> Jeph, out of curiosity, what kind of equipment is it that throws up these
>>> numbers? Mine is 64 bit MP 4 on Windows 7 with 4G Ram.
>>>
>>>
>>> HTH
>>> Martin
>>>
>>>
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of Jeph Herrin
>>> Sent: Tuesday, 27 April 2010 20:27
>>> To: [email protected]
>>> Subject: Re: st: Overriding a loop if 0 observations using tabstat
>>>
>>> t=48.90; t=60.45; t=72.30. :>
>>>
>>>
>>> Martin Weiss wrote:
>>>>
>>>> t=100.28; t=207.58; t=241.55. :-)
>>>>
>>>>
>>>> HTH
>>>> Martin
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: [email protected]
>>>> [mailto:[email protected]] On Behalf Of Nick Cox
>>>> Sent: Tuesday, 27 April 2010 19:08
>>>> To: [email protected]
>>>> Subject: RE: st: Overriding a loop if 0 observations using tabstat
>>>>
>>>> Good question. I decided to do some timings to support -- or rebut -- my
>>>> feeling that -count- which just counts should be faster than -summarize,
>>>> meanonly- which does other stuff too and in turn than -summarize- which does
>>>> other stuff too. But although that's the order the timings are closer than I
>>>> guessed. Still, doing anything the quickest way does no harm and may give
>>>> valuable speed-up for large problems.
>>>> Here is one test script. Compare your experiences:
>>>> clear
>>>> set obs 100000
>>>> set seed 2803
>>>> gen y = runiform()
>>>> set rmsg on
>>>>
>>>> qui forval i = 1/10000 {
>>>>        count if y > 0.5
>>>> }
>>>>
>>>> qui forval i = 1/10000 {
>>>>        su y if y > 0.5, meanonly
>>>> }
>>>>
>>>> qui forval i = 1/10000 {
>>>>        su y if y > 0.5
>>>> }
>>>>
>>>> My timings were t=187.49; 254.49; 313.38, which no doubt shows up the
>>>> Mesolithic age of my machine.
>>>> Nick
>>>> [email protected]
>>>>
>>>> Martin Weiss
>>>>
>>>> " As a small detail of efficiency, I would always recommend -count- rather
>>>> than -summarize- for the purpose here."
>>>>
>>>> My earlier code did use -count-... What makes this thing more efficient,
>>>> though? Both are built-in, so they probably enjoy a big advantage over
>>>> everybody else anyway. So I guess the reason for your preference is the fact
>>>> that -count- calculates fewer results than -su, mean-?
>>>>
>>>> Nick Cox
>>>>
>>>> A secondary theme here is that this kind of code gets very difficult to
>>>> read, which makes it difficult to maintain and debug.
>>>> I note that the condition
>>>>        intab1 == 1 & admit_ic == 1 & bwtg < .
>>>> is common to all the -summarize- and -tabstat- commands. That being so, you
>>>> could get that out of the way like this
>>>>        preserve
>>>>        keep if intab1 == 1 & admit_ic == 1 & bwtg < .
>>>>        <stuff>
>>>>        restore
>>>> Your -tabstat- options that are constant can be put in a little bag:
>>>>        local opts stat(n mean median p25 p75 min max) col(stat) f(%9.0g) notot nosep
>>>>
>>>> Now <stuff> can be rewritten
>>>> forv i = 0/5 {
>>>>        foreach y in male singlet {
>>>>                forv s = 0/1 {
>>>>                        di "myga==`i' & `y'==`s'"
>>>>                        qui su bwtg if myga==`i' & `y'
>>>>                        if r(N) != 0 {
>>>>                                tabstat bwtg if myga==`i', `opts' by(`y')
>>>>                       }
>>>>                }
>>>>        }
>>>> }
>>>>
>>>> Now it is easier to see what is going on. I added some cosmetic changes too,
>>>> which this horrible mailer may well reverse.
>>>> One puzzle: Did you mean to add the condition "& `y'" to the -summarize-? It
>>>> means the same as
>>>>        & `y' != 0
>>>> -- which may or may not be what you want.
>>>> As a small detail of efficiency, I would always recommend -count- rather
>>>> than -summarize- for the purpose here.
>>>> Nick
>>>> [email protected]
>>>>
>>>> sara khan
>>>>
>>>> Many thanks Maarten for your advice. I managed to resolve it with the
>>>> following code:
>>>>
>>>> forv i=0/5 {
>>>>     foreach y in male singlet {
>>>>         forv s=0/1 {
>>>>             di "myga==`i' & `y'==`s'"
>>>>             qui su bwtg if myga==`i' & intab1==1 & admit_ic==1 & bwtg<. & `y'
>>>>             if r(N)!=0 {
>>>>                 tabstat bwtg if myga==`i' & intab1==1 & admit_ic==1 & bwtg<., ///
>>>>                     stat(n mean median p25 p75 min max) by(`y') col(stat) ///
>>>>                     f(%9.0g) notot nosep
>>>>             }
>>>>         }
>>>>     }
>>>> }
>>>>
>>>>
>>>> On Tue, Apr 27, 2010 at 12:56 PM, Maarten buis <[email protected]>
>>>> wrote:
>>>>>
>>>>> --- On Tue, 27/4/10, sara khan wrote:
>>>>>>
>>>>>> I just tried this but the output only shows the display
>>>>>> results and nothing from tabstat.
>>>>>
>>>>> <snip>
>>>>>
>>>>> -capture- works for me:
>>>>>
>>>>> *----------------- begin example ---------------------
>>>>> sysuse auto, clear
>>>>> forvalues i = 0/5 {
>>>>>       capture noisily tabstat mpg if rep78== `i', ///
>>>>>               s(n mean) by(foreign)
>>>>> }
>>>>> *-------------------- end example -------------------
>>>>>
>>>>> In order to debug your loop I would build it step by step:
>>>>> step 1: no looping, no locals, no -if- just a single -tabstat- command
>>>>> step 2: add -capture noisily-
>>>>> step 3: add some -if- conditions
>>>>> step 4: build a single loop (e.g. over i but not over y)
>>>>> etc. etc.
>>>>
>>>> *
>>>> *   For searches and help try:
>>>> *   http://www.stata.com/help.cgi?search
>>>> *   http://www.stata.com/support/statalist/faq
>>>> *   http://www.ats.ucla.edu/stat/stata/
