



Re: st: Overriding a loop if 0 observations using tabstat


From   Robert Picard <picard@netbox.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Overriding a loop if 0 observations using tabstat
Date   Thu, 29 Apr 2010 10:40:47 -0400

As I woke up this morning I had another thought related to this: if
Stata had a -compact- command to remove the extra space, then
programmers who know what they are trading off (more cache hits and
cache lines that contain 100% data, against the extra overhead of
making space for new variables) would have an easy way to improve the
performance of their code when appropriate. I'm not sure I want to
chance changing the memory allocation mid-program, but I can see how I
could use a -compact- command.

Just a suggestion; please disregard if difficult to implement.

Robert

On Wed, Apr 28, 2010 at 5:27 PM, Vince Wiggins, StataCorp
<vwiggins@stata.com> wrote:
>           Stata     y
>          +------------+
>          |1235678|1234|
>          |1235678|1234|
>          |1235678|1234|
>          |  ...  | ...|
>
> Stata datasets usually are not stored this densely.  Normally, there would be
> free space at the end of each record where more variables can be added.
>
>           Stata     y    free space
>          +-----------------------------
>          |1235678|1234|  ...
>          |1235678|1234|
>          |1235678|1234|
>          |  ...  | ...|  ...
>
> Moreover, you are likely to have even more free space at the end of each
> record if you have allocated more memory to Stata.  This lets Stata add and
> drop variables quickly.
>
> So, with 10 MB allocated, RRK's data might look like
>
>           Stata     y    free space
>          +---------------------+
>          |1235678|1234|12345678|
>          |1235678|1234|12345678|                   (1)
>          |1235678|1234|12345678|
>          |  ...  | ...|   ...  |
>
> And, with 1000 MB allocated, it might look like
>
>           Stata     y    free space
>          +-------------------------------------
>          |1235678|1234|123456789...  ... ...
>          |1235678|1234|123456789...  ... ...       (2)
>          |1235678|1234|123456789...  ... ...
>          |  ...  | ...|   ...
>
> With the dataset organized as in (1), each record is 20 characters wide,
> including free space, and so there is enough room to store all of the data,
> including free space, in the cache.  With the dataset organized as in (2), that
> might not be true.  Since we have 100,000 records and 8 MB of cache, if the
> records are wider than 8*2^20/100000 = 83.9 characters, then the entire data
> area will not fit into cache.
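
The arithmetic in the quoted message can be checked with a short Python
sketch. It assumes, as the quoted text does, that one character occupies one
byte and that the whole 8 MB cache is available for the data area. The
function names, and the 1000-character width used to stand in for layout (2),
are hypothetical and chosen only for illustration.

```python
# Rough check of the cache-fit arithmetic from the quoted message.
# Assumptions: 1 character = 1 byte; the full cache is available for data.

CACHE_BYTES = 8 * 2**20   # 8 MB of cache, as in the quoted example
N_RECORDS = 100_000       # number of records, as in the quoted example


def max_record_width(cache_bytes: int, n_records: int) -> float:
    """Widest record (in bytes) for which all records still fit in cache."""
    return cache_bytes / n_records


def fits_in_cache(record_width: int, n_records: int, cache_bytes: int) -> bool:
    """True if n_records records of record_width bytes fit in cache_bytes."""
    return record_width * n_records <= cache_bytes


threshold = max_record_width(CACHE_BYTES, N_RECORDS)
print(round(threshold, 1))                          # 83.9

# Layout (1): records are 20 characters wide, including free space.
print(fits_in_cache(20, N_RECORDS, CACHE_BYTES))    # True

# Layout (2): 1000 is a made-up width; the message gives no exact figure.
print(fits_in_cache(1000, N_RECORDS, CACHE_BYTES))  # False
```

Real caches are shared with code and other data, so the threshold is an
upper bound under these assumptions rather than a precise cutoff.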


