Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


RE: st: A bug in egen and gen?


From   "Liao, Junlin" <[email protected]>
To   "[email protected]" <[email protected]>
Subject   RE: st: A bug in egen and gen?
Date   Thu, 17 Feb 2011 20:56:25 +0000

Nick,

I do use Stata frequently. I cannot speak for others, but I always save typing where I can. I just fail to see your point that "You can get some of that back by -compress-, but not all". My experiment clearly proves that what matters is the "final" storage type. I understand that using double in place of float or long will increase the memory required. My point is that computing power is increasing exponentially. For example, every computer I use has at least 4GB of memory, and the machine I run Stata on has 8GB. Memory is the least of my concerns, but accuracy is always important.

I can see that you are defensive of Stata, but that may not be necessary. I use both SAS and Stata. In fact, my formal training is in SAS, and I picked up Stata by using it. But most of my analyses for myself and clients are done in Stata.

Junlin
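
A minimal sketch of the accuracy point above, assuming a one-observation dataset and the hypothetical variable names x_float, x_long, and x_double (none of these names appear in the thread). It stores the value used in the experiment, 83085733, in each of the three types so the results can be compared side by side. As documented in help datatypes, a float carries about 24 bits of precision, so not every integer above 16,777,216 can be held exactly.

```stata
* Sketch only: compare how one integer is held by float, long, and double.
clear
set obs 1

* The value is the one used in the experiment quoted in this thread.
generate float  x_float  = 83085733
generate long   x_long   = 83085733
generate double x_double = 83085733

* Show all digits rather than the default display format.
format x_float x_long x_double %12.0f
list

* long and double both hold this integer exactly, so this should pass.
assert x_long == x_double

* Any nonzero difference here is float rounding error.
display x_float - x_double
```

If the last line displays something other than 0, the float copy has been rounded, which is the accuracy concern raised in this message.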

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
Sent: Thursday, February 17, 2011 2:28 PM
To: [email protected]
Subject: Re: st: A bug in egen and gen?

You're discovering for yourself what is already documented e.g. at help datatypes. The issue for you remains very simple. Using -double-s wherever you would otherwise use -float-s or -long-s will in respect of those variables double your storage. You can get some of that back by -compress-, but not all. In fact, if you are fairly typical in your Stata use, my guess is not much.
You should also remember that often Stata creates temporary variables on your behalf, so the storage you need may temporarily be much larger than you realise.

Most users really are better off with the default. That is why it is the default. Otherwise, StataCorp would have changed it long since.

Nick
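
A rough way to check the "some of that back, but not all" point is sketched below, assuming a dataset of 1,000,000 observations and two hypothetical variables, whole and frac (the names are illustrative, not from the thread). As documented in help compress, -compress- demotes a double only to an integer type (long, int, or byte) and only when that loses no information; it does not demote a double to a float. The sizes reported in the thread are consistent with simple arithmetic: 1,000,000 x 8 bytes is about 7,813KB, and 1,000,000 x 4 bytes is about 3,907KB.

```stata
* Sketch only: what -compress- can and cannot recover from doubles.
clear
set obs 1000000

* Integer in every observation: a candidate for demotion to long.
generate double whole = 83085733

* Non-integer values: no integer type can hold these without loss.
generate double frac = runiform()

describe whole frac    // both stored as double (8 bytes each)
compress
describe whole frac    // compare the storage types after -compress-
```

Under these assumptions, whole should end up as a long and frac should stay a double, so only the integer-valued variable gives its extra storage back.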

On Thu, Feb 17, 2011 at 8:13 PM, Liao, Junlin <[email protected]> wrote:
> I expanded my experiment to include one more variable that contains a string ("Experiment for testing Stata Options"). The results are the same:
>
> No compress, with option float: 39,064KB
> No compress, with option double: 42,970KB
> Compressed with either option: 39,064KB
>
> Basically float=long in terms of storage. The option for numeric variables has no impact on string variables - as expected.
>
> Junlin Liao
> Surgery Finance, 1422 JCP
> Phone: (319) 356-2588
>
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Sarah Edgington
> Sent: Thursday, February 17, 2011 2:01 PM
> To: [email protected]
> Subject: RE: st: A bug in egen and gen?
>
> Junlin,
> Have you tried this experiment with something other than a large integer?  I think Nick's point was that you only regain the space using -compress- if you're dealing exclusively with integers.  You've demonstrated that compressing a double variable in the case where all observations are integers gets you the same storage size as if you'd started out in float.
> Is the same true if not all observations are integers?
> -Sarah
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Liao, Junlin
> Sent: Thursday, February 17, 2011 11:53 AM
> To: [email protected]
> Subject: RE: st: A bug in egen and gen?
>
> Nick,
>
> I had experimented with Stata in terms of storage. Here are the results:
>
> I generated one variable holding the single value 83085733 in 1,000,000 observations.
> The resulting file sizes are:
>
>            Original file    After -compress-
> Float      3907KB           3907KB (long)
> Double     7813KB           3907KB (long)
>
> I can see that if the variable is of type double, it requires twice as much storage space as float. The storage space for float is the same as for long. There is no difference once the files are compressed to the final appropriate data type. Therefore, my recommendation that Stata use double by default for calculation and then select the appropriate type for storing the data is sensible.
>
> It is also desirable to simply set type to double and compress whenever saving data.
>
> Thanks,
>
> Junlin
>
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Liao, Junlin
> Sent: Thursday, February 17, 2011 1:29 PM
> To: [email protected]
> Subject: RE: st: A bug in egen and gen?
>
> I'm confused here now. Doesn't the type of a variable determine its storage space? I'll do some experiments to see your point here. Your attention and quick responses are greatly appreciated.
>
> Junlin
>
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
> Sent: Thursday, February 17, 2011 1:23 PM
> To: [email protected]
> Subject: Re: st: A bug in egen and gen?
>
> You're saying, in effect, that nearly doubling storage would not typically bite users. That will be true in some cases but not all.
>
> There is no need to wonder. The help for -save- says there is no such option. But -compress- before -save- is naturally a very good choice and you could program your own wrapper for -save- that always did it.
>
> Here is a sketch:
>
> program jlsave
> version 8
> compress
> save `0'
> end
>
> But if you do what you just said you wanted to do, -set type double-, using -compress- is not going to give you back more than a fraction of the extra storage you spend. The fraction will depend on how much you deal with strings, variables that are always integers, etc.
>
> On Thu, Feb 17, 2011 at 7:11 PM, Liao, Junlin <[email protected]> wrote:
>
> Storage wouldn't be a problem if we perform -compress- command regularly.
> I'm wondering if Stata can let you select an option whenever it saves data, it compresses. It will surely be handy to solve this problem.
>
> Nick Cox
>
>> In practice, if StataCorp always warned you of everything that could bite you, the help would be much, much longer.
>>
>> Your last suggestion would typically leave -double-s in place unless it so happened that the result was integers in every observation. To see why, study Bill Gould's recent postings on the StataCorp blog.
>> That would on average nearly double your storage. If you don't mind, you might as well follow Stas' suggestion and -set type double-.
>>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
>
> ________________________________
> Notice: This UI Health Care e-mail (including attachments) is covered by the Electronic Communications Privacy Act, 18 U.S.C. 2510-2521, is confidential and may be legally privileged.  If you are not the intended recipient, you are hereby notified that any retention, dissemination, distribution, or copying of this communication is strictly prohibited.  Please reply to the sender that you have received the message in error, then delete it.  Thank you.
> ________________________________
>

© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index