
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Problem with Stata handling of large dataset


From   Scott Merryman <[email protected]>
To   [email protected]
Subject   Re: st: Problem with Stata handling of large dataset
Date   Mon, 5 Aug 2013 08:59:09 -0500

On Mon, Aug 5, 2013 at 8:42 AM, Palan, Stefan
([email protected]) <[email protected]> wrote:
> Hi everybody,
>
> I have noticed a problem with Stata (SE 12.1, 64 bit) when working with large datasets. When I type the following:
>
>
> ----------------------------------------------------------------------
> clear
> set obs 63000000
> gen long id=_n
> gen long y=int(id/5)
> gen long z=int((id-1000)/5)
> gen long yz=y-z
> sum yz
> ----------------------------------------------------------------------
>
>
> I get the following output:
>
>
> ----------------------------------------------------------------------
>     Variable |       Obs        Mean    Std. Dev.       Min        Max
> -------------+--------------------------------------------------------
>           yz |  63000000         200    .0035635        199        200
> ----------------------------------------------------------------------
>
>
> Shouldn't the standard deviation be zero, and min equal max equal mean?
>

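No. The -int()- function truncates toward 0, so a negative argument is rounded up, not down. When id = 1,
y = int(id/5) = int(.2) = 0
z = int((id-1000)/5) = int(-199.8) = -199
yz = y - z = 199

When id = 20,
y = int(id/5) = int(4) = 4
z = int((id-1000)/5) = int(-196) = -196
yz = y - z = 200

So yz = 199 whenever id < 1000 and id is not a multiple of 5, and yz = 200 everywhere else.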
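
A quick way to check this is sketched below. The 2,000-observation size and the names z2 and yz2 are my own choices for illustration, not part of the original example; every affected observation has id < 1000, so the small dataset shows the same cases. The sketch also assumes that what was wanted is rounding toward minus infinity, which -floor()- does.

```stata
clear
set obs 2000                      // small illustrative size; the 199s all occur at id < 1000
gen long id = _n
gen long y  = int(id/5)
gen long z  = int((id-1000)/5)    // int() truncates toward 0
gen long yz = y - z

* observations where the difference is 199:
* id = 1..999, excluding the 199 multiples of 5
count if yz == 199
assert r(N) == 800

* floor() rounds toward minus infinity, so the difference is constant
gen long z2  = floor((id-1000)/5)
gen long yz2 = y - z2
assert yz2 == 200
```

With 800 observations at 199 out of 63,000,000, the standard deviation is roughly sqrt(800/63000000), or about .00356, which is consistent with the Std. Dev. of .0035635 in the output above.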

Scott