



Re: st: Error in chunky


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Error in chunky
Date   Mon, 28 Nov 2011 19:05:48 +0000

No solution, just a suggestion to explain that -chunky- is from SSC.

Also, another suggestion: better to say "error in using -chunky-"
rather than "error in -chunky-", which leaves open the possibility
that the problem lies in the file or in the user's syntax, not
necessarily with the programmer.

Nick

On Mon, Nov 28, 2011 at 2:56 PM, King, Carina <c.king09@imperial.ac.uk> wrote:
> Dear Statalist,
>
> I am having some issues with a very large file (about 8GB). I am using 'chunky' to attempt to split it into smaller files but it keeps coming back with an error:
>
>               ftell(): 2144826048  Stata returned error
>             chunkfile():     -  function returned error
>                 <istmt>:     -  function returned error [1]
> r(2144826048);
>
>
> I have tried setting the chunked file size to different sizes, starting at 1G going down to 10M and the error comes up each time but at a different position. I have also tried it with a .txt file and a .csv file and it again comes up with the same error in both. I have put below the 'analyze' results from the chunky programme and the memory allocation that my STATA is set to. Any help on what the error is or how to solve it / suggestions on how to open this file would be much appreciated!
>
> Analyzing D:\Carina\New NEW dat files\hashed09_new_csv\hashed09_new_csv.csv for chunking
>
> BINARY is the file type
> File has 6352205 lines of average length 1282 bytes
> Composition is 11% letters, 56.00000000000001% numbers and 34% other characters
> No extended characters present.
>
> Approximate chunk sizes and memory requirements
> for -insheet- or -infile- commands
> +-----------------------------------------------------------+
> |Chunksize (mb)|  Number of   |   ~Number    | Stata size*  |
> |    option    |    Chunks    |  obs/chunk   | (megabytes)  |
> |--------------+--------------+--------------+--------------|
> |          10  |         815  |        7794  |         5.9  |
> |          30  |         272  |       23354  |        17.7  |
> |         100  |          82  |       77466  |        58.7  |
> |         300  |          28  |      226864  |       171.8  |
> |        1000  |           9  |      705801  |       534.6  |
> |        3000  |           3  |     2117402  |      1603.7  |
> +-----------------------------------------------------------+
> * Stata file size is very approximate and depends on datatypes of variables
>
>
> . hexdump `"D:\Carina\New NEW dat files\hashed09_new_csv\hashed09_new_csv.csv"', analyze results
>
>  Line-end characters                        Line length (tab=1)
>    \r\n         (Windows)      6,352,205      minimum                      568
>    \r by itself (Mac)                  0      maximum                    2,631
>    \n by itself (Unix)                 0
>  Space/separator characters                 Number of lines          6,352,205
>    [blank]                   608,462,578      EOL at EOF?                  yes
>    [tab]                               0
>    [comma] (,)             1,255,193,981    Length of first 5 lines
>  Control characters                           Line 1                     2,631
>    binary 0                           12      Line 2                     1,058
>    CTL excl. \r, \n, \t                1      Line 3                     1,053
>    DEL                                 0      Line 4                     1,042
>    Extended (128-159,255)              0      Line 5                     1,052
>  ASCII printable
>    A-Z                       647,742,983
>    a-z                       219,580,832    File format                 BINARY
>    0-9                     4,546,540,695
>    Special (!@#$ etc.)       854,705,763
>    Extended (160-254)                  0
>                          ---------------
>  Total                     8,144,931,255
>
>  Observed were:
>     \0 ^C \n \r blank " % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = >
>     ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ ] ^ _ a b c d
>     e f g h i j k l m n o p q r s t u v w x y z { } ~
>
>
>
> Current memory allocation
>
>                    current                                 memory usage
>    settable          value     description                 (1M = 1024k)
>    --------------------------------------------------------------------
>    set maxvar         5000     max. variables allowed           1.947M
>    set memory        10000M    max. data space             10,000.000M
>    set matsize       11000     max. RHS vars in models        924.080M
>                                                            -----------
>                                                            10,926.027M
