Re: st: Problem with infix: record too long


From   Barbara Guimarães <barbara.vgh@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Problem with infix: record too long
Date   Sun, 24 Apr 2011 17:36:14 -0300

Dear Nick, unfortunately I am not able to open the file with any
word processor (I believe this is because of its size). The dataset
was provided by a government organization, so I received it already
in .txt format and don't have access to the primary data.

However, the output of -hexdump, analyze- was:

>> hexdump TS_QUEST_ALUNO.txt, analyze

  Line-end characters                      Line length (tab=1)
    \r\n         (Windows)    2,517,361      minimum                      0
    \r by itself (Mac)          686,626      maximum             20,971,542
    \n by itself (Unix)         768,441
  Space/separator characters               Number of lines        3,972,429
    [blank]                 112,067,613      EOL at EOF?                 no
    [tab]                       707,187
    [comma] (,)                 765,547    Length of first 5 lines
  Control characters                         Line 1                     120
    binary 0                 30,611,037      Line 2                     120
    CTL excl. \r, \n, \t     19,330,367      Line 3                     120
    DEL                         367,820      Line 4                     120
    Extended (128-159,255)   21,370,596      Line 5                     120
  ASCII printable
    A-Z                     149,642,323
    a-z                      16,234,081    File format               BINARY
    0-9                      53,967,247
    Special (!@#$ etc.)      28,963,365
    Extended (160-254)       54,882,559
                        ---------------
  Total                     495,399,531


  Observed were:
     \0 ^A ^B ^C ^D ^E ^F ^G ^H \t \n ^K ^L \r ^N ^O ^P ^Q ^R ^S ^T ^U ^V ^W
     ^X ^Y ^Z Esc 28 29 30 31 blank ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5
     6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y
     Z [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | }
     ~ DEL 128 E^A E^B E^C E^D E^E E^F E^G E^H E^I E^J E^K E^L E^M E^N E^O
     E^P E^Q E^R E^S E^T E^U E^V E^W E^X E^Y E^Z 155 156 157 158 159 160 ¡ ¢
     £ ¤ ¥ ¦ § ¨ © ª « ¬ ­ ® ¯ ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿ À Á Â Ã Ä Å Æ
     Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê
     ë ì í î ï ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ 255
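
As a rough cross-check of the figures reported above, the sketch below re-adds the counts in Python. It assumes that each \r\n pair contributes two bytes to the total and that, with no EOL at EOF, the number of lines is the number of line terminators plus one. Both are assumptions about how the report is tallied, not documented behaviour, although the arithmetic agrees with them.

```python
# Counts copied from the -hexdump, analyze- report.
crlf, cr, lf = 2_517_361, 686_626, 768_441
separators = 112_067_613 + 707_187 + 765_547
control = 30_611_037 + 19_330_367 + 367_820 + 21_370_596
printable = (149_642_323 + 16_234_081 + 53_967_247
             + 28_963_365 + 54_882_559)

# Assumption: a \r\n pair is two bytes, so it is counted twice in the total.
total_bytes = 2 * crlf + cr + lf + separators + control + printable

# Assumption: no EOL at EOF, so the final line has no terminator.
n_lines = crlf + cr + lf + 1

print(total_bytes)  # 495399531, the reported Total
print(n_lines)      # 3972429, the reported Number of lines
```

The three line-end styles are mixed within one file, and one "line" is 20,971,542 characters long, so splitting on line ends does not give uniform records here.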

Is there any way I could transform this dataset so that Stata would
read it in full?

Thank you,

Barbara


-infix- cannot read binary files.

.txt is just a file extension that is commonly assigned to ASCII
files; if the file is binary, calling it .txt won't make it so.

Showing us the first few lines of the file might give us a clue.
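
When a file is too large for an editor, its first few lines can still be inspected by reading only the first bytes. The following is a minimal Python sketch of that idea, not a Stata feature; the function name `first_lines` and the 4096-byte limit are illustrative assumptions.

```python
def first_lines(path, n_lines=5, max_bytes=4096):
    """Return up to n_lines raw lines from the start of a file.

    Only max_bytes are read, so the size of the file does not matter.
    The file is opened in binary mode so that NUL bytes and mixed
    line ends (\r\n, \r, \n) are kept exactly as stored.
    """
    with open(path, "rb") as fh:
        chunk = fh.read(max_bytes)
    # bytes.splitlines breaks at \n, \r and \r\n; keepends=True keeps
    # the terminator so the line-end style of each line stays visible.
    return chunk.splitlines(keepends=True)[:n_lines]


# Hypothetical usage, with the file name from this thread:
# for i, line in enumerate(first_lines("TS_QUEST_ALUNO.txt"), 1):
#     print(i, len(line), repr(line))
```

Printing with `repr` shows unprintable bytes as escapes such as `\x00`, which makes the output safe to paste into an email.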

Nick


> 2011/4/24 Barbara Guimarães <barbara.vgh@gmail.com>:
>


>> I'm having trouble reading data with infix. I am using Stata 11.1 for Windows.
>>
>> I received the "record too long" error, and I subsequently read the
>> threads "Re:st:reading data with infix: record too long" (from 2003
>> and 2008), then checked my dataset with -hexdump, analyze-, but I
>> don't know what to do now.
>>
>> The number of lines in my dataset is 3,972,429, but Stata is only
>> reading 2,513,116 lines. I have also tried to open the dataset in a
>> text editor to see if I could do something about it (my dataset
>> format is .TXT), but it will not open because of its size.
>> I have also tested reading the dataset with only one or two variables,
>> but in all cases Stata only brings in part of the dataset and I receive
>> the "record too long" error.
>>
>> I do believe it is not a memory issue, because I had previously read
>> another dataset in Stata, which was as large as this one, with no
>> problems. The dataset I'm having trouble with reading is in Binary
>> format, and I don't know if this is the cause of the error or not (the
>> other dataset is ASCII format).
>>
>> What are the possibilities, if any, of reading this file in full in
>> Stata? I would need to merge both datasets afterwards; would that be
>> possible?
