

From: Daniel Marcelino <dmsilva.br@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Problem with infix: record too long
Date: Tue, 26 Apr 2011 03:17:05 -0300

Maybe you can try opening your file with software like TextMate. I have used it to open every kind of text file, including files as large as 500 MB. So, worth a try.

Daniel

On Mon, Apr 25, 2011 at 9:11 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> Fixed format or not, I can't see a way for Stata to make sense out of that.
>
> It's not uncommon for datafiles to start with some kind of preamble.
> But this seems to start with some data. Also, the end looks quite
> unlike the beginning, as might be guessed from the -hexdump- report.
>
> Unless you can give more information on what should be inside --
> you've not said, but you should know -- or someone recognises this
> stuff, I think you need to ask those people what kind of beast they
> sent.
>
> 2011/4/26 Barbara Guimarães <barbara.vgh@gmail.com>:
>> Nick, thanks for your response.
>>
>> Using the type filename.txt as you suggested, Stata showed me the
>> following first lines:
>>
>> type TS_QUEST_ALUNO.txt
>> 1373262421RN24GROSSOS
>> 2404408ADCDBAAACABDCABCEAAAAAAAAAAAAAAC*CBAABAAAAAA
>> 1373263421RN24GROSSOS
>> 2404408BDKEAAAABABDCADBDACAAAB.AAAAAAAABBBBACAAAAAA
>> 1373264421RN24GROSSOS
>> 2404408BAAACBAAB..DCADCDAAAAAABBAAAAAAAAABCAAAA.A..
>>
>> and which then ended as:
>>
>>> ......................................................................................................................................................................................................
>>> ............................................................................................................................................................c4 ......?.:Z3.
>> .R...x.9..........T.Np(0$%'...@#../q..'!m.t.F2$*J
>>
>> It looks like, to me, that this would be a fixed format. But I might be wrong.
>>
>> regards,
>> Barbara
>>
>> 2011/4/24 Nick Cox <njcoxstata@gmail.com>:
>>> Your last question is, in effect, can I explain to you how to read a
>>> binary file with unspecified structure into Stata, and the short
>>> answer is sorry, no.
>>>
>>> It's a rare word processor that can open large binary files with
>>> success. Word processors accept a range of formats for documents,
>>> tending to prefer their own proprietary format, but are usually
>>> useless at reading binary data files. A good text editor could do it;
>>> that does not include the proprietary editors bundled with MS Windows.
>>>
>>> I wonder if you are being misled by the first line in the help for
>>> -infix- below, while overlooking the second line, which is vital.
>>>
>>> "infix reads into memory from a disk dataset that is not in Stata
>>> format. infix requires
>>> that the data be in fixed-column format."
>>>
>>> As you reported, Stata is seeing far fewer end-of-line character pairs
>>> \r\n than lines in this file, \r and \n characters are occurring by
>>> themselves, which is not standard for text files in MS Windows, and
>>> -hexdump- is labelling this binary. It's unlikely to be wrong on
>>> that.
>>>
>>> You could try just
>>>
>>> . type filename.txt
>>>
>>> in Stata and that might show you, and us, the first few lines of the
>>> file. They might be recognisable to someone as in a particular format.
>>>
>>> I think if you can't get an idea of what the structure of this file
>>> is, then you have no way to read it into Stata. Why a "government
>>> organisation" is providing a binary file and calling a .txt I cannot
>>> explain. You may need to talk to them.
>>>
>>> Nick
>>>
>>> 2011/4/24 Barbara Guimarães <barbara.vgh@gmail.com>:
>>>> Dear Nick, unfortunately, I'm not able to open the file with any
>>>> word processor (I believe that it is because of its size / this
>>>> dataset was provided by a government organization, so I already
>>>> received it in .txt format and don't have access to the primary data)
>>>>
>>>> However, the output of hexdump with the analyze option was:
>>>>
>>>> . hexdump TS_QUEST_ALUNO.txt, analyze
>>>>
>>>>   Line-end characters                      Line length (tab=1)
>>>>     \r\n         (Windows)     2,517,361     minimum                      0
>>>>     \r by itself (Mac)           686,626     maximum             20,971,542
>>>>     \n by itself (Unix)          768,441
>>>>   Space/separator characters               Number of lines        3,972,429
>>>>     [blank]                  112,067,613   EOL at EOF?                   no
>>>>     [tab]                        707,187
>>>>     [comma] (,)                  765,547   Length of first 5 lines
>>>>   Control characters                         Line 1                     120
>>>>     binary 0                  30,611,037     Line 2                     120
>>>>     CTL excl. \r, \n, \t      19,330,367     Line 3                     120
>>>>     DEL                          367,820     Line 4                     120
>>>>     Extended (128-159,255)    21,370,596     Line 5                     120
>>>>   ASCII printable
>>>>     A-Z                      149,642,323
>>>>     a-z                       16,234,081   File format               BINARY
>>>>     0-9                       53,967,247
>>>>     Special (!@#$ etc.)       28,963,365
>>>>     Extended (160-254)        54,882,559
>>>>                          ---------------
>>>>   Total                      495,399,531
>>>>
>>>> Observed were:
>>>> \0 ^A ^B ^C ^D ^E ^F ^G ^H \t \n ^K ^L \r ^N ^O ^P ^Q ^R ^S ^T ^U ^V ^W
>>>> ^X ^Y ^Z Esc 28 29 30 31 blank ! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5
>>>> 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y
>>>> Z [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | }
>>>> ~ DEL 128 E^A E^B E^C E^D E^E E^F E^G E^H E^I E^J E^K E^L E^M E^N E^O
>>>> E^P E^Q E^R E^S E^T E^U E^V E^W E^X E^Y E^Z 155 156 157 158 159 160 ¡ ¢
>>>> £ ¤ ¥ ¦ § ¨ © ª « ¬ ® ¯ ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿ À Á Â Ã Ä Å Æ
>>>> Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê
>>>> ë ì í î ï ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ 255
>>>>
>>>> Is there any way I could transform this dataset in a way Stata would
>>>> read it entirely?
>>>>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>

--
Daniel Marcelino
http://danielmarcelino.zip.net
Skype: dmsilva.br
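
When no editor will open a file of this size, a few lines of script outside Stata can show the raw bytes at the start and end and tally the line-end characters, roughly in the spirit of the -hexdump, analyze- report quoted in this thread. The following is only a sketch in Python, not in Stata. The function names `summarize_bytes` and `peek` are hypothetical, and the `looks_binary` rule (any NUL or other control byte present) is an assumed simplification, not the rule Stata itself applies.

```python
from collections import Counter


def summarize_bytes(data):
    """Tally line endings and control bytes in a chunk of raw bytes."""
    crlf = data.count(b"\r\n")
    # Each \r\n pair contains one \r and one \n, so subtracting the pairs
    # leaves the characters that occur by themselves.
    cr_alone = data.count(b"\r") - crlf
    lf_alone = data.count(b"\n") - crlf
    counts = Counter(data)  # iterating over bytes yields integers 0-255
    nul = counts[0]
    # Control characters other than NUL, tab, \n and \r.
    other_ctl = sum(n for b, n in counts.items()
                    if b < 32 and b not in (0, 9, 10, 13))
    return {
        "crlf": crlf,
        "cr_alone": cr_alone,
        "lf_alone": lf_alone,
        "nul": nul,
        "other_ctl": other_ctl,
        # Assumed heuristic only: plain text should contain neither.
        "looks_binary": nul > 0 or other_ctl > 0,
    }


def peek(path, nbytes=200):
    """Return the first and last nbytes of a file without reading it all."""
    with open(path, "rb") as fh:
        head = fh.read(nbytes)
        fh.seek(0, 2)               # move to the end of the file
        size = fh.tell()
        fh.seek(max(size - nbytes, 0))
        tail = fh.read(nbytes)
    return head, tail
```

With the file from this thread, `head, tail = peek("TS_QUEST_ALUNO.txt")` would return both ends of the file, and comparing `summarize_bytes(head)` with `summarize_bytes(tail)` would indicate whether the end is as unlike the beginning as the -type- output suggests. It cannot say what the format is; that still has to come from whoever supplied the file.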

