
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Getting rid of binary codes so I can read in files - reposted


From   Austin Nichols <[email protected]>
To   [email protected]
Subject   Re: st: Getting rid of binary codes so I can read in files - reposted
Date   Wed, 18 Jan 2012 14:02:02 -0500

Orian Brook <[email protected]>:
Looks like two bugs in -filefilter- to me. First, \00h in the from()
option is treated as empty, as you will see if you try to write to a
location where file2 cannot be saved. Stata will tell you:

(from() option is empty, therefore whole operation is irrelevant;
 input file will be copied to output file)

before telling you it cannot save file2.  Second, if you specify an
empty from() and to() option, Stata freezes up.
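
One possible workaround until -filefilter- handles this, offered as a
sketch rather than a tested fix for your files: strip the binary 0s
outside Stata and then -insheet- the cleaned copy. Your hexdump reports
2,747,580 binary 0s out of 5,495,160 bytes, exactly half, which would be
consistent with a two-byte encoding such as UTF-16 (that is a guess, not
something the hexdump states). The Python function below is hypothetical
(strip_nul_bytes, file1 and file2 are illustrative names); it copies a
file while dropping every \0 byte.

```python
def strip_nul_bytes(src, dst, chunk_size=1 << 20):
    """Copy src to dst, dropping every binary 0 byte.

    Returns the number of bytes removed. Assumes the remaining bytes
    are plain ASCII text, as the hexdump summary suggests.
    """
    removed = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)  # read in 1 MB pieces
            if not chunk:
                break
            cleaned = chunk.replace(b"\x00", b"")
            removed += len(chunk) - len(cleaned)
            fout.write(cleaned)
    return removed
```

Calling strip_nul_bytes("file1", "file2") would write the cleaned copy
to file2 and return how many bytes were dropped; if the guess about the
encoding is right, that count should match the "binary 0" line in the
hexdump.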

On Wed, Jan 18, 2012 at 9:40 AM, Orian Brook <[email protected]> wrote:
> Not lucky enough to have had any replies so far - is there anyone with any
> suggestions, or shall I just revert to Outlook?
> Thanks
> Orian
>
> Dear all
> I'm analysing administrative data which I've had to export using an online
> database into 105 files. I've previously worked with similar files by
> importing and combining them all in Outlook, then reading into stata using
> an odbc link, but I'd really like to try to do it all in stata (so I have
> the do file for repetition/audit trail purposes) but I have some problems.
> The original files have extra EOL characters, and extended ones, which I can
> get rid of using filefilter, but I still can't import the file: using
> insheet I get the correct number of rows and columns, but all cells are
> blank except the first (it has a t in it). I've also tried using infile and
> skipping the first line, to no avail. Running hexdump shows that I have over
> 2 million binary 0s, which I think may be the problem? I tried using the
> command "filefilter file1 file2, from(\00hd) to() replace" to get rid of
> them, but it hangs.
>
> Any help would be very gratefully received. The hexdump is below.
> (apologies, plain text format doesn't allow me to post this in courier or
> something more legible)
>
> Regards
> Orian Brook
>
>   Line-end characters                        Line length (tab=1)
>     \r\n         (Windows)         26,823      minimum 2
>     \r by itself (Mac)                  0      maximum 403
>     \n by itself (Unix)                 0
>   Space/separator characters                 Number of lines 26,824
>     [blank]                       107,191      EOL at EOF? no
>     [tab]                               0
>     [comma] (,)                   509,637    Length of first 5 lines
>   Control characters                           Line 1 403
>     binary 0                    2,747,580      Line 2 185
>     CTL excl. \r, \n, \t                0      Line 3 243
>     DEL                                 0      Line 4 245
>     Extended (128-159,255)              0      Line 5 245
>   ASCII printable
>     A-Z                           189,766
>     a-z                           189,754    File format BINARY
>     0-9                         1,509,729
>     Special (!@#$ etc.)           187,857
>     Extended (160-254)                  0
>                           ---------------
>   Total                         5,495,160
>   Observed were:
>      \0 \n \r blank , - . / 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O
>      P Q R S T U V W X Y Z _ a b c d e f g h i k l m n o p q r s t u v x y
