
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



st: Getting rid of binary codes so I can read in files


From   "Orian Brook" <ob11@st-andrews.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: Getting rid of binary codes so I can read in files
Date   Fri, 13 Jan 2012 13:16:15 -0000

Dear all
I'm analysing administrative data that I've had to export from an online
database into 105 files. I've previously worked with similar files by
importing and combining them all in Outlook, then reading them into Stata
through an ODBC link, but I'd really like to do it all in Stata (so that I
have the do-file for repetition/audit-trail purposes). However, I have some
problems.

The original files have extra EOL characters and extended characters, which
I can get rid of using filefilter, but I still can't import the file. Using
insheet, I get the correct number of rows and columns, but all cells are
blank except the first (which contains a "t"). I've also tried infile,
skipping the first line, to no avail. Running hexdump shows over 2 million
binary 0s (2,747,580), which I think may be the problem. I tried the command
"filefilter file1 file2, from(\00hd) to() replace" to get rid of them, but
it hangs.
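As an illustration outside Stata (Python is an assumption here, not something the post uses), the NUL-removal step attempted above with filefilter can be done in a single streaming pass that never holds the whole file in memory. The function name `strip_nul_bytes` and the file names in the usage line are hypothetical. Dropping every 0x00 byte is only safe if the zeros are padding, or if the text is plain ASCII stored in a two-byte encoding; if the export is genuinely UTF-16, decoding it (for example with `open(path, encoding="utf-16-le")`) would be the more principled route.

```python
CHUNK_SIZE = 1 << 20  # read 1 MiB at a time so large exports need not fit in memory


def strip_nul_bytes(src: str, dst: str) -> int:
    """Copy src to dst, dropping every 0x00 byte; return how many were removed.

    Hypothetical helper for illustration. Because the pattern is a single
    byte, removing it chunk by chunk gives the same result as removing it
    from the whole file at once (no match can straddle a chunk boundary).
    """
    removed = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(CHUNK_SIZE)
            if not chunk:  # end of file
                break
            removed += chunk.count(b"\x00")
            fout.write(chunk.replace(b"\x00", b""))
    return removed
```

Usage, with hypothetical file names: `strip_nul_bytes("file1.csv", "file1_clean.csv")`. The cleaned file could then be passed to insheet as before; repeating the call over all 105 exports would be a simple loop.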

Any help would be very gratefully received. The hexdump output is below
(apologies: plain-text format doesn't let me post it in Courier or another,
more legible font).

Regards
Orian Brook

  Line-end characters                        Line length (tab=1)
    \r\n         (Windows)         26,823      minimum 2
    \r by itself (Mac)                  0      maximum 403
    \n by itself (Unix)                 0
  Space/separator characters                 Number of lines 26,824
    [blank]                       107,191      EOL at EOF? no
    [tab]                               0
    [comma] (,)                   509,637    Length of first 5 lines 
  Control characters                           Line 1 403
    binary 0                    2,747,580      Line 2 185
    CTL excl. \r, \n, \t                0      Line 3 243
    DEL                                 0      Line 4 245
    Extended (128-159,255)              0      Line 5 245
  ASCII printable
    A-Z                           189,766
    a-z                           189,754    File format BINARY
    0-9                         1,509,729
    Special (!@#$ etc.)           187,857
    Extended (160-254)                  0
                          ---------------
  Total                         5,495,160
  Observed were:
     \0 \n \r blank , - . / 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O
     P Q R S T U V W X Y Z _ a b c d e f g h i k l m n o p q r s t u v x y
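The counts in the hexdump report can be checked with a little arithmetic: the categories add up exactly to the total, and the binary 0s make up exactly half of all bytes. The snippet below just redoes that sum with the numbers copied from the report. One possible reading, which is an assumption and not confirmed in the post, is that the export was written in a two-byte encoding such as UTF-16, where plain ASCII text carries a 0x00 byte alongside every character.

```python
# Byte counts copied from the hexdump report above.
counts = {
    "\\r\\n pairs (2 bytes each)": 2 * 26_823,
    "blank": 107_191,
    "comma": 509_637,
    "binary 0": 2_747_580,
    "A-Z": 189_766,
    "a-z": 189_754,
    "0-9": 1_509_729,
    "special": 187_857,
}
total = 5_495_160

assert sum(counts.values()) == total  # the categories account for every byte
nul = counts["binary 0"]
print(nul / total)         # fraction of bytes that are 0x00
print(total - nul == nul)  # as many zero bytes as non-zero bytes
```

Running it prints `0.5` and then `True`.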


