st: RE: Getting rid of binary codes so I can read in files - reposted

From	"David Radwin" <[email protected]>
To	<[email protected]>
Subject	st: RE: Getting rid of binary codes so I can read in files - reposted
Date	Wed, 18 Jan 2012 08:32:32 -0800 (PST)

Orian,

I've never used it myself, but you might try Google Refine:

http://www.stata.com/statalist/archive/2010-11/msg00858.html

http://code.google.com/p/google-refine/

Please let us know if it works for you or not.

David
--
David Radwin
Research Associate
MPR Associates, Inc.
2150 Shattuck Ave., Suite 800
Berkeley, CA 94704
Phone: 510-849-4942
Fax: 510-849-0794

www.mprinc.com


> -----Original Message-----
> From: [email protected] [mailto:owner-
> [email protected]] On Behalf Of Orian Brook
> Sent: Wednesday, January 18, 2012 6:40 AM
> To: [email protected]
> Subject: st: Getting rid of binary codes so I can read in files - reposted
>
> Not lucky enough to have had any replies so far - is there anyone with
> any suggestions, or shall I just revert to Outlook?
> Thanks
> Orian
>
> Dear all
> I'm analysing administrative data which I've had to export using an
> online database into 105 files. I've previously worked with similar
> files by importing and combining them all in Outlook, then reading them
> into Stata using an ODBC link. This time I'd really like to do it all
> in Stata (so I have the do-file for repetition/audit trail purposes),
> but I have some problems.
> The original files have extra EOL characters, and extended ones, which
> I can get rid of using filefilter, but I still can't import the file.
> Using insheet I get the correct number of rows and columns, but all
> cells are blank except the first (it has a t in it). I've also tried
> using infile and skipping the first line, to no avail. Running hexdump
> shows that I have over 2 million binary 0s, which I think may be the
> problem. I tried using the command
> "filefilter file1 file2, from(\00hd) to() replace" to get rid of them,
> but it hangs.
>
> Any help would be very gratefully received. The hexdump is below.
> (Apologies, plain text format doesn't allow me to post this in Courier
> or something more legible.)
>
> Regards
> Orian Brook
>
>   Line-end characters                        Line length (tab=1)
>     \r\n         (Windows)         26,823      minimum 2
>     \r by itself (Mac)                  0      maximum 403
>     \n by itself (Unix)                 0
>   Space/separator characters                 Number of lines 26,824
>     [blank]                       107,191      EOL at EOF? no
>     [tab]                               0
>     [comma] (,)                   509,637    Length of first 5 lines
>   Control characters                           Line 1 403
>     binary 0                    2,747,580      Line 2 185
>     CTL excl. \r, \n, \t                0      Line 3 243
>     DEL                                 0      Line 4 245
>     Extended (128-159,255)              0      Line 5 245
>   ASCII printable
>     A-Z                           189,766
>     a-z                           189,754    File format BINARY
>     0-9                         1,509,729
>     Special (!@#$ etc.)           187,857
>     Extended (160-254)                  0
>                           ---------------
>   Total                         5,495,160
>   Observed were:
>      \0 \n \r blank , - . / 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O
>      P Q R S T U V W X Y Z _ a b c d e f g h i k l m n o p q r s t u v x y
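
One reading of the hexdump above: the binary 0 count (2,747,580) is exactly half of the total (5,495,160), which is the pattern a two-byte encoding such as UTF-16 produces for plain ASCII text. That is an inference from the counts, not something confirmed in the thread. Under that assumption, a minimal sketch of stripping the zero bytes outside Stata before importing might look like the following. The function name `strip_null_bytes` and the file names are hypothetical, and the approach is only safe if every character in the file is plain ASCII, as the "Observed were" list suggests.

```python
def strip_null_bytes(src_path, dst_path, chunk_size=1 << 20):
    """Copy src_path to dst_path, dropping every 0x00 byte.

    Assumes the data are ASCII characters padded with zero bytes
    (for example ASCII text stored as UTF-16). Any character outside
    ASCII would be damaged by this byte-level filter.
    Returns the number of zero bytes removed.
    """
    removed = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)  # read in blocks to keep memory use low
            if not chunk:
                break
            cleaned = chunk.replace(b"\x00", b"")
            removed += len(chunk) - len(cleaned)
            dst.write(cleaned)
    return removed


# Hypothetical usage: file1.csv is the exported file, file2.csv the cleaned copy.
# n = strip_null_bytes("file1.csv", "file2.csv")
# print(n, "zero bytes removed")
```

If the assumption holds, the cleaned copy can then be tried with insheet again. Because the filter works on bytes rather than decoded text, it does not depend on a byte-order mark being present.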
