
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



st: RE: problem reading text data into stata


From   Nick Cox <[email protected]>
To   "'[email protected]'" <[email protected]>
Subject   st: RE: problem reading text data into stata
Date   Mon, 5 Dec 2011 12:45:08 +0000

I just copied this and did a character count with -hexdump-. It seems as if you have some possibly problematic characters there. 
In particular, 

     133  85  E^E                 1
     145  91  E^Q                 4
     146  92  E^R               461
     150  96  E^V                 1
     160  a0  160                 1
     225  e1  á                   1

Otherwise, as you give no commands and no definition of what would be correct, it is difficult to comment on what you should do. But if this were my problem, I would be looking for those characters with a decent text editor. 
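For anyone without a suitable editor to hand, here is a minimal Python sketch of the same search. It reports the line and column of every byte of 128 or above. Assuming the file is Windows-1252 encoded (the byte values fit that encoding, but this is not confirmed), 0x85 is an ellipsis, 0x91 and 0x92 are curly single quotes, 0x96 is an en dash, 0xA0 is a no-break space and 0xE1 is á. The sketch counts lines only at each \n, so its line numbers will not match -hexdump-'s line count, which also treats a lone \r as a line end. The function name find_nonascii is hypothetical.

```python
from pathlib import Path


def find_nonascii(data: bytes):
    """Return (line, column, byte) for every byte >= 128.

    Lines and columns are 1-based. A new line starts after each \n only,
    so a \r counts as an ordinary column on the line it sits in.
    """
    hits = []
    line, col = 1, 0
    for b in data:                 # iterating over bytes yields ints
        if b == 0x0A:              # \n: move to the next line
            line, col = line + 1, 0
            continue
        col += 1
        if b >= 128:               # outside 7-bit ASCII
            hits.append((line, col, b))
    return hits


if __name__ == "__main__":
    # "textdata.txt" is the file name from the thread; adjust the path as needed.
    path = Path("textdata.txt")
    if path.exists():
        for line, col, b in find_nonascii(path.read_bytes()):
            print(f"line {line}, column {col}: byte {b} (0x{b:02x})")
```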

. hexdump textdata.txt, tabulate

  Line-end characters                        Line length (tab=1)
    \r\n         (Windows)         19,999      minimum                        1
    \r by itself (Mac)             19,999      maximum                      344
    \n by itself (Unix)                 0
  Space/separator characters                 Number of lines             39,998
    [blank]                       602,317      EOL at EOF?                  yes
    [tab]                         179,991
    [comma] (,)                     1,092    Length of first 5 lines
  Control characters                           Line 1                        79
    binary 0                            0      Line 2                         1
    CTL excl. \r, \n, \t                0      Line 3                       126
    DEL                                 0      Line 4                         1
    Extended (128-159,255)            467      Line 5                       127
  ASCII printable
    A-Z                           218,837
    a-z                           786,411    File format                 BINARY
    0-9                           582,152
    Special (!@#$ etc.)           172,669
    Extended (160-254)                  2
                          ---------------
  Total                         2,603,935

  Observed were:
     \t \n \r blank ! " # $ & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < > ?
     A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ ] a b c d e f g h
     i j k l m n o p q r s t u v w x y z { } E^E E^Q E^R E^V 160 á

  Tabulation (character not listed if unobserved):
     Dec Hex  Char        Frequency
     ------------------------------
     009  09  \t            179,991
     010  0a  \n             19,999
     013  0d  \r             39,998
     032  20  blank         602,317
     033  21  !                  33
     034  22  "               6,251
     035  23  #                   6
     036  24  $                   2
     038  26  &                 564
     039  27  '                 733
     040  28  (               9,062
     041  29  )               9,054
     042  2a  *                   5
     043  2b  +                   1
     044  2c  ,               1,092
     045  2d  -               2,486
     046  2e  .               4,850
     047  2f  /              51,369
     048  30  0             221,441
     049  31  1              66,881
     050  32  2              85,899
     051  33  3              71,445
     052  34  4              29,256
     053  35  5              41,863
     054  36  6              18,235
     055  37  7              14,778
     056  38  8              14,031
     057  39  9              18,323
     058  3a  :              69,648
     059  3b  ;              18,494
     060  3c  <                   6
     062  3e  >                   6
     063  3f  ?                  15
     065  41  A              11,598
     066  42  B              20,285
     067  43  C              34,650
     068  44  D               3,676
     069  45  E              20,423
     070  46  F               2,984
     071  47  G               3,145
     072  48  H               2,164
     073  49  I              10,586
     074  4a  J               1,782
     075  4b  K               1,202
     076  4c  L               3,880
     077  4d  M               6,063
     078  4e  N              42,371
     079  4f  O               2,161
     080  50  P               8,361
     081  51  Q                 276
     082  52  R               4,663
     083  53  S              18,057
     084  54  T               6,408
     085  55  U               4,726
     086  56  V               1,495
     087  57  W               5,610
     088  58  X                 695
     089  59  Y               1,218
     090  5a  Z                 358
     091  5b  [                   2
     093  5d  ]                   4
     097  61  a              57,880
     098  62  b               4,090
     099  63  c              22,449
     100  64  d              16,106
     101  65  e             103,153
     102  66  f               4,041
     103  67  g              35,055
     104  68  h              13,928
     105  69  i              69,629
     106  6a  j                 107
     107  6b  k               5,893
     108  6c  l              28,815
     109  6d  m              45,755
     110  6e  n              77,699
     111  6f  o              52,881
     112  70  p              30,897
     113  71  q               3,446
     114  72  r              53,480
     115  73  s              46,056
     116  74  t              38,425
     117  75  u              15,202
     118  76  v              25,072
     119  77  w              26,044
     120  78  x               2,300
     121  79  y               6,875
     122  7a  z               1,133
     123  7b  {                  39
     125  7d  }                  39
     133  85  E^E                 1
     145  91  E^Q                 4
     146  92  E^R               461
     150  96  E^V                 1
     160  a0  160                 1
     225  e1  á                   1
     ------------------------------
     Total                2,603,935
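The counts above can be cross-checked outside Stata. Below is a minimal Python sketch, not a reimplementation of -hexdump-. It tallies every byte value and derives the line-end counts on the assumption that a lone \r is one not followed by \n and a lone \n is one not preceded by \r. Applied to the figures reported here, that rule gives 39,998 - 19,999 = 19,999 lone \r and 19,999 - 19,999 = 0 lone \n, which matches the summary. The extended counts also agree: 1 + 4 + 461 + 1 = 467 bytes in 128-159, plus 2 bytes (160 and 225) in 160-254. The function names are hypothetical.

```python
from collections import Counter
from pathlib import Path


def tabulate_bytes(data: bytes):
    """Count every byte value and classify line ends and extended bytes."""
    freq = Counter(data)                       # byte value (int) -> count
    crlf = data.count(b"\r\n")                 # \r\n pairs (Windows)
    summary = {
        "crlf": crlf,
        "cr": freq[0x0D] - crlf,               # \r not followed by \n (Mac)
        "lf": freq[0x0A] - crlf,               # \n not preceded by \r (Unix)
        # Split as in hexdump's summary: 128-159 and 255 versus 160-254.
        "ext_ctl": sum(n for b, n in freq.items() if 128 <= b <= 159 or b == 255),
        "ext_print": sum(n for b, n in freq.items() if 160 <= b <= 254),
    }
    return freq, summary


def print_table(freq: Counter) -> None:
    """Print Dec / Hex / Frequency rows, listing only observed bytes."""
    print(" Dec Hex  Frequency")
    for b in sorted(freq):
        print(f" {b:03d}  {b:02x}  {freq[b]:>9,}")


if __name__ == "__main__":
    path = Path("textdata.txt")   # file name from the thread; adjust as needed
    if path.exists():
        freq, summary = tabulate_bytes(path.read_bytes())
        print(summary)
        print_table(freq)
```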

Nick 
[email protected] 


-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of David Stromberg
Sent: 05 December 2011 11:18
To: [email protected]
Subject: st: problem reading text data into stata

On a number of occasions, I have had problems reading tab-delimited
text (string) data into Stata. For example, I cannot get Stata to
read the tab-separated text file at
http://people.su.se/~dstro/textdata.txt
correctly.

I tried opening it in Excel and resaving, saving as CSV, identifying and
eliminating characters that make Stata misread, etc. Either some lines
are missing, or some text is incorrect, e.g. text within parentheses.

Any ideas on how to proceed?
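One way to carry out the "eliminate the characters" step is to swap the flagged bytes for plain ASCII and write a new file. Here is a minimal Python sketch, assuming the bytes are Windows-1252 punctuation: ellipsis, curly single quotes, en dash and no-break space. Every other byte, including the single 0xE1 (á), is left unchanged. The mapping, the function name clean_bytes and the output file name are all hypothetical choices for illustration. The sketch does not touch line endings; the equal counts of \r\n and lone \r in the -hexdump- summary deserve a separate look in an editor.

```python
from pathlib import Path

# Assumed Windows-1252 meanings of the flagged bytes, mapped to plain ASCII.
CP1252_TO_ASCII = {
    0x85: b"...",   # horizontal ellipsis
    0x91: b"'",     # left single quotation mark
    0x92: b"'",     # right single quotation mark
    0x96: b"-",     # en dash
    0xA0: b" ",     # no-break space
}


def clean_bytes(data: bytes) -> bytes:
    """Replace each mapped byte; all other bytes (including 0xE1) are kept."""
    out = bytearray()
    for b in data:
        out += CP1252_TO_ASCII.get(b, bytes([b]))
    return bytes(out)


if __name__ == "__main__":
    # File names are placeholders; textdata.txt is the file from the thread.
    src, dst = Path("textdata.txt"), Path("textdata_clean.txt")
    if src.exists():
        dst.write_bytes(clean_bytes(src.read_bytes()))
```

The cleaned file could then be read in Stata as before, for example with -insheet using textdata_clean.txt, tab-. This is an assumption about how the file was being read, since the original message gives no commands.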



© Copyright 1996–2018 StataCorp LLC