Statalist



st: insheet fails with quotes in data


From   "Friedrich Huebler" <fhuebler@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: insheet fails with quotes in data
Date   Fri, 18 Apr 2008 22:54:27 -0400

I work with tab-delimited text files with string variables that
sometimes contain quote marks. If the quotes appear in pairs, the data
is imported but the quotes are stripped from the data. When a string
contains a single quote mark (i.e., a quote mark not followed by a
second quote mark), Stata fills that particular variable up to the
maximum string length of 244 characters and then stops the import so
that all remaining data from the original file is ignored. The problem
can be reproduced with these three test files:

test1.txt:
row11	row12	row13
row21	row22	row23
row31	row32	row33

test2.txt:
row11	row12	row13
row21	"row"22	row23
row31	row32	row33

test3.txt:
row11	row12	row13
row21	row"22	row23
row31	row32	row33

Each file has three lines of text, and each line has three strings
that are separated by tabs. test1.txt is a tab-delimited text file
without quotes; this file can be imported without problems. test2.txt
is a tab-delimited text file with a pair of quotes; the file is
imported but the quotes are removed. test3.txt has a single quote mark
and -insheet- fails. One of my text files has 95,000 lines and only
the first 918 lines are imported because of a single quote mark in
line 918.

. insheet using test1.txt, clear tab nonames
(3 vars, 3 obs)
. clist
            v1         v2         v3
  1.     row11      row12      row13
  2.     row21      row22      row23
  3.     row31      row32      row33

. insheet using test2.txt, clear tab nonames
(3 vars, 3 obs)
. clist
            v1         v2         v3
  1.     row11      row12      row13
  2.     row21      row22      row23
  3.     row31      row32      row33

. insheet using test3.txt, clear tab nonames
(3 vars, 2 obs)
. clist
            v1                           v2         v3
  1.     row11                        row12      row13
  2.     row21  22      row23 row31     row32   row33

How can the data be imported into Stata with all observations and
preferably also with quotes, either single or in pairs? I can open the
files in a text editor and look for quotes that do not appear in pairs
to remove them manually, but this is inefficient and changes the
original data.
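
The manual search for unpaired quotes could at least be automated
without touching the original file. Below is a minimal sketch in Python
(not Stata). It assumes the file is tab-delimited and that a field with
an odd number of double quote marks is the kind that stops -insheet-.
The function name find_unpaired_quotes is hypothetical and only for
illustration.

```python
def find_unpaired_quotes(lines, delimiter="\t", quote='"'):
    """Return (line_number, field_number, field) for every field that
    contains an odd number of quote marks. Both numbers are 1-based.

    Assumption: fields are separated by `delimiter` and never span
    more than one line.
    """
    hits = []
    for line_no, line in enumerate(lines, start=1):
        # Drop the line terminator, then split on the delimiter only;
        # quote marks are treated as ordinary characters here.
        fields = line.rstrip("\r\n").split(delimiter)
        for field_no, field in enumerate(fields, start=1):
            if field.count(quote) % 2 == 1:
                hits.append((line_no, field_no, field))
    return hits


if __name__ == "__main__":
    # Same contents as test3.txt above.
    test3 = [
        "row11\trow12\trow13\n",
        'row21\trow"22\trow23\n',
        "row31\trow32\trow33\n",
    ]
    for line_no, field_no, field in find_unpaired_quotes(test3):
        print(f"line {line_no}, field {field_no}: {field}")
    # prints: line 2, field 2: row"22
```

Any iterable of lines works, so an open file handle can be passed in
place of the list. The sketch only reports where the unpaired quotes
are; it does not change the data and does not by itself make -insheet-
read the file.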

Thanks,

Friedrich


