Statalist



st: Does -insheet- read data incorrectly?


From   Dirk Enzmann <dirk.enzmann@uni-hamburg.de>
To   statalist@hsphsun2.harvard.edu
Subject   st: Does -insheet- read data incorrectly?
Date   Thu, 12 Mar 2009 22:26:18 +0100

I encountered the following problem:

I am using the following command to import data from a tab-delimited text file into Stata:

--------------------------------------------------------------------
insheet using "file.txt", tab clear
--------------------------------------------------------------------

"file.txt" contains tab-delimited data; its first row contains the following variable names (also separated by tabs):

--------------------------------------------------------------------
recfile time LfdNr field note
--------------------------------------------------------------------

Except for "LfdNr", all variables should be string variables.

Each row contains five "values" (better: "columns") separated by four tabs. An example row follows (to show what the data look like, in this mail I put each "column" on its own line; in the data file they are, of course, separated by tabs):

--------------------------------------------------------------------
D:\DATENEINGABE\HH08\HH08_SF9_05.REC
20 Dez 2008 15:43
570
vermnb
.; #2-3
--------------------------------------------------------------------

The problem: in some rows the last "column" (here containing ".; #2-3") contains double quotes ("), and sometimes these do not occur in pairs enclosing other characters but as lone singles. When this happens, -insheet- does not start a new observation at the next row of data but keeps reading the following text of the file into the variable "note". Only when it reaches another row containing a single double quote does -insheet- go back to creating one observation per row.

For example, if a row contains the following data (again shown with line breaks instead of tabs, to make clear what the data look like):

--------------------------------------------------------------------
D:\DATENEINGABE\HH08\HH08_SF9_05.REC
13 Dez 2008 14:37
325
glaeubig
97; "#4-5
--------------------------------------------------------------------

then, ignoring line breaks and tabs, everything in the text file from 97; "#4-5 onward is read into the variable "note" until another line of the text file contains a string with only one double quote, such as

--------------------------------------------------------------------
D:\DATENEINGABE\HH08\HH08_SF9_05.REC
15 Dez 2008 14:05
373
beten
.; "2-3
--------------------------------------------------------------------

(Of course, the string variable "note" is automatically limited to 244 characters and everything beyond that is lost, but that is not the issue.)
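To make the symptom concrete, here is a toy model in Python (not Stata's actual code) of a reader that, like -insheet- in the cases above, treats any double quote as opening or closing a quoted field in which tabs and line breaks count as data. The three data rows are the ones quoted in this mail, placed in one hypothetical file in an arbitrary order; the function names are made up for illustration.

```python
# Hypothetical sample: header plus the three rows quoted in this mail,
# placed in one file in an arbitrary order (tabs between fields, "\n" after rows).
PATH = r"D:\DATENEINGABE\HH08\HH08_SF9_05.REC"
RAW = (
    "recfile\ttime\tLfdNr\tfield\tnote\n"
    f"{PATH}\t13 Dez 2008 14:37\t325\tglaeubig\t97; \"#4-5\n"
    f"{PATH}\t20 Dez 2008 15:43\t570\tvermnb\t.; #2-3\n"
    f"{PATH}\t15 Dez 2008 14:05\t373\tbeten\t.; \"2-3\n"
)


def split_binding_quotes(text):
    """Toy model (not Stata's code): any double quote toggles a 'quoted'
    state in which tabs and line breaks are stored as data, not delimiters.
    The quote character itself is dropped."""
    rows, row, field, quoted = [], [], [], False
    for ch in text:
        if ch == '"':
            quoted = not quoted
        elif ch == "\t" and not quoted:
            row.append("".join(field))
            field = []
        elif ch == "\n" and not quoted:
            row.append("".join(field))
            field = []
            rows.append(row)
            row = []
        else:
            field.append(ch)
    if field or row:  # last row without a trailing line break
        row.append("".join(field))
        rows.append(row)
    return rows


def split_plain(text):
    """Quotes are ordinary characters: line breaks end rows, tabs end fields."""
    return [line.split("\t") for line in text.splitlines()]


binding = split_binding_quotes(RAW)
plain = split_plain(RAW)
print(len(binding), len(plain))  # 2 4
print([len(r) for r in plain])   # [5, 5, 5, 5]
```

With quote binding, the 570 row (and the 373 row up to its lone quote) ends up inside "note" of the 325 row, so only the header and one data row come back; splitting strictly on line breaks and tabs returns all four rows with five fields each.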

To my mind, a tab-delimited file is a tab-delimited file, i.e., the data should be read as *separated* by tabs (and line breaks). Apparently, -insheet- does not treat the tabs as delimiters in all instances.

Is this correct behavior of -insheet- that I am misunderstanding, or is it a bug? If it is the former, what should I do?
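If the quotes in "note" carry no meaning that has to be preserved, one possible workaround is to neutralize them in a copy of the file and run -insheet- on that copy. The Python sketch below does this at the byte level, so tabs, line breaks, and the file's encoding are left untouched; the helper names, the output file name file_noquotes.txt, and the choice of a single quote as the replacement are assumptions. Within Stata, -filefilter- performs this kind of substitution on a file (see -help filefilter- for how to specify a double quote).

```python
from pathlib import Path


def neutralize_quotes(data: bytes, replacement: bytes = b"'") -> bytes:
    """Replace every double-quote byte so that an importer cannot treat it as
    the start or end of a quoted field; tabs and line breaks are untouched."""
    return data.replace(b'"', replacement)


def filter_file(src: str, dst: str, replacement: bytes = b"'") -> int:
    """Write a copy of src to dst with all double quotes replaced.

    src/dst are placeholder names. Working on raw bytes keeps the file's
    encoding and line endings exactly as they were. Returns the number of
    double quotes that were replaced.
    """
    data = Path(src).read_bytes()
    Path(dst).write_bytes(neutralize_quotes(data, replacement))
    return data.count(b'"')
```

Usage (hypothetical file names): filter_file("file.txt", "file_noquotes.txt"), then in Stata: insheet using "file_noquotes.txt", tab clear. Replacing every quote is deliberately blunt: it also alters properly paired quotes, which is acceptable only if the "note" strings do not depend on them.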

Yours,
Dirk

*************************************************
Dr. Dirk Enzmann
Institute of Criminal Sciences
Dept. of Criminology
Schlueterstr. 28
D-20146 Hamburg
Germany

phone: +49-(0)40-42838.7498 (office)
       +49-(0)40-42838.4591 (Mrs Billon)
fax:   +49-(0)40-42838.2344
email: dirk.enzmann@uni-hamburg.de
www: http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html
*************************************************
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP