Statalist



Re: st: importing LONG string variables


From   "Friedrich Huebler" <fhuebler@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: importing LONG string variables
Date   Fri, 24 Aug 2007 14:00:05 -0400

Denisa,

Is your example an accurate representation of your data? If so, you
have a problem because there are no delimiters around fields with
missing data. Here is a partial answer to your question that will read
the data into Stata, but the columns won't line up.

Step 1: Open the file in a text editor and replace all occurrences of
" comma " with "|" (without quotes). This yields the following file:

Row1
Name1|Name2|Address1|Address2|PatClass1|PatClass2|PatClass3
Row 2
Name3|Name4|Name5|Address3|Address4|Address5|PatClass4
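
With 600 files you may prefer not to do this by hand. The replacement
in Step 1 can probably also be done from within Stata with
-filefilter-. A minimal sketch, assuming the separator really is the
text " comma " (use from(",") if it is an actual comma). The file name
raw.txt is made up, and the -file write- lines only recreate your
example so that the sketch runs as is:

```stata
* Recreate the example as raw.txt (hypothetical file name)
tempname fh
file open `fh' using raw.txt, write text replace
file write `fh' "Row1" _n
file write `fh' "Name1|Name2 comma Address1|Address2 comma PatClass1|PatClass2|PatClass3" _n
file write `fh' "Row 2" _n
file write `fh' "Name3|Name4|Name5 comma Address3|Address4|Address5 comma PatClass4" _n
file close `fh'

* Step 1 without a text editor: copy raw.txt to test.txt,
* replacing every " comma " with "|"
filefilter raw.txt test.txt, from(" comma ") to("|") replace

* Show the converted file
type test.txt
```

The same -filefilter- line could go inside a -foreach- loop over your
files.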

Step 2: Read the file into Stata with -insheet-.

. insheet using test.txt, delimit("|")
. clist, noobs

   v1         v2         v3         v4         v5         v6         v7
 Row1
Name1      Name2   Address1   Address2  PatClass1  PatClass2  PatClass3
Row 2
Name3      Name4      Name5   Address3   Address4   Address5  PatClass4

Step 3: Delete the "Row" entries.

. drop if mod(_n,2)>0
(2 observations deleted)

. clist, noobs

   v1         v2         v3         v4         v5         v6         v7
Name1      Name2   Address1   Address2  PatClass1  PatClass2  PatClass3
Name3      Name4      Name5   Address3   Address4   Address5  PatClass4
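
Note that -drop if mod(_n,2)>0- assumes that the "Row" lines are
exactly the odd-numbered observations. If that may not hold in every
file, matching on content is a possible alternative. A minimal sketch
on a cut-down version of the example (two variables only), assuming
that "Row" lines always look like "Row1" or "Row 2" and leave v2
empty:

```stata
* Cut-down version of the example data after Step 2
clear
input str9 v1 str9 v2
"Row1"  ""
"Name1" "Name2"
"Row 2" ""
"Name3" "Name4"
end

* Drop observations whose first field is "Row", an optional space and
* a number, and whose second field is empty (assumed layout)
drop if regexm(v1, "^Row ?[0-9]+$") & v2 == ""

clist, noobs
```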

Step 4: Save the data as a comma-separated file.

. outsheet using test.csv, comma

When you open the CSV file in a text editor you see this:

v1,v2,v3,v4,v5,v6,v7
"Name1","Name2","Address1","Address2","PatClass1","PatClass2","PatClass3"
"Name3","Name4","Name5","Address3","Address4","Address5","PatClass4"

Variable v3 should have a missing value in the first observation.
Instead it contains Address1. Variables v4 to v7 also contain wrong
data. I do not know how you can address this problem without
information on missing values in your original data.
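
If the three groups in your real file are short enough to be read as
single string variables (which may not be the case, given the lengths
you describe), -split- might get you closer to the structure you
want. A minimal sketch, assuming the groups have already been read
into v1, v2 and v3, for example with the comma option of -insheet-.
The -input- block only recreates your example, and the stub names
name, address and patclass are made up:

```stata
* Example data: one string variable per group, words separated by "|"
clear
input str40 v1 str40 v2 str40 v3
"Name1|Name2"       "Address1|Address2"          "PatClass1|PatClass2|PatClass3"
"Name3|Name4|Name5" "Address3|Address4|Address5" "PatClass4"
end

* Split each group separately, so that a short group is padded with
* empty strings instead of borrowing words from the next group
split v1, parse("|") generate(name)
split v2, parse("|") generate(address)
split v3, parse("|") generate(patclass)
drop v1 v2 v3

clist, noobs
```

Each group gets as many variables as its longest entry, so name3 is
empty in the first observation, and patclass2 and patclass3 are empty
in the second.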

Friedrich

On 8/23/07, Mindruta, Denisa Constanta <mindruta@uiuc.edu> wrote:
> Greetings!
> I would appreciate any help with the following problem: I need to import a (.csv) file containing several string variables that go well beyond Stata's limits. Is there a way to import the file and, at the same time, parse these string variables into their constituent words (delimited by "|") before saving it as a Stata file?
>
> A simple example might help:
> Row1
> Name1|Name2 comma Address1|Address2 comma PatClass1|PatClass2|PatClass3
> Row 2
> Name3|Name4|Name5 comma Address3|Address4|Address5 comma PatClass4
>
> I want to get the following structure:
> Row1
> Name1 comma Name2 comma "missing info" comma Address1 comma Address2 comma "missing info" comma PatClass1 comma PatClass2 comma PatClass3
> Row 2
> Name3 comma Name4 comma Name5 comma Address3 comma Address4 comma Address5 comma PatClass4 comma "missing info" comma "missing info"
>
> Any suggestions on how to approach this problem? (This is just a simple example; the text in a cell could run to 200 words of 30 characters each, and I have 15 of these variables and 600 files...) Thanks!
>
> Denisa
> University of Illinois Urbana-Champaign


