Statalist The Stata Listserver



st: importing "irregular" columns with embedded blanks


From   "Dr. Buzz Burhans" <[email protected]>
To   <[email protected]>
Subject   st: importing "irregular" columns with embedded blanks
Date   Tue, 23 Jan 2007 21:01:16 -0500

Dear Stata gurus:

I have data that comes to me in a text file that is generally columnar, but
does not contain any delimiters.  If I import the file into Stata it comes
in as a single variable, v1; the first 10 observations look something like
this (assuming the email doesn't mangle it here):
 

. list v1

     +----------------------------------------------------------------------
     |                                                                   v1 
     |----------------------------------------------------------------------
  1. |  ANY 1206 Bunk1 Corn Silage      $0.00     97    FarmGrown Forag   lb
  2. |      1234 Shed1 Grass Hay        $0.00    146    Purchased Forag   lb
  3. |      1582 Purch1 Straw           $0.00    164    Purchased Forag   lb
  4. |      1237 Shor1 Haylage 1st cut  $0.00    149    FarmGrown Forag   lb
  5. |      1238 Bunk4 Haylage 2nd cut  $0.00    150    FarmGrown Forag   lb
     |----------------------------------------------------------------------
  6. |      1070 CornGrainGrndFine    $135.00    205    Purchased Energ   lb
  7. |       039 BakeryByProdBread    $130.00    308    Purchased Energ   lb
  8. |      1052 BeetPulpPlCp         $140.00    313    Purchased Energ   lb
  9. |      1022 EnergyBooster        $1,200.    521    Purchased Energ   lb
 10. |                                                                    00



Note that some columns contain text for only some observations (ANY, for
example, appears only in the first), and the "names" column has a variable
number of words in each name.

When I try to read the file with -infile- and a dictionary file, it fails
because the only thing separating the fields is a varying number of embedded
blanks. I also tried slicing out substrings or words with string functions, but
because the number of blanks varies, different "columns" get picked up in
different observations.
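A minimal sketch of the problem, using hypothetical two-line sample data modeled on the left-hand part of the listing above (the price column is left out). -word()- counts blank-separated words, so the optional ANY prefix shifts every later field by one word. -substr()- counts characters instead, so it is unaffected, but only if the columns really are fixed-width; the positions 7 and 12 are assumptions measured from this sample, not from the real file, and the variable names w2, id, and name are illustrative.

```stata
* Hypothetical sample: left-hand part of the first two lines above
clear
input str40 v1
"  ANY 1206 Bunk1 Corn Silage"
"      1234 Shed1 Grass Hay"
end

* word() counts blank-separated words, so ANY shifts the fields:
* w2 is "1206" in obs 1 but "Shed1" in obs 2
generate w2 = word(v1, 2)

* substr() counts characters; assumes fixed-width columns
* (positions 7 and 12 are measured from this sample only)
generate id   = substr(v1, 7, 4)
generate name = trim(substr(v1, 12, .))
list, noobs
```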


I have created a convoluted routine that first collapses each run of embedded
blanks to a single blank using the -itrim- function (which is missing from the
string-functions section of the [D] manual). Then I work in from either end
using a series of operations such as:

g v4 = word(v3, -1)                              // last word of v3
replace v1 = subinstr(v3, word(v3, -1), "", 1)   // v3 with first match of that word removed
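A self-contained sketch of the same "peel from the right" idea, run on hypothetical sample data built from the right-hand part of the first two lines of the listing (everything left of the item code is omitted). The field names unit, category, origin, and code are assumptions for illustration. One deliberate difference from the snippet above: the remainder is taken with -substr()- and -length()- rather than -subinstr(..., 1)-, because -subinstr()- removes the first match, which could sit earlier in the line than the word being peeled.

```stata
* Hypothetical sample: right-hand part of the first two lines above
clear
input str40 v1
"   97    FarmGrown Forag   lb"
"  146    Purchased Forag   lb"
end

* collapse runs of internal blanks to one and drop outer blanks
generate v3 = itrim(trim(v1))

* peel one word at a time off the right-hand end of v3
foreach f in unit category origin code {
    generate `f' = word(v3, -1)
    * keep only what lies to the left of the peeled word
    replace v3 = trim(substr(v3, 1, length(v3) - length(`f')))
}
list unit category origin code, noobs
```

This only works for fields that are always a single word; a multi-word field such as the name is whatever is left once both ends have been peeled.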

Is there a way to get these columns of data into variables in a .dta file
more easily, or is there a way to convert the embedded blanks to characters
or " " spaces so I could operate on them more easily?

I could do it by running the file through my text editor, but the import
ultimately has to be automated in a .do file, so I need a procedure that does
not require any manual editing.

Thanks for any insights

Buzz

Buzz Burhans, Ph.D.

Dairy-Tech Group
Phone: 802-755-6842
Cell: 802-388-7214 

Email: [email protected]



