Stata The Stata listserver

st: Creating a variable from comments/header in a .txt file


From   "Juan Solon" <Juan.Solon@lshtm.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: Creating a variable from comments/header in a .txt file
Date   Thu, 14 Apr 2005 03:12:35 +0100

Hello, 

I would like advice on using a data dictionary to extract comments
from a .txt file and put them into a variable.

I want to read data from a series of text files, each of which
contains unique file identifiers/descriptors in a header on lines
1-10. All the files are named data.txt, and I will have to do this
repeatedly, so I'm looking for a solution that is replicable across
files. The data themselves are in tabular format, start on line 11,
and look like this:
Spot counts:
      1    2    3    4    5    6    7    8    9   10   11   12
A     -    -    -    -    -    -    -    -    -    -    -    - 
B     -    -    -    -    -    -    -    -    -    -    -    - 
C     -    -    -    -    -    -    -    -    -    -    -    - 
D     -    -    -    -    -    -    -    -    -    -    -    - 
E     -    -    -    -    -    -    -    -    -    -  197    9 
F     -    -    -    -    -    -    -    -    -    -  188    7 
G     -    -    -    -    -    -    3    2    1  204  189   78 
H     -    -    -    -    -    -    1    2    0  254  195   63 

I have written a do-file that reads the tabular data beginning on
line 11 of data.txt, using infile and a data dictionary. My problem
is that I want to tag each observation with a unique identifier
taken from line 1. This is important because I will eventually merge
all the data from the text files and will need to know the source of
each record.
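Because every record has to carry its source once the files are combined, one option is to tag each file's records and then stack the files with append rather than merge them. The sketch below is a minimal, self-contained illustration under stated assumptions: `set obs 8` stands in for reading one file with the dictionary, and the identifiers 10u5uc and 11a2bb stand in for the line-1 strings (11a2bb is hypothetical).

```stata
* Start from an empty dataset so the loop can always append.
clear
tempfile combined
save `combined', emptyok

* 10u5uc comes from the post; 11a2bb is a hypothetical second identifier.
foreach id in 10u5uc 11a2bb {
    clear
    set obs 8                 // stands in for: infile using eli.dct, clear
    gen row = _n
    gen str6 pl = "`id'"      // tag every record with its source file
    append using `combined'
    save `combined', replace
}

use `combined', clear
tab pl
```

With the real files, each pass of the loop would read one data.txt and take the identifier from its line 1 instead of from the list.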

The solution I tried seems very clumsy, and it doesn't actually work
as I wanted it to. I created a separate file with a variable (pl)
containing the unique file identifier (the string from line 1). I
then generated an observation number using generate obsno=_n and
saved the file, using the string from line 1 as the filename. This
file can then be merged with the tabular data.


- start of do file - 
set more off
infile using eli2.dct, clear
drop in 2/l                    /* keep only the first record (line 1) */
gen pl=substr(plate,18,6) /* this is data from the first line, for
example a string "10u5uc" */
gen obsno=_n
local file =pl                 /* pl evaluates to pl[1] here */
sort obsno
save `file', replace
infile using eli.dct, clear    /* read the tabular data */
gen obsno=_n
sort obsno
merge obsno using 10u5uc, keep(pl) /* How do I do this using a macro
instead of typing "10u5uc"? */

 - end of do file


The merged file contains 8 observations and a string variable pl. As
expected, pl is missing for records 2-8. I would like pl in records
2-8 to have the same value as in record 1. I can do this manually:

replace pl ="10u5uc" if _merge==1
 
but I would rather do this with a macro than by typing the string. At
this point, I have hit a brick wall!!!
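A local macro can stand in for a filename or a string constant, so the macro `file` that the do-file already defines could be used both in the merge and in the fill-in step. The sketch below is self-contained: toy data built with `set obs` replace the two infile calls, and it writes 10u5uc.dta to the working directory. It keeps the older `merge varlist using filename` syntax used in the do-file; in Stata 11 and later the equivalent is `merge 1:1 obsno using "`file'"`.

```stata
* Toy identifier file, named after the value of pl.
clear
set obs 1
gen str6 pl = "10u5uc"
gen obsno = _n
local file = pl               // pl means pl[1] here, so `file' holds "10u5uc"
sort obsno
save "`file'", replace        // writes 10u5uc.dta in the working directory

* Toy tabular data: stands in for infile using eli.dct, clear
clear
set obs 8
gen obsno = _n
sort obsno
merge obsno using "`file'"    // the macro expands to the filename

* Fill records 2-8 from the macro instead of typing the string;
* replace pl = pl[1] if _merge==1 would work as well.
replace pl = "`file'" if _merge==1
```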

1.  How do I extract the data on line 1 into a variable and repeat
    this for all the records?
2.  Is it possible to merge files and refer to the using dataset with
    a macro?
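For question 1, one approach that avoids the separate file and the merge altogether is to read line 1 with Stata's file commands into a local macro, then generate pl as a constant after reading the table. A minimal sketch, assuming the identifier sits in columns 18-23 of line 1 (as in substr(plate,18,6) above); the header text written to the toy file is made up, and `set obs 8` stands in for infile using eli.dct.

```stata
* Toy stand-in for data.txt; the header wording is hypothetical.
tempfile txt
tempname fh
file open `fh' using "`txt'", write text replace
file write `fh' "Plate identifier:10u5uc" _n
file write `fh' "(rest of header and data)" _n
file close `fh'

* Read only line 1 into the local macro `line', then close the file.
file open `fh' using "`txt'", read text
file read `fh' line
file close `fh'

* Columns 18-23 of line 1 hold the identifier.
local id = substr(`"`line'"', 18, 6)
display "`id'"

* With the real files, replace the next two lines by: infile using eli.dct, clear
clear
set obs 8

* A string constant is copied into every record, so no merge is needed.
gen str6 pl = "`id'"
```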

I look forward to your suggestions!
Juan





*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP