Re: st: reading in long string variables (yet again)
From: Steve Nakoneshny <[email protected]>
To: "[email protected]" <[email protected]>
Date: Fri, 20 Apr 2012 10:08:59 -0600
Eric,
The initial datafile is tab-delimited and contains a mixture of both categorical and non-categorical numeric variables along with a healthy number of string variables (without double quotes). I have already written a fair bit of code to manipulate these data into a workable dataset. This one string variable only became problematic when we realized that for some observations, its length exceeded 244 characters.
Thanks to your suggestion of -intext-, I think I've found a solution that will work for me. I can probably operationalize my workflow better (not to mention the code I've written), but that's a separate concern. Here's my solution:
Starting with my exported tab-delimited text file, I used StatTransfer to create two new text files. One file contained only my unique id variable. The other contained only this sticky string variable. I thought that by doing it this way, the sort order of the source file would be maintained in both files (an assumption I rely on when I merge on row number below). After running the code below, I merged the resulting file back into my source dataset (that merge is not shown here) so that I could make use of the "chunk" I wanted to get in the first place.
--- begin code ---
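// read ds.txt into string variables ds1, ds2, ..., each at most 21 characters long
intext using "ds.txt", gen(ds) length(21)
drop in 1	// intext places the variable name in the first obs. do not want.
gen n = _n	// row number, used as the merge key
tempfile ds
save `ds'
// read the ids and pair them with the chunks by row number
// (assumes both files are in the same sort order)
insheet using "tumorid.txt", clear
gen n = _n
merge 1:1 n using `ds'
drop n _merge
// one row per tumorid and chunk; the chunk number is stored in _j
reshape long ds, i(tumorid)
drop if ds == ""	// removes blank records
drop if ds == `";""'	// removes incomplete chunk fragments
bys tumorid: gen N = _N	// number of chunks left for each tumorid
keep if _j == N		// keeps only the chunk of interest
isid tumorid
drop _j N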
--- end code ---
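To make the reshape-and-keep step easier to follow, here is a minimal sketch on made-up data. The ids and strings are invented for illustration and are not from my file; the chunk variables are typed in with -input- rather than read with -intext-, so it runs without any files.
--- begin code ---
clear
// hypothetical data: three ids, up to three chunks each, blanks at the end
input tumorid str21 ds1 str21 ds2 str21 ds3
1 "aaa" "bbb" ""
2 "ccc" ""    ""
3 "ddd" "eee" "fff"
end
reshape long ds, i(tumorid)	// one row per id and chunk, chunk number in _j
drop if ds == ""		// removes blank records
bys tumorid: gen N = _N		// chunks remaining per id
keep if _j == N			// keeps the last remaining chunk
isid tumorid
list tumorid ds
--- end code ---
This leaves "bbb" for id 1, "ccc" for id 2 and "fff" for id 3. Note that _j == N picks out the last chunk only when every dropped record comes after it, as in this toy data; if a blank chunk sat in the middle of an id's chunks, no row would satisfy the condition and that id would be dropped.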
Steve