
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: reading in long string variables (yet again)

From	Steve Nakoneshny <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: st: reading in long string variables (yet again)
Date	Fri, 20 Apr 2012 10:08:59 -0600

Eric,

The initial datafile is tab-delimited and contains a mixture of categorical and non-categorical numeric variables along with a healthy number of string variables (without double quotes). I have already written a fair bit of code to manipulate these data into a workable dataset. This one string variable only became problematic when we realized that, for some observations, its length exceeded 244 characters, the maximum length of a string variable in Stata 12.

Thanks to your suggestion of -intext-, I think I've found a solution that will work for me. I could probably streamline my workflow (not to mention the code I've written), but that's a separate concern. Here's my solution:

Starting with my exported tab-delimited text file, I used StatTransfer to create two new text files: one containing only my unique id variable, the other containing only this sticky string variable. I assumed that doing it this way would preserve the sort order of the source file, an assumption I rely on later when pairing the two files by row number. Although I don't include the code here, I then merged the resulting file back into my source dataset so that I could use the "chunk" I wanted in the first place.
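Because the id file and the string file are paired purely by row position, one cheap safeguard (not part of the original workflow) is to let -merge- stop with an error if any row fails to pair up. Below is a minimal sketch with made-up toy data; the tempfile name ids and the values T1, T2 are hypothetical stand-ins for the two StatTransfer outputs. Note that this only catches files with different numbers of rows; it cannot detect rows that were reordered.

--- begin code ---

* Toy stand-in for the id file (hypothetical data)
clear
input str8 tumorid
"T1"
"T2"
end
gen n = _n                  // row position is the only link between the files
tempfile ids
save `ids'

* Toy stand-in for the string-chunk file (hypothetical data)
clear
input str21 ds1
"first chunk"
"only chunk"
end
gen n = _n

* assert(match) makes -merge- exit with an error if any observation
* is unmatched, i.e. if the two files differ in their number of rows
merge 1:1 n using `ids', assert(match) nogenerate
drop n

--- end code ---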

--- begin code ---

intext using "ds.txt", gen(ds) length(21)
drop in 1	// intext reads the file's header row (the varname) into obs. 1; not wanted
gen n = _n
tempfile ds
save `ds'

insheet using "tumorid.txt", clear
gen n = _n
merge 1:1 n using `ds'
drop n _merge

reshape long ds, i(tumorid)
drop if ds == ""	// removes blank records
drop if ds == `";""'	// removes incomplete chunk fragments

bys tumorid: gen N = _N
keep if _j == N		// keeps only the chunk of interest

isid tumorid
drop _j N

--- end code ---
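To illustrate what the reshape and keep steps do, here is a small self-contained sketch with made-up toy data (the ids T1 and T2 and their chunks are hypothetical). It uses -bysort tumorid (_j): keep if _n == _N-, a variant of the -keep if _j == N- step. Both keep the highest-numbered surviving chunk when the surviving chunks are numbered 1, 2, ..., N; if a chunk in the middle had been dropped, however, the variant still keeps one row for that id, whereas -_j == N- would match nothing and drop the id entirely.

--- begin code ---

* Toy data mimicking the merged file: one row per tumorid, with the
* long text split into fixed-width chunks ds1, ds2, ds3
clear
input str8 tumorid str21 ds1 str21 ds2 str21 ds3
"T1" "first chunk" "second chunk" ""
"T2" "only chunk" "" ""
end

reshape long ds, i(tumorid)    // one row per (tumorid, _j) chunk
drop if ds == ""               // remove empty trailing chunks

* keep the last surviving chunk per id, ordering explicitly by _j
bysort tumorid (_j): keep if _n == _N
isid tumorid
list tumorid _j ds

--- end code ---

With these toy data, T1 keeps "second chunk" and T2 keeps "only chunk".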

Steve

References:
- st: reading in long string variables (yet again)
  - From: Steve Nakoneshny <[email protected]>
- Re: st: reading in long string variables (yet again)
  - From: Nick Cox <[email protected]>
- Re: st: reading in long string variables (yet again)
  - From: Eric Booth <[email protected]>
