Re: st: reading in long string variables (yet again)


From   Steve Nakoneshny <scnakone@ucalgary.ca>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: reading in long string variables (yet again)
Date   Fri, 20 Apr 2012 10:08:59 -0600

Eric,

The initial data file is tab-delimited and contains a mixture of categorical and non-categorical numeric variables, along with a healthy number of string variables (whose values are not enclosed in double quotes). I have already written a fair bit of code to manipulate these data into a workable dataset. This one string variable became problematic only when we realized that, for some observations, its length exceeded 244 characters, the maximum length of a Stata string variable.

Thanks to your suggestion of -intext-, I think I've found a solution that works for me. My workflow (not to mention the code I've written) could probably be streamlined, but that's a separate concern. Here's my solution:

Starting with my exported tab-delimited text file, I used StatTransfer to create two new text files: one containing only my unique id variable (tumorid), and one containing only this sticky string variable. I assumed that doing it this way would preserve the sort order of the source file in both new files; the code below relies on that assumption when it matches the two files by observation number. Although I don't include the code here, I then merged the resulting file back into my source dataset so that I could use the "chunk" I wanted in the first place.
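The merge back into the source dataset is not shown. A minimal sketch of that step, assuming the chunk dataset produced by the code below ends up with one observation per tumorid (as -isid tumorid- checks) and that tumorid also uniquely identifies observations in the source dataset, could be a plain 1:1 merge on the id. The data values and the variable age are made-up placeholders, not part of the original files.

--- begin code ---

* Hypothetical chunk dataset: one obs. per tumorid (toy values)
clear
input tumorid str21 ds
101 "second piece"
102 "only piece"
end
tempfile chunks
save `chunks'

* Hypothetical source dataset; age is a made-up placeholder variable
clear
input tumorid age
101 54
102 61
103 47
end

* 1:1 merge on the id; tumorid 103 has no chunk, so its ds stays empty
merge 1:1 tumorid using `chunks'
assert _merge != 2	// every chunk matched a record in the source data
drop _merge
list

--- end code ---

Merging on the id rather than on observation number means this step does not depend on the sort order of the source dataset.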

--- begin code ---

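* -intext- is user-written; install it with: ssc install intext
intext using "ds.txt", gen(ds) length(21)	// reads each line into ds1, ds2, ... (max. 21 chars each)
drop in 1	// -intext- puts the variable name (header line) in the first obs.; not wanted
gen n = _n	// record number, used to line up the two files
tempfile ds
save `ds'

insheet using "tumorid.txt", clear
gen n = _n
merge 1:1 n using `ds'
assert _merge == 3	// both files must have the same number of records, in the same order
drop n _merge

reshape long ds, i(tumorid)	// one obs. per chunk; chunk number is stored in _j
drop if ds == ""	// removes blank records
drop if ds == `";""'	// removes incomplete chunk fragments

bys tumorid: gen N = _N	// number of chunks left for each tumorid
keep if _j == N		// keeps only the chunk of interest (the last one left)

isid tumorid	// confirms exactly one obs. per tumorid
drop _j N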

--- end code ---
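To see what the reshape-and-keep step does, here is a toy run on made-up data, with ds1-ds3 created by hand instead of by -intext-. It swaps in a slightly different last-chunk selection, -bysort tumorid (_j): keep if _n == _N-, which keeps the highest-numbered remaining chunk without assuming that the remaining chunks are numbered 1 to N with no gaps. Where that assumption holds, it gives the same result as -keep if _j == N-.

--- begin code ---

* Toy data (made up) standing in for the ds1-ds3 variables -intext- creates
clear
set obs 2
gen tumorid = 100 + _n
gen str21 ds1 = cond(tumorid == 101, "first piece", "only piece")
gen str21 ds2 = cond(tumorid == 101, "second piece", `";""')
gen str21 ds3 = ""

reshape long ds, i(tumorid)	// one obs. per tumorid and chunk; chunk number in _j
drop if ds == ""	// blank chunks
drop if ds == `";""'	// incomplete fragments

* Keep the highest-numbered surviving chunk for each tumorid
bysort tumorid (_j): keep if _n == _N

isid tumorid
list tumorid ds

--- end code ---

Here tumorid 101 keeps "second piece" (its blank third chunk is dropped), and tumorid 102 keeps "only piece" (its fragment and blank chunk are dropped).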

Steve