



Re: st: reading in long string variables (yet again)


From   Eric Booth <eric.a.booth@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: reading in long string variables (yet again)
Date   Wed, 18 Apr 2012 17:31:13 -0500


You might also consider -intext- from SSC, which will read those long strings into Stata, but then you'd have to work out some code to combine them back together again.  Also, I'm not entirely sure what your data look like, but you might be able to use -filefilter- to change those delimiters so that Stata reads the chunks within the long strings as separate variables.
It's hard to tell how much either of these would help you without seeing a raw data snippet, as Nick has mentioned.  (So, are the 20-character chunks in the long strings delimited by ; AND wrapped in double quotes?  Are all other variables in the raw data delimited by double quotes AND ; ?)  Depending on your data, it may be that neither of these two solutions is useful, but if they are, then they are probably easier than learning Mata.
P.S. A tool that might come in handy during the manipulation of long strings, regardless of which strategy you ultimately choose to use, is -lstrfun- from SSC by Dan Blanchette.  
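
If neither of those fits, the "something other than Stata" route Nick mentions below can be quite short. Here is a minimal sketch in Python, not a tested solution for your data. It assumes (and these are only assumptions, since I haven't seen the raw file) a comma-delimited file with a header row, an ID column, and the long string in a column I've called followup; the function names and the two sample records are made up for illustration.

```python
import csv
import io


def rightmost_chunk(history, sep=";"):
    """Return the last non-empty chunk of a sep-delimited string, or "" if none."""
    chunks = [c.strip() for c in history.split(sep)]
    chunks = [c for c in chunks if c]  # ignore empties, e.g. from a trailing ";"
    return chunks[-1] if chunks else ""


def keep_latest(infile, outfile, field="followup"):
    """Copy a CSV, replacing the long `field` column with its right-most chunk."""
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        row[field] = rightmost_chunk(row[field])
        writer.writerow(row)


# In-memory demonstration with made-up records (hypothetical layout)
raw = io.StringIO(
    'id,followup\n'
    '1,"01/15/2010 {1} [1:1];06/30/2011 {2} [1:1]"\n'
    '2,"03/02/2012 {1} [1:1];"\n'
)
out = io.StringIO()
keep_latest(raw, out)
print(out.getvalue())
```

With real files you would pass open file handles instead of the in-memory strings. The output keeps only one 20-character chunk per record, well under the 244-character limit, so something like -insheet- should then be able to read it without truncation.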

- Eric
__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
ebooth@ppri.tamu.edu
Office: +979.845.6754


On Apr 18, 2012, at 1:45 PM, Nick Cox wrote:

> This kind of question arises quite often and not surprisingly people
> don't like the answers. These are my guesses at what the answers are.
> 
> 1. -str244- is as big as you can get with Stata string variables, as
> of today. Only StataCorp can speak about what might happen in the
> future, and on past form a big change like this would be announced as
> it happened.
> 
> 2. Presented with more characters, Stata truncates on the right. This
> is not tunable. There is no work-around that splits too-long variables
> into acceptable length variables on the fly.
> 
> 3. Your options as far as I know are all Mata-related or involve
> something other than Stata in which you select the part of the string
> you want _or_ chunk the string into smaller separated strings.
> 
> 4. This is programmable with Stata and Mata, and programmable with a
> program of the order of 100 lines long. It would be a tough initiation
> to Mata.
> 
> I posted programs for jobs of similar difficulty in a thread starting
> at http://www.stata.com/statalist/archive/2011-09/msg00973.html but I
> doubt that any of them is easily adaptable for your problem.
> 
> I think anyone taking this on would need more detail on your data.
> Your description is clear, but a programmer would need to know which
> fields were which.
> 
> The FAQ counsels against assuming that readers are in the same or
> similar time zone!
> 
> Nick
> 
> On Wed, Apr 18, 2012 at 6:57 PM, Steve Nakoneshny <scnakone@ucalgary.ca> wrote:
>> Good morning Statalisters,
>> 
>> Our database stores sequential followup data on patients in 20 character chunks in the format "MM/DD/YYYY {1} [1:1]" and each chunk is parsed by ";". These data are stored in a single string variable in the table, with the most recent update being appended to the right hand side. This causes a problem when importing to Stata given the 244 character limit on string variables as we are only interested in using the right-most chunk.
>> 
>> Is it possible to have Stata read in this string variable from the right hand side only? Or failing that, read in an ID variable (or two) as is and read in the string variable by parsing on ";"? I attempted to do this using -insheet- with the -delim(;)- option, but that applies to the whole dataset and not just a single variable.
>> 
>> From reading through the archive, it seems that I might be able to use Mata as a workaround, but I have no experience to do so. Any thoughts or comments are appreciated.
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/



