
Re: st: reading in long string variables (yet again)


From   Nick Cox <[email protected]>
To   [email protected]
Subject   Re: st: reading in long string variables (yet again)
Date   Wed, 18 Apr 2012 19:45:37 +0100

This kind of question arises quite often and not surprisingly people
don't like the answers. These are my guesses at what the answers are.

1. -str244- is as big as you can get with Stata string variables, as
of today. Only StataCorp can speak about what might happen in the
future, and on past form a big change like this would be announced as
it happened.

2. Presented with more characters, Stata truncates on the right. This
is not tunable. There is no work-around that splits too-long variables
into acceptable length variables on the fly.

3. Your options, as far as I know, are either Mata-related or involve
something other than Stata. Either way, you select the part of the
string you want _or_ chunk the string into smaller, separate strings.

4. This is programmable with Stata and Mata, in a program of the order
of 100 lines. It would be a tough initiation to Mata.
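
As an illustration of the "something other than Stata" route in point 3,
here is a minimal Python sketch that shortens the long field before
Stata ever sees it. It rests on assumptions not confirmed in this
thread: that the table has been exported to a comma-separated file with
a header row, and that the long field sits in a column called
"followup" (a hypothetical name, as are the function names). It keeps
only the right-most chunk after splitting on ";", which should fit
comfortably within -str244-.

```python
import csv
import io


def last_chunk(value, sep=";"):
    """Return the right-most non-empty chunk of a sep-delimited string."""
    chunks = [c.strip() for c in (value or "").split(sep) if c.strip()]
    return chunks[-1] if chunks else ""


def shorten_field(src, dst, field, sep=";"):
    """Copy CSV text from src to dst, keeping only the last chunk of `field`.

    src and dst are open text-mode file objects; `field` is the header
    name of the long string column (hypothetical: "followup").
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[field] = last_chunk(row[field], sep)
        writer.writerow(row)
```

With real files, something like the following would produce a file that
-insheet- can read without truncation (the file names are placeholders):

```python
with open("raw.csv", newline="") as src, \
        open("short.csv", "w", newline="") as dst:
    shorten_field(src, dst, "followup")
```

To chunk the string into several variables instead, the same split on
";" could be written out as separate columns, but that needs a decision
about how many columns to allow.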

I posted programs for jobs of similar difficulty in a thread starting
at http://www.stata.com/statalist/archive/2011-09/msg00973.html but I
doubt that any of them is easily adaptable for your problem.

I think anyone taking this on would need more detail on your data.
Your description is clear, but a programmer would need to know which
fields were which.

The FAQ counsels against assuming that readers are in the same or
similar time zone!

Nick

On Wed, Apr 18, 2012 at 6:57 PM, Steve Nakoneshny <[email protected]> wrote:
> Good morning Statalisters,
>
> Our database stores sequential followup data on patients in 20 character chunks in the format "MM/DD/YYYY {1} [1:1]" and each chunk is parsed by ";". These data are stored in a single string variable in the table, with the most recent update being appended to the right hand side. This causes a problem when importing to Stata given the 244 character limit on string variables as we are only interested in using the right-most chunk.
>
> Is it possible to have Stata read in this string variable from the right hand side only? Or failing that, read in an ID variable (or two) as is and read in the string variable by parsing on ";"? I attempted to do this using -insheet- with the -delim(;)- option, but that applies to the whole dataset and not just a single variable.
>
> From reading through the archive, it seems that I might be able to use Mata as a workaround, but I have no experience to do so. Any thoughts or comments are appreciated.
