



Re: st: Re: String variables over 244 in a dataset with two delimiters


From   Austin Nichols <austinnichols@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Re: String variables over 244 in a dataset with two delimiters
Date   Tue, 20 Sep 2011 10:21:52 -0400

Joseph Coveney <jcoveney@bigplanet.com>:
Good answer.  If some substrings delimited by semicolons are longer
than 244 characters, and you want to keep all the information, you
can also use -file- to step through the file one line at a time and
save pieces of the longer strings as separate variables, e.g. in
100-character chunks.
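
A minimal sketch of that idea, under stated assumptions: the file name rawdata.txt, the variable names chunk1-chunk5, and the cap of five chunks are all hypothetical. It stores each whole line in chunks rather than picking out one column. It uses the extended macro function -: piece-, which returns pieces of at most 100 characters and breaks at spaces where it can, so the pieces are not always exactly 100 characters long.

```stata
* Sketch only: rawdata.txt and chunk1-chunk5 are hypothetical names.
* Anything beyond 5 pieces of a line is dropped in this version.
clear
forvalues j = 1/5 {
    generate str100 chunk`j' = ""
}

tempname fh
local row 0
file open `fh' using "rawdata.txt", read text
file read `fh' line
while r(eof) == 0 {
    local ++row
    quietly set obs `row'
    forvalues j = 1/5 {
        * piece `j' of the line, at most 100 characters long
        local piece : piece `j' 100 of `"`macval(line)'"'
        quietly replace chunk`j' = `"`macval(piece)'"' in `row'
    }
    file read `fh' line
}
file close `fh'
```

Lines containing unusual characters (backslashes or left quotes, for instance) may need extra care with macro quoting.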

On Mon, Sep 19, 2011 at 11:40 PM, Joseph Coveney <jcoveney@bigplanet.com> wrote:
> Adam Ozimek wrote:
>
> I have a dataset that is tab delimited, and one of the variables is a string
> that can be over 244 characters. If I read this using insheet, or inputst, or I
> think anything else, it truncates this variable. However, there is an aspect of
> the string variable that I hope will let me get around this: it is delimited by
> semicolon. Is there a way to select one of the columns in a tab delimited
> dataset, and read in by parsing it as semi-colon delimited? Is there some
> other way to rescue the long variable without the truncation?
>
> --------------------------------------------------------------------------------
>
> There are a couple of ways to approach this problem, but I think that the most
> direct is to use Stata's -filefilter- command to convert semicolons to
> double-quote + tab + double-quote, and then read the converted file in with
> -insheet-.  (To learn more about -filefilter-, see Stata's online help for the
> command or see its entry in the user manual.)
>
> Notes:
>
> 1. This assumes that your string column's contents are surrounded by
> double-quotation marks.  If not, then just convert the semicolons to tabs alone.
>
> 2. If your tab-delimited file has a header row (column names), then remember to
> insert a new name for your newly created column.  There are a couple of ways to
> do that, too, in Stata, but again -filefilter- might be the most direct.
>
> 3. Don't overwrite your original.  (I'm not sure that -filefilter- will even
> allow you to name <newfile> the same as <oldfile>, but if it does, don't do it.)
>
> 4. The converted file can be a temporary file by using -tempfile- in conjunction
> with -filefilter-.  This makes the project's intermediate-file-cleanup chores
> easier.
>
> Joseph Coveney
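
A minimal sketch of the -filefilter- approach quoted above, assuming a hypothetical input file named rawdata.txt whose string column is wrapped in double quotes. In -filefilter- patterns, \Q stands for a double quote and \t for a tab. Note that every semicolon in the file is converted, not only those in the long column.

```stata
* Sketch only: rawdata.txt is a hypothetical file name.
* The converted copy is a temporary file, so the original is left untouched.
tempfile converted

* Turn each semicolon into  " <tab> "  so that each semicolon-delimited
* piece becomes its own quoted, tab-delimited field.
filefilter "rawdata.txt" "`converted'", from(";") to("\Q\t\Q")

* If the string column is not quoted, convert to a bare tab instead:
* filefilter "rawdata.txt" "`converted'", from(";") to("\t")

insheet using "`converted'", tab clear
```

If the file has a header row, the header needs a name for each newly created column before -insheet- will line the names up with the data.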


