Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


st: Re: String variables over 244 in a dataset with two delimiters


From   "Ozimek, Adam" <[email protected]>
To   "[email protected]" <[email protected]>
Subject   st: Re: String variables over 244 in a dataset with two delimiters
Date   Wed, 21 Sep 2011 20:56:01 -0400

How to correctly use the -file- command is not clear to me from the help file. Is there a longer online tutorial somewhere? The command is unfortunately named, in that searching Google for "stata file command" is not very helpful.

More than one variable contains semicolons, so simply replacing all semicolons with tabs would cause some confusion. I'm guessing I need to step through the file one line at a time and use the -file- command rather than -filefilter-.


________________________________________
From: [email protected] [[email protected]] On Behalf Of Austin Nichols [[email protected]]
Sent: Tuesday, September 20, 2011 10:21 AM
To: [email protected]
Subject: Re: st: Re: String variables over 244 in a dataset with two delimiters

Joseph Coveney <[email protected]>:
Good answer.  If some substrings delimited by semicolons are greater
than 244 characters in length, and you want to keep all information,
you can also use -file- to step through the file one line at a time
and save bits of longer strings as separate variables, e.g. in 100
character chunks.
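
A minimal sketch of the line-by-line loop, following the read pattern described in the help for -file-. The file name "original.txt" is hypothetical, and the loop only reports each line's length; the actual splitting into variables or chunks would go where the comment indicates.

```stata
* Sketch only: "original.txt" is a hypothetical file name.
tempname fh
file open `fh' using "original.txt", read text

* Read the first line into the local macro `line'.
file read `fh' line
while r(eof) == 0 {
    * Length of the current line; long lines are not truncated here
    * because the text is held in a macro, not a string variable.
    local len : length local line
    display "line length: `len'"

    * ... split `line' on tabs/semicolons or into chunks here ...

    * Read the next line; r(eof) becomes 1 at end of file.
    file read `fh' line
}
file close `fh'
```

Lines containing double quotes or macro characters may need compound quotes or macval() when `line' is used in later commands.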

On Mon, Sep 19, 2011 at 11:40 PM, Joseph Coveney <[email protected]> wrote:
> Adam Ozimek wrote:
>
> I have a dataset that is tab delimited, and one of the variables is a string
> that can be over 244 characters. If I read this using insheet, or inputst, or I
> think anything else, it truncates this variable. However, there is an aspect of
> the string variable that I hope will let me get around this: it is delimited by
> semicolon. Is there a way to select one of the columns in a tab delimited
> dataset, and read in by parsing it as semi-colon delimited? Is there some
> other way to rescue the long variable without the truncation?
>
> --------------------------------------------------------------------------------
>
> There are a couple of ways to approach this problem, but I think that the most
> direct is to use Stata's -filefilter- command to convert semicolons to
> double-quote + tab + double-quote, and then read the converted file in with
> -insheet-.  (To learn more about -filefilter-, see Stata's online help for the
> command or see its entry in the user manual.)
>
> Notes:
>
> 1. This assumes that your string column's contents are surrounded by
> double-quotation marks.  If not, then just convert the semicolons to tabs alone.
>
> 2. If your tab-delimited file has a header row (column names), then remember to
> insert a new name for your newly created column.  There are a couple of ways to
> do that, too, in Stata, but again -filefilter- might be the most direct.
>
> 3. Don't overwrite your original.  (I'm not sure that -filefilter- will even
> allow you to name <newfile> the same as <oldfile>, but if it does, don't do it.)
>
>
> 4. The converted file can be a temporary file by using -tempfile- in conjunction
> with -filefilter-.  This makes the project's intermediate-file-cleanup chores
> easier.
>
> Joseph Coveney
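
The steps above might look roughly like the following sketch. It assumes a hypothetical input file "original.txt" with no header row and with the string column not wrapped in double quotes; it also converts every semicolon in the file, so it does not address semicolons in other columns.

```stata
* Sketch only: "original.txt" is a hypothetical file name.
* Write the converted copy to a temporary file so the original is
* left untouched and cleanup is automatic.
tempfile converted

* \t is -filefilter-'s code for a tab. If the string column is wrapped
* in double quotes, to("\Q\t\Q") would be used instead (\Q is a
* double quote).
filefilter "original.txt" "`converted'", from(";") to("\t")

* Read the converted, fully tab-delimited file.
insheet using "`converted'", tab clear
```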

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


