



Re: st: Re: String variables over 244 in a dataset with two delimiters


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Re: String variables over 244 in a dataset with two delimiters
Date   Thu, 22 Sep 2011 19:53:05 +0100

There is a bug in the Mata function _nth() in my earlier post, quoted below. The line

   if (sep == "")

should be

   if (sep == " ")
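The reason, as far as I understand Mata's tokens(), is that it drops the separators only when parsing on spaces; any other parse character is returned as a token of its own, so fields and separators have to be counted differently in the two cases. A minimal sketch to check this interactively (the test strings are made up for illustration):

```stata
mata:
// Parsing on a space: only the words should come back.
tokens("a b c", " ")

// Parsing on a semi-colon: each ";" should come back as its own token,
// so an empty field shows up as two adjacent separators.
tokens("a;b;;c", ";")
end
```

That is why the space case can index fields[n] directly, while every other separator needs the counting loop.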

On Thu, Sep 22, 2011 at 2:35 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> What's implicit, I hope, is my guess that the best strategy for
> Adam's specific problem is to separate out the long variable, which
> can then be parsed on semi-colons and merged back in somehow.
>
> I am not keen on trying to write a program for Adam's mix of tabs
> delimiting variables and semi-colons also being used within the
> longest string.
>
> Here is an nth-field program. It selects the nth field from each line
> (record) of a text file and writes it to a new file. Asking for an nth
> field that does not exist, or one that is empty, is not a problem; an
> empty string is returned in each case. I can't guarantee that this copes
> with all problems and would be pleased to hear of cleaner approaches.
>
> *! NJC 1.0.0 22 Sept 2011
> program nthfield
>        version 9
>        syntax anything(name=files) [, N(int 1) DELIMiter(str) ]
>
>        gettoken data files : files
>        gettoken field files : files
>        if "`data'" == "" | "`field'" == "" | "`files'" != "" {
>                di as err "syntax is: " ///
>                as txt "nthfield {it:datafile fieldfile}"
>                exit 198
>        }
>
>        confirm file "`data'"
>        confirm new file "`field'"
>
>        if "`delimiter'" == "" local sep = char(9)
>        else local sep "`delimiter'"
>
>        tempname in out
>        file open `in' using "`data'", r
>        file open `out' using "`field'", w
>        file read `in' line
>
>        while r(eof) == 0 {
>                mata : _nth("line", `n', "`sep'")
>                file write `out' `"`line'"' _n
>                file read `in' line
>        }
>        file close `in'
>        file close `out'
> end
>
> version 9
> mata :
>
> void _nth(string scalar macname, scalar n, string scalar sep) {
>        string rowvector fields
>        string scalar nth
>        scalar nf, nsep, j
>
>        fields = tokens(st_local(macname), sep)
>        nf = cols(fields)
>        nth = ""
>
>        if (sep == "") {
>                if (n <= nf) nth = fields[n]
>        }
>        else {
>                j = nsep = 0
>                while (nsep < (n - 1) & j < nf) {
>                        if (fields[++j] == sep) nsep++
>                }
>                if (j < nf) {
>                        if (fields[j + 1] != sep) nth = fields[j + 1]
>                }
>        }
>
>        st_local(macname, nth)
> }
>
> end
>
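On cleaner approaches: one possible alternative, offered only as a sketch, is to skip tokens() and peel fields off with strpos() and substr(), so that every separator is handled the same way. The function name nth_field() is hypothetical and not part of the program above; unlike tokens(), this treats quotes and spaces inside fields as ordinary characters.

```stata
mata:
// Hypothetical helper (not part of the posted program): return the nth
// sep-delimited field of s, or "" if that field is missing or empty.
string scalar nth_field(string scalar s, real scalar n, string scalar sep)
{
        real scalar i, p
        string scalar rest

        rest = s
        // Discard the first n - 1 fields, one separator at a time.
        for (i = 1; i < n; i++) {
                p = strpos(rest, sep)
                if (p == 0) return("")        // fewer than n fields
                rest = substr(rest, p + strlen(sep), .)
        }

        // What remains starts with the nth field; cut it at the next separator.
        p = strpos(rest, sep)
        if (p == 0) return(rest)
        if (p == 1) return("")                // nth field is empty
        return(substr(rest, 1, p - 1))
}
end
```

For example, nth_field("a;b;;c", 3, ";") should give an empty string and nth_field("a;b;;c", 4, ";") should give "c".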


