Statalist



Re: st: insheet delimiter problem


From   "Ada Ma" <heu034@googlemail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: insheet delimiter problem
Date   Mon, 10 Nov 2008 12:26:43 +0000

On Mon, Nov 10, 2008 at 11:58 AM, Neil Shephard <nshephard@nhs.net> wrote:
> Ada Ma wrote:
>> Hi Statalist,
>>
>> Is there a way to stop Stata treating double quotes as delimiters?  I
>> have data files in txt format.  The data are pipe (|) delimited, but the
>> people who generated the data also used double quotes (") to mark
>> missing values, so the txt files contain a large number of pipes with
>> the occasional double quote between them.
>>
>> I can read the data into Stata fine, but only if I open the text files
>> and remove all the double quotes before I -insheet- the data with the
>> pipe specified as the delimiter.  It would be nice not to have to check
>> for double quotes first, because that would save me opening each data
>> file twice: once to get rid of the double quotes and once to read it
>> into Stata.
>>
> Without seeing an example I don't understand the problem.  It sounds as
> though you are using the -delimiter("char")- option, e.g.
>
> insheet using [path/to/your/file/filename], delim("|") clear
>
> So it's irrelevant what the people who generated the data used to specify
> the missing variable (which you indicate to be double quotes): the
> delimiter is "|", it is explicitly defined, and anything between these
> delimiters is considered by Stata to be a variable.
>
> This may result in some data that is intended to be numeric being read
> as string, but you can -destring- or otherwise convert afterwards.
>
> Neil
>
> --

Hi Neil,

Thanks for the reply.  Here is an example I have created which is
close to what happened.  The data should look like this:

epikey  hrg    code1  code2  code3
1       A0123  D100   V123   K166
2       A0125  D200   "      G122
3       B0101  D300   "      C333
4       B0122  D400   E002   V777

It is pipe delimited so in the text file it looks like this:

epikey|hrg|code1|code2|code3
1|A0123|D100|V123|K166
2|A0125|D200|"|G122
3|B0101|D300|"|C333
4|B0122|D400|E002|V777

When I specified the command as you stated above, i.e. specifying the
delim("|") option, Stata reads in this:

epikey  hrg    code1  code2               code3
1       A0123  D100   V123                K166
2       A0125  D200   |G1223|B0101|D300|  C333
4       B0122  D400   E002                V777

So everything between the two double quotes is treated as one string,
and the line break between rows 2 and 3 is swallowed along with it.  Is
there any way to get around this without editing the txt file?
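
Short of an option that makes -insheet- ignore quotes, one possible
workaround is to let Stata strip the quotes itself, so the files never
have to be opened by hand.  The following is only a sketch: the file
names example_raw.txt and example_clean.txt are made up for
illustration, and it assumes that double quotes never occur in the data
as real values.  It uses -filefilter- to write a copy of the file with
every double quote removed and then reads the copy; the original file is
left untouched.

```stata
* Hypothetical file names (assumptions); replace with your own paths.
local raw   "example_raw.txt"
local clean "example_clean.txt"

* Recreate the four-row example from this message so the sketch is
* self-contained.  _char(34) writes a literal double quote.
file open fh using "`raw'", write text replace
file write fh "epikey|hrg|code1|code2|code3" _n
file write fh "1|A0123|D100|V123|K166" _n
file write fh "2|A0125|D200|" _char(34) "|G122" _n
file write fh "3|B0101|D300|" _char(34) "|C333" _n
file write fh "4|B0122|D400|E002|V777" _n
file close fh

* Write a copy of the file with every double quote removed.
* \Q is -filefilter-'s code for the double-quote character.
filefilter "`raw'" "`clean'", from("\Q") to("") replace

* Read the cleaned copy; the pipe is now the only special character.
insheet using "`clean'", delimiter("|") clear
list
```

With the quotes gone, code2 should simply come in empty for rows 2 and
3, which Stata treats as missing for a string variable.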

Thanks again!

Ada


