Re: st: insheet delimiter problem

From   "Ada Ma" <>
Subject   Re: st: insheet delimiter problem
Date   Mon, 10 Nov 2008 12:26:43 +0000

On Mon, Nov 10, 2008 at 11:58 AM, Neil Shephard <> wrote:
> Ada Ma wrote:
>> Hi Statalist,
>> Is there a way to stop Stata seeing double quotes as delimiters?  I
>> have data files in txt format.  The data are pipe (|) delimited, but the
>> people who generated the data also used double quotes (") to mark
>> missing values, so the txt files contain a large number of pipes with
>> a few double quotes among them.
>> I can read the data into Stata fine, but only if I first open the text
>> files and remove all the double quotes before I -insheet- the data with
>> pipe specified as the delimiter.  It would be nice if I didn't have to
>> check for double quotes first, because it would save me opening each
>> data file twice: once to get rid of the double quotes and once to read
>> it into Stata.
> Without seeing an example I don't understand the problem.  It sounds as
> though you are using the -delimiter("char")- option, e.g.
> insheet using [path/to/your/file/filename], delim("|") clear
> So it's irrelevant what the people who generated the data used to specify
> the missing variable (which you indicate to be double quotes); the
> delimiter is "|" and is explicitly defined, and anything between these
> delimiters is considered by Stata to be a variable.
> This may result in some data that is intended to be numeric being read
> as string, but you can -destring- or otherwise convert afterwards.
> Neil
> --

Hi Neil,

Thanks for the reply.  Here is an example I have created which is
close to what happened.  The data should look like this:

epikey  hrg    code1  code2  code3
1       A0123  D100   V123   K166
2       A0125  D200   "      G122
3       B0101  D300   "      C333
4       B0122  D400   E002   V777

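It is pipe delimited so in the text file it looks like this:

epikey|hrg|code1|code2|code3
1|A0123|D100|V123|K166
2|A0125|D200|"|G122
3|B0101|D300|"|C333
4|B0122|D400|E002|V777
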
When I specified the command as you stated above, i.e. specifying the
delim("|") option, Stata reads in this:

epikey  hrg    code1  code2                code3
1       A0123  D100   V123                 K166
2       A0125  D200   |G1223|B0101|D300|   C333
4       B0122  D400   E002                 V777

So everything between the two double quotes is treated as one string,
and the row for epikey 3 is swallowed into it.  Is there any way to get
around this without editing the txt file?

Thanks again!
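
One possible way to avoid the manual editing is to let Stata strip the double quotes itself before reading the file. The sketch below assumes the built-in -filefilter- command is available and uses hypothetical file names (episodes.txt and episodes_noquotes.txt are placeholders, not names from the thread). It writes a copy of the file with every double quote deleted, then reads that copy with the pipe as the delimiter, so the original txt file is left untouched.

```stata
* Hypothetical file names; substitute your own paths.
local raw   "episodes.txt"
local clean "episodes_noquotes.txt"

* Copy the raw file, deleting every double quote.
* \Q is filefilter's escape code for the double-quote character.
filefilter "`raw'" "`clean'", from("\Q") to("") replace

* Read the cleaned copy with the pipe as the only delimiter.
insheet using "`clean'", delimiter("|") clear
```

With the quotes removed, the fields that held only a double quote become empty, so they should be read as missing, which is what the quotes were meant to indicate. In later Stata releases, -import delimited- offers a bindquotes(nobind) option that may make the intermediate copy unnecessary, although its effect on these particular files is untested here.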
