Re: st: Breaking one string variable into several new variables

From   Eric Booth
Subject   Re: st: Breaking one string variable into several new variables
Date   Wed, 24 Feb 2010 17:02:24 -0600


It's easier to fix your problem by importing the data correctly; it appears that Stata doesn't understand your data structure.
Your data are in a .txt file, but how are they delimited? (From your example, it looks like tabs.)

What command did you use to import them?  You may want to try opening the file in a spreadsheet program and saving it as a
tab-delimited or comma-delimited file so that you know how to specify your import command properly (e.g., -insheet-, -infile-, etc.).
Also, you could try converting the file to .dta or another file type using Stat/Transfer.  The point is that whatever import command you
used was not told the correct delimiter, so it placed each entire line into one column (v1).
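
To illustrate the point about the delimiter, here is a minimal sketch that writes a small tab-delimited file and reads it back with -insheet-. The file name example_tab.txt is hypothetical, the rows are the made-up ones from the original question, and tab delimiting is only an assumption about the real file. Run it from a do-file, since it uses /// line continuation.

```stata
* Write a small tab-delimited text file (hypothetical name, made-up rows)
file open fh using "example_tab.txt", write text replace
file write fh "industry1" _tab "industry1_def" _tab "industry2" _tab ///
    "industry2_def" _tab "year" _tab "value" _n
file write fh "1" _tab "oilseed farming" _tab "100" _tab ///
    "cotton farming" _tab "2000" _tab ".1" _n
file write fh "2" _tab "logging" _tab "200" _tab ///
    "iron ore mining" _tab "2000" _tab ".2" _n
file close fh

* Tell Stata that the delimiter is a tab and that row 1 holds variable names
insheet using "example_tab.txt", tab names clear

* In Stata 13 or later, -import delimited- is the newer counterpart:
* import delimited using "example_tab.txt", delimiters("\t") varnames(1) clear

* Six variables should now exist instead of a single v1
describe
list
```

If the real file turns out to be comma-delimited, the -comma- option of -insheet- would take the place of -tab-.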

Your data structure looks consistent, so I expect that one of the import commands will work for you. If none does,
then try using the -split- command rather than the -substr- function.  So with the first observation:

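* Enter the first observation by hand as one long string
input str100 var1
"oilseed farming                 100             cotton farming          2000              .1"
end

* By default -split- breaks the string at spaces, creating var11, var12, ...
split var1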
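
Note that -split- parses on spaces by default, so a multi-word field such as "oilseed farming" ends up in two variables. If the fields within v1 are in fact separated by tabs (an assumption; check your file), one sketch is to parse on the tab character instead. The stub name field and the single made-up observation below are illustrative only.

```stata
* Build one observation whose fields are separated by tabs, char(9)
clear
set obs 1
generate str100 v1 = "1" + char(9) + "oilseed farming" + char(9) + "100" ///
    + char(9) + "cotton farming" + char(9) + "2000" + char(9) + ".1"

* Split on tabs only, so spaces inside a field are left alone
split v1, parse(`=char(9)') generate(field)

* Convert the numeric-looking pieces from string to numeric
destring field1 field3 field5 field6, replace

list field1-field6
```

Parsing on the tab keeps "oilseed farming" and "cotton farming" together in field2 and field4, which is the part that counting words from either end of the string cannot do.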

~ Eric
Eric A. Booth
Public Policy Research Institute
Texas A&M University
Office: +979.845.6754

On Feb 24, 2010, at 4:22 PM, Anna Rakhman wrote:

> Dear Statalist,
> I have the following issue I was hoping you could help with.  I've imported
> data from a .txt file and no matter how I import it, I always end up with
> one variable while I really need 6 different variables.
> This is what my file looks like now (these are the first 4 observations of
> variable v1, the only variable in the dataset):
> industry1                     industry1_def                   industry2
>          industry2_def            year              value
> 1                                oilseed farming                 100
>              cotton farming          2000              .1
> 2                                logging                             200
>                  iron ore mining         2000              .2
> 3                                blah and blah and blah       300
>           yata, yata                 2000              .3
> This is a made-up example, but as you can see, the problem is that each
> column should be a separate variable.
> I've tried using gen split1=(v1,1), gen split2=(v1,-1) and gen
> split3=(v1,-2) to get industry1, value, and year as separate variables, but
> I'm not sure how to get industry2 as a separate variable because it is not a
> fixed number of words from either end of the string.
> Any suggestions?
> Thanks!
> Anna
