Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Eric Booth <ebooth@ppri.tamu.edu>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Breaking one string variable into several new variables
Date: Wed, 24 Feb 2010 17:02:24 -0600

It's easier to fix your problem by importing the data correctly; it appears that Stata doesn't understand your data structure. Your data are in a .txt file, but how are they delimited? (It looks like a tab from your example.) What command did you use to import them?

You may want to try opening the file in a spreadsheet program and saving it as a tab-delimited or comma-delimited file so that you know how to properly specify your import command (e.g., -insheet-, -infile-, etc.). Also, you could try converting the file to .dta or another file type using Stat/Transfer. The point is that whatever import command you used was not told the correct delimiter, so it placed all the fields of each observation into one column (v1).

Your data structure looks consistent, so I expect one of the import commands will work for you, but if not, then try using the -split- command rather than the -substr- function. So with the first observation:

******
clear
inp str90 var1
"oilseed farming 100 cotton farming 2000 .1"
end
split var1
li
*****

~ Eric
__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
ebooth@ppri.tamu.edu
Office: +979.845.6754

On Feb 24, 2010, at 4:22 PM, Anna Rakhman wrote:

> Dear Statalist,
>
> I have the following issue I was hoping you could help with. I've imported
> data from a .txt file and no matter how I import it, I always end up with
> one variable while I really need 6 different variables.
>
> This is what my file now looks like (this is the first 4 observations of
> variable v1, the only variable in the dataset):
>
> industry1 industry1_def industry2 industry2_def year value
> 1 oilseed farming 100 cotton farming 2000 .1
> 2 logging 200 iron ore mining 2000 .2
> 3 blah and blah and blah 300 yata, yata 2000 .3
>
> This is a made-up example, but as you can see, the problem is that each
> column should be a separate variable.
>
> I've tried using gen split1=(v1,1), gen split2=(v1,-1) and gen
> split3=(v1,-2) to get industry1, value, and year as separate variables, but
> I'm not sure how to get industry2 as a separate variable because it is not a
> fixed number of words from either end of the string.
>
> Any suggestions?
>
> Thanks!
> Anna
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
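
For later readers, here is a minimal sketch of the -split- fallback suggested in the reply, assuming the fields really are tab-delimited (the reply only guesses this from the example). The one-row dataset is rebuilt by hand with char(9) so the snippet runs on its own. The "|" stand-in delimiter, the stub name part, and the file name in the commented-out -insheet- line are illustrative assumptions, not anything taken from the original posts.

```stata
* Preferred route if the file really is tab-delimited
* (the file name here is hypothetical):
* insheet using "industries.txt", tab names clear

* Fallback: the whole row has landed in one string variable, v1.
* Rebuild the first observation with explicit tabs, char(9).
clear
set obs 1
generate str90 v1 = "1" + char(9) + "oilseed farming" + char(9) + "100"
replace v1 = v1 + char(9) + "cotton farming" + char(9) + "2000" + char(9) + ".1"

* Swap each tab for a visible character assumed not to occur in the data,
* then split on that character instead of on spaces.
replace v1 = subinstr(v1, char(9), "|", .)
split v1, parse("|") generate(part)

* Name the pieces after the header row shown in the question.
rename part1 industry1
rename part2 industry1_def
rename part3 industry2
rename part4 industry2_def
rename part5 year
rename part6 value

* Convert the numeric-looking pieces from string to numeric.
destring industry1 industry2 year value, replace
list industry1-value
```

Splitting on the delimiter rather than on spaces is what keeps multi-word definitions such as "oilseed farming" together in one variable. If the real delimiter turns out to be something other than a tab, the char(9) in the subinstr() call would need to change accordingly.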

**References**:
- **st: Breaking one string variable into several new variables**
  - *From:* Anna Rakhman <amr0084@gmail.com>
