Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at

AW: st: Breaking one string variable into several new variables

From   "Martin Weiss" <>
To   <>
Subject   AW: st: Breaking one string variable into several new variables
Date   Thu, 25 Feb 2010 09:13:37 +0100


"split v1, g(new_) parse(`=char(9)') destring"

The -destring- option cannot have any effect at this point since the first row of the dataset is full of "var" strings. So it is probably better to -destring, replace- towards the end of the code.
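A minimal sketch of that ordering, assuming v1 holds tab-separated fields with the variable names in the first row. The three-row dataset and the names id, name and value are made up for illustration and are not from the original post:

```stata
* Build a small v1 by hand: row 1 is the header, rows 2-3 are data.
clear
set obs 3
generate str40 v1 = "id" + char(9) + "name" + char(9) + "value" in 1
replace v1 = "1" + char(9) + "a b" + char(9) + ".1" in 2
replace v1 = "2" + char(9) + "c d" + char(9) + ".2" in 3

* Split on tabs without -destring-: the header row keeps every column a string.
split v1, generate(new_) parse(`=char(9)')

* Take each new name from the first observation.
foreach x of varlist new_* {
	local newname = `x'[1]
	rename `x' `newname'
}
drop in 1
drop v1

* Only now can the numeric columns be converted; name stays a string.
destring, replace
describe
```

Running -destring, replace- with no varlist tries every variable and leaves alone those that contain nonnumeric characters.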


-----Original Message-----
From: [] On Behalf Of Tirthankar Chakravarty
Sent: Thursday, 25 February 2010 00:17
Subject: Re: st: Breaking one string variable into several new variables

I think your problem would be much better solved by importing
carefully (see the code below for hints as to how this might work), but
in case you are stuck with data of the kind you show, here is how you
might recover the original data. From the way your example data has
wrapped, I am guessing that tabs separate the variables. If
not, please let me know:
clear
input var1 str20 var2 var3 str20 var4 var5 var6
1 "a b" 100 "c d" 2000 .1
1 "a b" 100 "c d" 2000 .1
1 "a b" 100 "c d" 2000 .1
1 "a b" 100 "c d" 2000 .1
1 "a b" 100 "c d" 2000 .1
end
// write a tab-delimited file, then read it back with the wrong
// delimiter so that each row lands in the single string variable v1
outsheet * using exampledata.txt, noquote replace
insheet  using exampledata.txt, comma nonames clear
li, clean
split v1, g(new_) parse(`=char(9)') destring

// rename from first row
foreach x of varlist new_* {
	local newname = `x'[1]
	rename `x' `newname'
}
drop in 1
drop v1
li, noobs
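
As a sketch of what "importing carefully" might look like, the file can be read with the tab delimiter and the names taken from the first row, so no splitting is needed afterwards. The file name tabdemo.txt and the small three-variable dataset are made up for illustration, and this assumes the real file is tab-delimited with a header row:

```stata
* Write a small tab-delimited file with a header row (hypothetical data).
clear
input var1 str20 var2 var3
1 "a b" 100
2 "c d" 200
end
outsheet using tabdemo.txt, noquote replace

* Read it back, telling -insheet- that tabs separate the fields and
* that the first row holds the variable names.
insheet using tabdemo.txt, tab names clear
describe
li, noobs
```

Each column arrives as its own variable, with var1 and var3 numeric and var2 a string.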


2010/2/25 Anna Rakhman <>:
> Dear Statalist,
> I have the following issue I was hoping you could help with.  I've imported
> data from a .txt file and no matter how I import it, I always end up with
> one variable while I really need 6 different variables.
> This is what my file looks like now (these are the first 4 observations of
> variable v1, the only variable in the dataset):
> industry1                     industry1_def                   industry2
>          industry2_def            year              value
> 1                                oilseed farming                 100
>              cotton farming          2000              .1
> 2                                logging                             200
>                  iron ore mining         2000              .2
> 3                                blah and blah and blah       300
>           yata, yata                 2000              .3
> This is a made-up example, but as you can see, the problem is that each
> column should be a separate variable.
> I've tried using gen split1=(v1,1), gen split2=(v1,-1) and gen
> split3=(v1,-2) to get industry1, value, and year as separate variables, but
> I'm not sure how to get industry2 as a separate variable because it is not a
> fixed number of words from either end of the string.
> Any suggestions?
> Thanks!
> Anna
> *
> *   For searches and help try:
> *
> *
> *

To every ω-consistent recursive class κ of formulae there correspond
recursive class signs r, such that neither v Gen r nor Neg(v Gen r)
belongs to Flg(κ) (where v is the free variable of r).

© Copyright 1996–2018 StataCorp LLC