Thank you very much, Nick, for your further comment. I will go with that approach and then split the string variable into two (it is actually two separate variables to begin with). In case this is useful to anyone else (unlikely, I know, as this is probably dead obvious to other people), I split the string variable col1 into the two original variables as follows:

. gen str var1 = substr(col1,-8,8)
. gen str var2 = substr(col1,-22,14)

Note that, written this way round, var1 holds the last 8 digits (the activity) and var2 the first 14 (the person identifier). Since I do need the person identifier numerically, I then typed:

. destring var2, generate(var3)

Best,
Gisella

--- On Tue, 7/1/08, Gisella Young <gisellayoung@yahoo.com> wrote:

> From: Gisella Young <gisellayoung@yahoo.com>
> Subject: Re: st: problem in uploading data into Stata - data "changes"
> To: statalist@hsphsun2.harvard.edu
> Date: Tuesday, July 1, 2008, 5:47 PM
>
> Thank you for the replies. I have been unable to resolve the
> problem, so I am copying more details below as requested.
>
> The data in the original text dataset look as follows:
>
> 1010100100050101112101 var3 var4...
> 1010100100050101112102 var3 var4...
> 1010100100050101112104 var3 var4...
> 1010100100050101112303 var3 var4...
> 1010100100050101113101 var3 var4...
>
> The number in the first column is actually the first two
> variables: var1 is 14 digits and var2 is 8 digits. In the
> text dataset there is no space between them. Neither var1
> nor var2 is supposed to be unique, but their combination is
> (and it is unique in the original data). They do need to be
> analysed separately, though: var1 is the person identifier
> and var2 is the activity.
>
> I am now using Stat/Transfer to convert the file
> (specifying the option ASCII - Delimited). When I look at
> the data in the "view" option in Stat/Transfer, they look
> fine. One relevant point might be that in the 'variables'
> window of Stat/Transfer, the first variable (which is
> actually var1 and var2, treated as one) is listed as a
> string, while the others are floats.
>
> The good news is that I can now make the transfer, and the
> col1 variable that comes up in Stata (22 digits, combining
> var1 and var2) is unique. One problem, however, is that when
> I try to encode this variable col1, it does not work: I get
> error message 134, saying that I have tried to encode too
> many values. There are just under 1.5 million observations.
>
> I then tried specifying col1 in Stat/Transfer as either a
> float or a long variable, but neither of these works: with
> long, all the variables come up in Stata as 0, and with
> float they are no longer unique (no matter how many digits
> I allow for when formatting the variable).
>
> I guess one option would be to convert them using
> Stat/Transfer in the original string format, find a way of
> encoding the variables (despite the problem of too many
> observations), and then somehow split col1 into the two
> variables var1 (first 14 digits) and var2 (next 8 digits).
>
> When I try using infix, my command is:
>
> . infix var1 1-14 var2 15-22 using "filename"
>
> I then format the variables to give them enough places
> (format %16.0g var1 var2). When I sort by var1 var2, my
> first 3 observations are as follows; clearly the
> combination of var1 and var2 is not unique:
>
>              var1       var2
>    10101000765440    1111101
>    10101000765440    1111101
>    10101000765440    1111101
>
> Any suggestions would be highly appreciated.
>
> Regards,
> Gisella
>
>
> --- On Tue, 7/1/08, Steven Samuels <sjhsamuels@earthlink.net> wrote:
>
> > From: Steven Samuels <sjhsamuels@earthlink.net>
> > Subject: Re: st: problem in uploading data into Stata - data "changes"
> > To: statalist@hsphsun2.harvard.edu
> > Date: Tuesday, July 1, 2008, 3:18 PM
> >
> > Gisella,
> >
> > Show us an example of a data line and your -infix-
> > statements. Also, what are the item separators in your
> > text file (commas, tabs, ...)?
> >
> > If Excel can figure out the variable columns, then
> > Stat/Transfer can also (see ASCII input options); there is
> > no need to go through Excel.
> >
> > -Steve
> >
> > On Jul 1, 2008, at 11:05 AM, Gisella Young wrote:
> >
> > > Dear all,
> > >
> > > I am trying to load a data file in text format into Stata,
> > > using the infix command. The problem is that one column of
> > > data (the firm column, which is the unique identification
> > > number for each observation) is different when I open it in
> > > Stata from what I can see in the original text file. In
> > > fact, I have several such text files for various years, and
> > > in every case the problem is the same: all variables load
> > > correctly except the first one. Not only is that number
> > > different, but it is no longer unique to each observation.
> > > It does, however, have the same number of digits as the
> > > original. I have checked that the infix command is
> > > specified correctly (e.g., correct number of digits).
> > >
> > > I have also tried saving the text file into Excel (and
> > > applying text-to-columns) and then converting it into a
> > > Stata file using Stat/Transfer. When I do this, all the
> > > variables load correctly into Stata. The problem is that I
> > > cannot do this for the entire files because of their size
> > > (the limits of Excel mean that only a small fraction of
> > > each file can be accommodated), so this is not a solution.
> > >
> > > I realise that it may be difficult for someone to suggest
> > > an explanation or solution without seeing the actual data,
> > > but I wonder whether there are any suggestions as to what
> > > the problem might be, and how to get around it?
> > >
> > > Many thanks,
> > > Gisella
> > >
> > > *
> > > *   For searches and help try:
> > > *   http://www.stata.com/support/faqs/res/findit.html
> > > *   http://www.stata.com/support/statalist/faq
> > > *   http://www.ats.ucla.edu/stat/stata/
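The symptoms in this thread (a 14-digit identifier that keeps its length but loses uniqueness after -infix-, and a 22-digit value that cannot survive as float) are consistent with the limits of binary floating point. Stata's float and double are IEEE 754 single and double precision. A single-precision value has a 24-bit significand, so between 2^43 and 2^44 (roughly 8.8 x 10^12 to 1.8 x 10^13) consecutive floats are 2^20 = 1,048,576 apart. A double has a 53-bit significand and stores every integer up to 2^53 (about 9.0 x 10^15) exactly, which covers 14 digits but not 22. Unless told otherwise, -infix- stores numeric variables as float; a storage type can be written in front of a variable name, as in -infix double var1 1-14 ...-. The Python sketch below only illustrates the arithmetic, assuming Stata's storage types follow the IEEE formats; the neighbouring IDs are hypothetical values built from the first sample row, not real data.

```python
import struct


def as_float32(x: int) -> float:
    """Round an integer to the nearest IEEE 754 single-precision value (like Stata's float)."""
    return struct.unpack("<f", struct.pack("<f", float(x)))[0]


def as_float64(x: int) -> float:
    """Python floats are IEEE 754 doubles (like Stata's double)."""
    return float(x)


# Person identifier from the first sample row (first 14 digits),
# plus two HYPOTHETICAL neighbouring IDs for illustration.
person = int("1010100100050101112101"[:14])   # 10101001000501
id_neighbours = [person, person + 1, person + 2]

# Single precision: adjacent 14-digit IDs collapse onto one stored value.
float_values = {as_float32(v) for v in id_neighbours}

# Double precision: each 14-digit ID is stored exactly.
double_values = {as_float64(v) for v in id_neighbours}

# The full 22-digit numbers from the sample rows exceed 2**53,
# so even a double cannot keep them all distinct.
combined = [
    "1010100100050101112101",
    "1010100100050101112102",
    "1010100100050101112104",
    "1010100100050101112303",
    "1010100100050101113101",
]
combined_doubles = {as_float64(int(s)) for s in combined}

print("float value of first ID:", int(as_float32(person)))
print(f"float : {len(float_values)} distinct of {len(id_neighbours)} IDs")
print(f"double: {len(double_values)} distinct of {len(id_neighbours)} IDs")
print(f"22-digit as double: {len(combined_doubles)} distinct of {len(combined)}")
print(f"22-digit as string: {len(set(combined))} distinct of {len(combined)}")
```

Running it, the three hypothetical IDs all land on 10101000765440, the same number that appears as var1 in the sorted -infix- listing quoted in the thread, while double keeps them apart. The 22-digit values still collide even as doubles, which is why keeping col1 as a string and splitting it is the safer route.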


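The split in Gisella's reply depends on every col1 value being exactly 22 characters long: in Stata's substr(), a negative start position counts back from the end of the string, so substr(col1,-8,8) returns the last 8 characters and substr(col1,-22,14) the first 14. The Python sketch below mirrors that fixed-width split on the sample rows from the thread. The helper name split_col1 is hypothetical, and the sketch assumes every ID is a 22-digit string. Keeping both parts as text preserves leading zeros (the activity codes in the sample start with 0), and the check confirms that the (person, activity) pairs remain unique.

```python
def split_col1(col1: str) -> tuple[str, str]:
    """Split a 22-digit ID string into (person, activity) by fixed position.

    Mirrors Stata's substr(col1,-22,14) and substr(col1,-8,8)
    for strings that are exactly 22 characters long.
    """
    if len(col1) != 22 or not col1.isdigit():
        raise ValueError(f"expected a 22-digit string, got {col1!r}")
    person = col1[-22:-8]   # first 14 characters: person identifier
    activity = col1[-8:]    # last 8 characters: activity code, leading zeros kept
    return person, activity


# Sample rows from the thread (first column only)
rows = [
    "1010100100050101112101",
    "1010100100050101112102",
    "1010100100050101112104",
    "1010100100050101112303",
    "1010100100050101113101",
]

pairs = [split_col1(r) for r in rows]

# The combination must stay unique, as it is in the raw file
assert len(set(pairs)) == len(pairs), "duplicate (person, activity) pairs"

# Convert the person identifier to a number only where needed;
# Python ints are exact at any length
person_ids = [int(person) for person, _ in pairs]

for (person, activity), pid in zip(pairs, person_ids):
    print(person, activity, pid)
```

Python's int is arbitrary-precision, so the conversion here is exact; the precision problem only arises once a number is stored in a fixed-size floating-point type.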