
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: RE: st: problem in uploading data into Stata - data "changes"
Date: Tue, 1 Jul 2008 19:00:39 +0100

You would be much better off reading in your identifiers either as string variables or as doubles. Stata can't hold 14-digit numbers exactly in floats. This is documented in several places; see -search precision- for some of them.

If you input the identifiers as strings, I can see no reason why you should also want to -encode- them.

-format-ting after input will never put back precision that was lost on input. That is shutting the stable door after the horse has bolted.

Nick
n.j.cox@durham.ac.uk

Gisella Young

Thank you for the replies. I have been unable to resolve the problem, so I am copying more details below as requested. The data in the original text dataset look as follows:

    1010100100050101112101 var3 var4 ...
    1010100100050101112102 var3 var4 ...
    1010100100050101112104 var3 var4 ...
    1010100100050101112303 var3 var4 ...
    1010100100050101113101 var3 var4 ...

The number in the first column is actually the first two variables: var1 is 14 digits and var2 is 8 digits. In the text dataset there is no space between them. Neither var1 nor var2 is supposed to be unique on its own, but their combination is (and is in the original data). They do, however, need to be analysed separately: var1 is the person identifier and var2 is the activity.

I am now using Stat/Transfer to convert the file (specifying the option ASCII - Delimited). When I look at the data in the "view" option in Stat/Transfer it looks fine. One relevant point might be that in the "variables" window of Stat/Transfer, the first variable (which is actually var1 and var2, treated as one) is listed as string while the others are floats.

The good news is that I can now make the transfer, and the col1 variable that comes up in Stata (22 digits, combining var1 and var2) is unique. One problem, however, is that when I try to -encode- this variable col1, it does not work: I get error message 134 (too many values to encode). There are just under 1.5 million observations.
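Since Stat/Transfer already delivers col1 as a 22-character string, Nick's advice suggests there is no need to -encode- it: the split Gisella has in mind can be done directly with substr(). What follows is a minimal sketch as a Stata do-file, assuming col1 is a str22 variable; the five sample records are loaded with -input- in place of the real data. The last step is an addition not proposed in the thread: if an integer key really is needed, egen's group() function numbers the distinct combinations without creating value labels, which, as I understand it, are what -encode- runs out of when it reports error 134. The variable name pair is hypothetical.

```stata
* Sketch: split a combined 22-character identifier into its two parts.
* The five values are the sample records from the thread; col1 is the
* name of the combined string column that Stat/Transfer produced.
clear
input str22 col1
"1010100100050101112101"
"1010100100050101112102"
"1010100100050101112104"
"1010100100050101112303"
"1010100100050101113101"
end

* Person identifier (first 14 characters) and activity (next 8),
* kept as strings so no digits are lost and leading zeros survive
generate str14 var1 = substr(col1, 1, 14)
generate str8  var2 = substr(col1, 15, 8)

* The combination should identify observations; -isid- exits with an
* error if it does not
isid var1 var2
list var1 var2

* Hypothetical optional step: a compact integer key numbering the
* distinct (var1, var2) combinations 1, 2, 3, ... with no value labels
egen long pair = group(var1 var2)
```

On the full file, -isid var1 var2- is the quick check that the split has kept the combination unique.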
I then tried specifying col1 in Stat/Transfer as either a float or a long variable, but neither of these works: with long, all the variables come up in Stata as 0, and with float they are no longer unique (no matter how many digits I allow for when formatting the variable).

I guess one option would be to convert the file with Stat/Transfer keeping the original string format, then find a way of encoding the variable (despite the problem of too many observations), and then somehow split col1 into the two variables var1 (first 14 digits) and var2 (next 8 digits).

When I try using -infix-, my command is:

    . infix var1 1-14 var2 15-22 using "filename"

I then format the variables to give them enough places:

    . format %16.0g var1 var2

When I sort by var1 var2, my first three observations are as follows; clearly the combination of var1 and var2 is not unique:

    var1            var2
    10101000765440  1111101
    10101000765440  1111101
    10101000765440  1111101

Any suggestions would be highly appreciated.

regards,
Gisella

--- On Tue, 7/1/08, Steven Samuels <sjhsamuels@earthlink.net> wrote:

> From: Steven Samuels <sjhsamuels@earthlink.net>
> Subject: Re: st: problem in uploading data into Stata - data "changes"
> To: statalist@hsphsun2.harvard.edu
> Date: Tuesday, July 1, 2008, 3:18 PM
>
> Gisella,
>
> Show us an example of a data line and your -infix- statements. Also, what are the item separators in your text file (commas, tabs, ...)? If Excel can figure out the variable columns, then Stat/Transfer can also (see ASCII input options); there is no need to go through Excel.
>
> -Steve
>
> On Jul 1, 2008, at 11:05 AM, Gisella Young wrote:
>
> > Dear all,
> >
> > I am trying to load a datafile in text format into Stata. I am using the infix command. The problem is that one column of data (the firm column, which is the unique identification number for each observation) is different when I open it in Stata from what I can see in the original text file.
> > In fact I have several such text files for various years, and in every case the problem is the same: all variables load correctly except for the first one. Not only is that number different, but it is no longer unique to each observation. It does, however, have the same number of digits as the original. I have checked that the infix command is specified correctly (e.g. correct number of digits).
> >
> > I have also tried saving the text file into Excel (and applying text-to-columns) and then converting it into a Stata file using Stat/Transfer. When I do this, all the variables load correctly into Stata. The problem is that I cannot do this for the entire files because of their size (the limits of Excel mean that only a small fraction of each file can be accommodated), so this is not a solution.
> >
> > I realise that it may be difficult for someone to suggest an explanation or solution without seeing the actual data, but I wonder whether there are any suggestions as to what the problem might be, and how to get around it?
> >
> > Many thanks,
> > Gisella
> >
> > *
> > * For searches and help try:
> > * http://www.stata.com/support/faqs/res/findit.html
> > * http://www.stata.com/support/statalist/faq
> > * http://www.ats.ucla.edu/stat/stata/
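Putting Nick's diagnosis and remedy together, here is a minimal sketch as a Stata do-file. The first two -display- lines show the rounding: a float has a 24-bit significand, so by my arithmetic floats near 10^13 are 2^20 = 1,048,576 apart, and 10101001000501 (var1 of the first sample record) is stored as 10101000765440, the same value that appears in Gisella's sorted listing. That match suggests, but does not prove, that her -infix- run stored var1 as float, which -infix- uses when no type is given. The rest re-reads the records with explicit storage types. Assumptions: a temporary file holding only the first 22 columns of the five sample records replaces the real "filename" (so var3, var4, ... are omitted), and the names raw and fh are hypothetical.

```stata
* Part 1: float rounding of the first record's var1.
* Literal numbers in an expression are evaluated in double precision.
display %16.0f 10101001000501
* shows 10101000765440: the last six digits are lost
display %16.0f float(10101001000501)

* Part 2: re-read the identifiers with explicit storage types.
* Hypothetical temporary file standing in for the real "filename".
tempfile raw
tempname fh
file open `fh' using `"`raw'"', write text replace
file write `fh' "1010100100050101112101" _n
file write `fh' "1010100100050101112102" _n
file write `fh' "1010100100050101112104" _n
file write `fh' "1010100100050101112303" _n
file write `fh' "1010100100050101113101" _n
file close `fh'

* Option A: keep both identifiers as strings (leading zeros in var2 survive)
infix str14 var1 1-14 str8 var2 15-22 using `"`raw'"', clear
isid var1 var2
list

* Option B: numeric with enough precision; a double holds any 14-digit
* integer exactly, and an 8-digit integer fits in a long
infix double var1 1-14 long var2 15-22 using `"`raw'"', clear
format var1 %14.0f
* %08.0f pads var2 with leading zeros for display only
format var2 %08.0f
isid var1 var2
list
```

As Nick notes, no display format applied after a float read can recover the lost digits; only the storage type chosen at input time can.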

**References**:
- **Re: st: problem in uploading data into Stata - data "changes"**, *From:* Steven Samuels <sjhsamuels@earthlink.net>
- **Re: st: problem in uploading data into Stata - data "changes"**, *From:* Gisella Young <gisellayoung@yahoo.com>


© Copyright 1996–2016 StataCorp LP