
From: Steven Samuels <samplerx@earthlink.net>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: problem in uploading data into Stata - data "changes"
Date: Tue, 1 Jul 2008 14:30:10 -0400

I agree with Nick and would personally treat the identifiers as "string".
You can separate var1 and var2 using the -substr- function. Perhaps the
failure of -infix- to read your data and StatTransfer's conversion of the
"first column" to string are both caused by the presence of an alpha or
hidden character.

-Steve

On Jul 1, 2008, at 2:00 PM, Nick Cox wrote:

> You would be much better off reading in your identifiers either as
> string variables or as doubles. Stata can't hold 14 digit variables
> exactly in floats. This is documented in several places: -search
> precision- for some.
>
> If you input the identifiers as string, I can see no reason why you
> should also want to -encode- them.
>
> -format-ting after input will never put back precision that was lost on
> input. That is shutting the stable door after the horse has bolted.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Gisella Young
>
> Thank you for the replies. I have been unable to resolve the problem, so
> am copying more details below as requested.
>
> The data in the original text dataset looks as follows
> 1010100100050101112101 var3 var 4...
> 1010100100050101112102 var3 var 4...
> 1010100100050101112104 var3 var 4...
> 1010100100050101112303 var3 var 4...
> 1010100100050101113101 var3 var 4...
>
> The number in the first column is actually the first 2 variables, var1
> is 14 digits and var2 is 8 digits. In the text dataset there is no space
> between them. Actually neither var1 nor var2 is supposed to be unique,
> but the combination of them is (and is in the original data). (Although
> they do need to be analysed separately - var1 is the person identifier
> and var2 is the activity).
>
> I am now using stat transfer to convert the file (specifying the option
> ASCII - Delimited). When I look at the data in the "view" option in stat
> transfer it looks fine. One relevant point might be that in the
> 'variables' window of stat transfer, the first variable (which is
> actually var1 and var2 which it is treating as one) is listed as string
> while the others are floats.
>
> The good news is that I can now make the transfer and the col1 variable
> that comes up in Stata (of 22 digits, combining var1 and var2) is
> unique. One problem however is that when I try to encode this variable
> 'col1', it does not work as I get error message 134 (that I have tried
> to encode too many values). There are just under 1.5 million
> observations.
>
> I then tried specifying 'col1' in stat transfer as either a float or
> long variable, but neither of these works - with long all the values
> come up in Stata as 0, and with float they are no longer unique (no
> matter how many digits I allow for when formatting the variable).
>
> I guess one option would be to convert them using Stattransfer in the
> original string format, and then find a way of encoding the variables
> (despite the problem of too many observations) and then somehow
> splitting the 'col1' variable into the 2 variables var1 (first 14
> digits) and var2 (next 8 digits).
>
> When I try using infix, my command is:
> . infix var1 1-14 var2 15-22 using "filename"
>
> I then format the variables to give them enough places (format %16.0g
> var1 var2). When I sort by var1 var2, my first 3 observations are as
> follows - clearly the combination of var1 and var2 is not unique:
>
>           var1     var2
> 10101000765440  1111101
> 10101000765440  1111101
> 10101000765440  1111101
>
> Any suggestions would be highly appreciated.
>
> regards,
> Gisella
>
> --- On Tue, 7/1/08, Steven Samuels <sjhsamuels@earthlink.net> wrote:
>
>> From: Steven Samuels <sjhsamuels@earthlink.net>
>> Subject: Re: st: problem in uploading data into Stata - data "changes"
>> To: statalist@hsphsun2.harvard.edu
>> Date: Tuesday, July 1, 2008, 3:18 PM
>>
>> Gisella,
>>
>> Show us an example of a data line and your -infix- statements. Also,
>> what are the item separators in your text file (commas, tabs,..) ?
>> If Excel can figure out the variable columns, then StatTransfer can
>> also (see ASCII input options); there is no need to go through Excel.
>>
>> -Steve
>>
>> On Jul 1, 2008, at 11:05 AM, Gisella Young wrote:
>>
>>> Dear all,
>>>
>>> I am trying to load a datafile in text format into Stata. I am
>>> using the infix command. The problem is that 1 column of data (the
>>> first column, which is the unique identification number for each
>>> observation) is different when I open it in Stata from what I
>>> can see in the original text file. In fact I have several such text
>>> files for various years, and in every case the problem is the same:
>>> all variables upload correctly except for the first one. Not only
>>> is that number different but it is no longer unique to each
>>> observation. It is however the same number of digits as the
>>> original. I have checked that the infix command is specified
>>> correctly (eg correct number of digits).
>>>
>>> I have also tried saving the text file into excel (and applying
>>> text-to-columns) and then converting it into a stata file using
>>> Stat-transfer. When I do this all the variables upload correctly
>>> into Stata. The problem is that I cannot do this for the entire
>>> files because of their size (the limits of Excel mean that only a
>>> small fraction of each file can be accommodated), so this is not a
>>> solution.
>>>
>>> I realise that it may be difficult for someone to suggest an
>>> explanation/solution without seeing the actual data, but I wonder
>>> whether there are any suggestions as to what the problem might
>>> potentially be, and how to get around it?
>>>
>>> Many thanks,
>>> Gisella

Steven Samuels
845-246-0774
18 Cantine's Island
Saugerties, NY 12477
EFax: 208-498-7441

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
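
The advice in this thread (read the identifiers as strings, split them with -substr-, and look for stray characters) could be sketched roughly as follows. This is only a sketch under the assumptions stated in the thread: the identifier field occupies columns 1-22, with var1 in the first 14 characters and var2 in the next 8. "filename" is the placeholder from the original post, col1 is the name Stat/Transfer gave the combined field, and id is a hypothetical name for a numeric identifier that stands in for the -encode- step.

```stata
* Sketch only. "filename" is the placeholder used in the original post;
* col1, var1 and var2 follow the thread; id is a hypothetical name.

* Why float storage fails: float() rounds its argument to float
* precision, so a 14-digit identifier is not stored exactly.
display %16.0f float(10101001000501)

* Look at the raw file for unexpected (alpha or hidden) characters.
hexdump "filename", analyze

* Read the whole 22-character identifier field as a single string.
infix str col1 1-22 using "filename", clear

* Count records whose identifier field contains anything but digits.
count if !regexm(col1, "^[0-9]+$")

* Split the field into the two identifiers, keeping leading zeros.
generate str14 var1 = substr(col1, 1, 14)
generate str8  var2 = substr(col1, 15, 8)

* -isid- stops with an error if var1 and var2 together are not unique.
isid var1 var2

* A numeric identifier without value labels, in place of -encode-.
egen long id = group(var1 var2)
```

If numeric identifiers are preferred, the other route Nick mentions would be something like -infix double var1 1-14 long var2 15-22 using "filename"-, since a double holds a 14-digit integer exactly and a long holds an 8-digit one. Note that reading var2 as a number drops any leading zero, which the string version keeps.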

