Statalist



Re: st: problem in uploading data into Stata - data "changes"


From   Steven Samuels <samplerx@earthlink.net>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: problem in uploading data into Stata - data "changes"
Date   Tue, 1 Jul 2008 14:30:10 -0400

I agree with Nick and would personally treat the identifiers as
"string".  You can separate var1 and var2 using the -substr-
function. Perhaps the failure of -infix- to read your data and
StatTransfer's conversion of the "first column" to string are both
caused by the presence of an alphabetic or hidden character.
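
A minimal sketch of the string approach, assuming the fixed-column layout
described in the quoted message (var1 in columns 1-14, var2 in columns
15-22). "filename" is the placeholder used there, and the variable name
id is hypothetical:

```stata
* Read the 22-character identifier as a single string, so no digits
* are rounded on input. "filename" is a placeholder; id is a
* hypothetical variable name.
infix str id 1-22 using "filename", clear

* Split the string by position: the first 14 characters, then the next 8.
generate str14 var1 = substr(id, 1, 14)
generate str8  var2 = substr(id, 15, 8)

* Optional check: exits with an error if var1 and var2 together do
* not uniquely identify the observations.
isid var1 var2
```

If -isid- still reports duplicates with string identifiers, the cause
is more likely to be in the file itself than in numeric precision.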


-Steve
On Jul 1, 2008, at 2:00 PM, Nick Cox wrote:

> You would be much better off reading in your identifiers either as
> string variables or as doubles. Stata can't hold 14 digit variables
> exactly in floats. This is documented in several places: -search
> precision- for some.
>
> If you input the identifiers as string, I can see no reason why you
> should also want to -encode- them.
>
> -format-ting after input will never put back precision that was lost
> on input. That is shutting the stable door after the horse has bolted.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Gisella Young
>
> Thank you for the replies. I have been unable to resolve the problem,
> so am copying more details below as requested.
>
> The data in the original text dataset looks as follows
> 1010100100050101112101 var3 var 4...
> 1010100100050101112102  var3 var 4...
> 1010100100050101112104  var3 var 4...
> 1010100100050101112303  var3 var 4...
> 1010100100050101113101  var3 var 4...
>
> The number in the first column is actually the first 2 variables: var1
> is 14 digits and var2 is 8 digits. In the text dataset there is no
> space between them. Neither var1 nor var2 is supposed to be unique on
> its own, but the combination of them is (and is in the original data),
> although they do need to be analysed separately: var1 is the person
> identifier and var2 is the activity.
>
> I am now using stat transfer to convert the file (specifying the
> option ASCII - Delimited). When I look at the data in the "view" option
> in stat transfer it looks fine. One relevant point might be that in the
> 'variables' window of stat transfer, the first variable (which is
> actually var1 and var2 which it is treating as one) is listed as
> string while the others are floats.
>
> The good news is that I can now make the transfer and the col1
> variable that comes up in Stata (of 22 digits, combining var1 and var2)
> is unique. One problem however is that when I try to encode this
> variable 'col1', it does not work as I get error message 134 (that I
> have tried to encode too many values). There are just under 1.5 million
> observations.
>
> I then tried specifying 'col1' in stat transfer as either a float or
> long variable, but neither of these works - with long all the variables
> come up in Stata as 0, and with float they are no longer unique (no
> matter how many digits I allow for when formatting the variable).
>
> I guess one option would be to convert them using Stattransfer in the
> original string format, and then find a way of encoding the variables
> (despite the problem of too many observations) and then somehow
> splitting the 'col1' variable into the 2 variables var1 (first 14
> digits) and var2 (next 8 digits).
>
>
> When I try using infix, my command is:
> . infix var1 1-14 var2 15-22 using "filename"
>
> I then format the variables to give them enough places (format %16.0g
> var1 var2). When I sort by var1 var2, my first 3 observations are as
> follows - clearly the combination of var1 and var2 is not unique:
>
> var1	var2
> 10101000765440	1111101
> 10101000765440	1111101
> 10101000765440	1111101
>
>
> Any suggestions would be highly appreciated.
>
> regards,
> Gisella
>
>
> --- On Tue, 7/1/08, Steven Samuels <sjhsamuels@earthlink.net> wrote:
>
>> From: Steven Samuels <sjhsamuels@earthlink.net>
>> Subject: Re: st: problem in uploading data into Stata - data "changes"
>> To: statalist@hsphsun2.harvard.edu
>> Date: Tuesday, July 1, 2008, 3:18 PM
>> Gisella,
>>
>> Show us an example of a data line and your -infix- statements. Also,
>> what are the item separators in your text file (commas, tabs, ...)?
>> If Excel can figure out the variable columns, then StatTransfer can
>> also (see ASCII input options); there is no need to go through Excel.
>>
>> -Steve
>> On Jul 1, 2008, at 11:05 AM, Gisella Young wrote:
>>
>>> Dear all,
>>>
>>> I am trying to load a datafile in text format into Stata. I am
>>> using the infix command. The problem is that 1 column of data (the
>>> firm column, which is the unique identification number for each
>>> observation) is different when I open it in Stata from what I can
>>> see in the original text file. In fact I have several such text
>>> files for various years, and in every case the problem is the same:
>>> all variables upload correctly except for the first one. Not only
>>> is that number different but it is no longer unique to each
>>> observation. It is however the same number of digits as the
>>> original. I have checked that the infix command is specified
>>> correctly (eg correct number of digits).
>>>
>>> I have also tried saving the text file into excel (and applying
>>> text-to-columns) and then converting it into a stata file using
>>> Stat-transfer. When I do this all the variables upload correctly
>>> into Stata. The problem is that I cannot do this for the entire
>>> files because of their size (the limits of Excel mean that only a
>>> small fraction of each file can be accommodated), so this is not a
>>> solution.
>>>
>>> I realise that it may be difficult for someone to suggest an
>>> explanation/solution without seeing the actual data, but I wonder
>>> whether there are any suggestions as to what the problem might
>>> potentially be, and how to get around it?
>>>
>>> Many thanks,
>>> Gisella
>>>
>>>
>>>
>>>
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/support/faqs/res/findit.html
>>> *   http://www.stata.com/support/statalist/faq
>>> *   http://www.ats.ucla.edu/stat/stata/

Steven Samuels
845-246-0774
18 Cantine's Island
Saugerties, NY 12477
EFax: 208-498-7441




