Statalist



Re: st: problem in uploading data into Stata - data "changes"


From   Gisella Young <gisellayoung@yahoo.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: problem in uploading data into Stata - data "changes"
Date   Tue, 1 Jul 2008 12:19:44 -0700 (PDT)

Thank you very much Nick for your further comment. 

I will go with that approach, and then split the string variable into 2 (as it is actually 2 separate variables to begin with). In case this is useful to anyone else (unlikely I know as this is probably dead obvious to other people), I split the string variable 'col1' into the 2 original variables as follows:
..gen str var1 = substr(col1,-8,8)
..gen str var2 = substr(col1,-22,14)
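
In case it helps anyone adapting this, here is a minimal sketch of the same split on toy data. It assumes every value of col1 has exactly 22 characters; for such strings, substr(col1,1,14) and substr(col1,15,8) pick the same characters as the negative positions above. The names first14 and last8 are made up for illustration, and the three values are taken from the sample lines quoted below.

```stata
* Toy data: three 22-character values standing in for col1
clear
input str22 col1
"1010100100050101112101"
"1010100100050101112102"
"1010100100050101112104"
end

* The slicing below assumes a fixed width of 22 characters
assert strlen(col1) == 22

* Hypothetical names: first 14 characters and last 8 characters
gen str14 first14 = substr(col1, 1, 14)
gen str8  last8   = substr(col1, 15, 8)

* isid exits with an error if the pair does not uniquely identify rows
isid first14 last8
```

Keeping both pieces as strings at this stage preserves any leading zeros; the check with isid only confirms that the split has not lost the uniqueness of the combined value.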

And since I do need the second one numerically, I then said
..destring var2, generate(var3)
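
A possible alternative, offered only as a sketch: infix stores numbers as float unless told otherwise, and a float holds only about 7 significant digits, which may be why the infix attempt quoted below produced duplicates. Declaring double for the 14-digit field and long for the 8-digit field should keep every digit. The temporary file here is a stand-in for the real data, and id14 and act8 are made-up names.

```stata
* Write two sample lines to a temporary file (a stand-in for the real text file)
tempfile raw
tempname fh
file open `fh' using "`raw'", write text
file write `fh' "1010100100050101112101" _n
file write `fh' "1010100100050101112102" _n
file close `fh'

* Declare storage types explicitly: double holds 14 digits exactly,
* long holds any 8-digit integer
infix double id14 1-14 long act8 15-22 using "`raw'", clear

* Display formats wide enough to show all digits
format %14.0f id14
format %8.0f act8

* Numeric storage drops leading zeros, e.g. 01112101 is stored as 1112101
isid id14 act8
list
```

If leading zeros matter, reading the fields as strings (for example with a str type in infix) and converting later would be the safer route.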

best,
Gisella

--- On Tue, 7/1/08, Gisella Young <gisellayoung@yahoo.com> wrote:

> From: Gisella Young <gisellayoung@yahoo.com>
> Subject: Re: st: problem in uploading data into Stata - data "changes"
> To: statalist@hsphsun2.harvard.edu
> Date: Tuesday, July 1, 2008, 5:47 PM
> Thank you for the replies. I have been unable to resolve the
> problem, so am copying more details below as requested.
> 
> The data in the original text dataset looks as follows
> 1010100100050101112101  var3 var 4...
> 1010100100050101112102  var3 var 4...
> 1010100100050101112104  var3 var 4...
> 1010100100050101112303  var3 var 4...
> 1010100100050101113101  var3 var 4...
> 
> The number in the first column is actually the first 2
> variables, var1 is 14 digits and var2 is 8 digits. In the
> text dataset there is no space between them. Actually
> neither var1 nor var2 is supposed to be unique, but the
> combination of them is (and is in the original data).
> (Although they do need to be analysed separately - var1 is
> the person identifier and var2 is the activity).
> 
> I am now using stat transfer to convert the file
> (specifying the option ASCII - Delimited). When I look at
> the data in the "view" option in stat transfer it
> looks fine. One relevant point might be that in the
> 'variables' window of stat transfer, the first
> variable (which is actually var1 and var2 which it is
> treating as one) is listed as string while the others are
> floats.
> 
> The good news is that I can now make the transfer and the
> col1 variable that comes up in Stata (of 22 digits,
> combining var1 and var2) is unique. One problem however is
> that when I try to encode this variable 'col1', it
> does not work as I get error message 134 (that I have tried
> to encode too many values). There are just under 1.5 million
> observations.
> 
> I then tried specifying 'col1' in stat transfer as
> either a float or long variable, but neither of these works
> - with long all the variables come up in Stata as 0, and
> with float they are no longer unique (no matter how many
> digits I allow for when formatting the variable).
> 
> I guess one option would be to convert them using
> Stattransfer in the original string format, and then find a
> way of encoding the variables (despite the problem of too
> many observations) and then somehow splitting the
> 'col1' variable into the 2 variables var1 (first 14
> digits) and var2 (next 8 digits).
> 
> 
> When I try using infix, my command is:
> ..infix var1 1-14 var2 15-22 using "filename"
> 
> I then format the variables to give them enough places
> (format %16.0g var1 var2). When I sort by var1 var2, my
> first 3 observations are as follows - clearly the
> combination of var1 and var2 is not unique:
> 
> var1	var2
> 10101000765440	1111101
> 10101000765440	1111101
> 10101000765440	1111101
> 
> 
> Any suggestions would be highly appreciated.
> 
> regards,
> Gisella
> 
> 
> --- On Tue, 7/1/08, Steven Samuels
> <sjhsamuels@earthlink.net> wrote:
> 
> > From: Steven Samuels <sjhsamuels@earthlink.net>
> > Subject: Re: st: problem in uploading data into Stata - data "changes"
> > To: statalist@hsphsun2.harvard.edu
> > Date: Tuesday, July 1, 2008, 3:18 PM
> > Gisella,
> > 
> > Show us an example of a data line and your -infix- statements. Also,
> > what are the item separators in your text file (commas, tabs, ...)?
> > If Excel can figure out the variable columns, then StatTransfer can
> > also (see ASCII input options); there is no need to go through Excel.
> > 
> > -Steve
> > On Jul 1, 2008, at 11:05 AM, Gisella Young wrote:
> > 
> > > Dear all,
> > >
> > > I am trying to load a datafile in text format into Stata. I am
> > > using the infix command. The problem is that 1 column of data (the
> > > firm column, which is the unique identification number for each
> > > observation) is different when I open it in Stata from what I
> > > can see in the original text file. In fact I have several such text
> > > files for various years, and in every case the problem is the same:
> > > all variables upload correctly except for the first one. Not only
> > > is that number different but it is no longer unique to each
> > > observation. It is however the same number of digits as the
> > > original. I have checked that the infix command is specified
> > > correctly (eg correct number of digits).
> > >
> > > I have also tried saving the text file into excel (and applying
> > > text-to-columns) and then converting it into a stata file using
> > > Stat-transfer. When I do this all the variables upload correctly
> > > into Stata. The problem is that I cannot do this for the entire
> > > files because of their size (the limits of Excel mean that only a
> > > small fraction of each file can be accommodated), so this is not a
> > > solution.
> > >
> > > I realise that it may be difficult for someone to suggest an
> > > explanation/solution without seeing the actual data, but I wonder
> > > whether there are any suggestions as to what the problem might
> > > potentially be, and how to get around it?
> > >
> > > Many thanks,
> > > Gisella
> > >
> > >
> > >
> > >
> > > *
> > > *   For searches and help try:
> > > *   http://www.stata.com/support/faqs/res/findit.html
> > > *   http://www.stata.com/support/statalist/faq
> > > *   http://www.ats.ucla.edu/stat/stata/
> > 
> 
> 
>       
> 


      




© Copyright 1996–2014 StataCorp LP