Re: st: RE: SPSS to Stata - Variable with 14 digits not transformed correctly


From   Sergiy Radyakin <serjradyakin@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: SPSS to Stata - Variable with 14 digits not transformed correctly
Date   Fri, 2 Mar 2012 15:49:42 -0500

Dear Julian, Nick,

As the author of -usespss-, let me comment on what it does.

1) Stata can work with 14-digit IDs stored as doubles:
***************************************************************
clear
set obs 2
generate double id = 12345678901234
replace  id = 12345678901235 in 2
* %21x shows the exact stored value in hexadecimal
format id %21x
list
* the two IDs remain distinct when stored as double
assert id[1]!=id[2]
***************************************************************
2) Some hints on working with long IDs are here:
http://www.ats.ucla.edu/stat/stata/faq/longid.htm
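
For contrast with #1, here is a minimal sketch of what happens when the
same two IDs are held as float: both round to the same stored value, so
they stop being unique. The variable names id_d and id_f are
illustrative only.
***************************************************************
clear
set obs 2
generate double id_d = 12345678901233 + _n
* a float keeps only about 7 significant decimal digits
generate float id_f = id_d
format id_d id_f %15.0f
list
* distinct as double, identical once rounded to float
assert id_d[1] != id_d[2]
assert id_f[1] == id_f[2]
***************************************************************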

3) The SPSS format does not provide float/double storage types.
Numeric values are stored differently from Stata (2 storage types, one
of them parametrized), but we can be confident that in your case of
14-digit numbers the equivalent of the double type is used (equivalent
up to byte order and some other details related to missing values). In
the process of conversion -usespss- will "decompress" the data and try
to determine its type. E.g. if it determines that the variable contains
only zeroes and ones, it will use byte as Stata's storage type for
this variable. -usespss- never rounds or truncates numeric values.
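
Stata's own -compress- makes a comparable lossless choice of storage
type, so it can serve as an analogy for #3. This is only a sketch of the
idea, not the code that -usespss- runs, and the variable names are made
up.
***************************************************************
clear
set obs 4
* both variables start out as double
generate double flag = mod(_n, 2)
generate double id   = 12345678901230 + _n
* -compress- demotes a variable only if no information is lost
compress
describe
assert "`: type flag'" == "byte"
assert "`: type id'"   == "double"
***************************************************************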

4) Provided #3 is implemented correctly (bugs happen, you know), any
numeric value stored in the SPSS file should turn out as exactly the
same numeric value in Stata. (This is not true for strings.) The only
exception that comes to my mind now is the extended missing values. In
SPSS any value like 3 or -99999997 can be assigned the meaning of
missing. In Stata this is not possible, and extended missing values
have firmly fixed values dependent on the storage type. Exact values
can be seen here: http://www.stata.com/help.cgi?dta. -usespss-
replaces original values (like the 3 and -99999997 above) with Stata's
.a, .b, .c.   Date/time variables are converted as numeric values. The
value will be the same, but it will look strange in Stata. Bill Gould
had an excellent entry in his blog about why this happens and what to
do with it:
http://blog.stata.com/2011/01/05/using-dates-and-times-from-other-software/
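
Two sketches related to #4, both with made-up variable names and
values. The first uses -mvdecode- to show how user-defined missing codes
of the kind SPSS allows can be mapped to Stata's extended missing
values; which code goes to .a and which to .b is an assumption for
illustration, not necessarily what -usespss- chooses. The second applies
the conversion discussed in the blog entry, assuming an SPSS date/time
counts seconds since 14 Oct 1582 while Stata's %tc counts milliseconds
since 1 Jan 1960.
***************************************************************
clear
input double income
25000
-99999997
3
41000
end
* map user-defined missing codes to extended missing values
mvdecode income, mv(-99999997=.a \ 3=.b)
assert income[2] == .a & income[3] == .b

* build an SPSS-style value (seconds since 14oct1582) for a known time
generate double spss_dt = (tc(02mar2012 15:49:42) - tc(14oct1582 00:00))/1000
* convert seconds to milliseconds and shift the origin to Stata's
generate double stata_dt = spss_dt*1000 + tc(14oct1582 00:00)
format stata_dt %tc
list spss_dt stata_dt in 1
assert stata_dt == tc(02mar2012 15:49:42)
***************************************************************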

5) I allow for a remote possibility that the data file contains more
data than SPSS itself is showing to you, hence the records' IDs might
be unique in SPSS but not unique in the file in general. To check
whether this is the explanation, I would have to see the file.

6) If the file can't be shared, I would suggest you list out just the
IDs (about a dozen will do) without any further information and email
them as plain text to me. This requires SPSS of course.

7) If you have SPSS, then of course you can turn the ID variable into
a string and proceed with the conversion, then destring it in Stata.
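
The Stata half of #7 might look like the sketch below. The string
variable id_str stands in for an ID that arrived from SPSS as a string;
the name and the two values are hypothetical.
***************************************************************
clear
input str14 id_str
"12345678901234"
"12345678901235"
end
* convert the string ID to a numeric variable
destring id_str, generate(id)
format id %15.0f
describe id
list
* the numeric IDs are still distinct and identify the records
assert id[1] != id[2]
isid id
***************************************************************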

8) If anyone having SPSS can create an example dataset which exhibits
the same problem, please email me a .sav file along with the content
in plaintext (or Stata) and comments on how the example was prepared
(versions etc).

9) Stata's "float" type is not a possible output of -usespss-. Let me
know if you see this type after conversion.
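
The checks implied by #5 and #9 can be sketched as follows. Simulated
data stand in for a converted file, and the variable name uhid is
hypothetical.
***************************************************************
clear
set obs 12
generate double uhid = 10010000000000 + _n
* #9: the storage type should not be float
describe uhid
assert "`: type uhid'" != "float"
* show all 14 digits instead of 1.001e+13
format uhid %15.0f
list uhid
* #5: report duplicates, then confirm the ID is unique
duplicates report uhid
isid uhid
***************************************************************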

Please let me know if I can be of any further assistance. If you
decide to send the data, please zip it and mail to
sradyakin(%at%)worldbank.org , replace (%at%) with @.

Best,
  Sergiy Radyakin, Economist,
  Research Department (DECRG)
  The World Bank









On Fri, Mar 2, 2012 at 6:54 AM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> In this context values like 1.001e+13 are to be thought of as, strictly, sets of values which are all displayed in the same way with a particular format. Crucially, changing the display format does nothing to change what is stored; by definition, it only changes how that is displayed.
>
> 14-digit identifiers can only be held accurately in Stata as string variables or -double- variables. If your identifier variable is -float- instead, then you will have lost precision in importing to Stata and the only way to regain that precision is to read the data in again.  Recasting from -float- to -double- does nothing useful as the extra details have been lost already.
>
> -usespss- is user-written (SSC). Its author is intermittently active on Statalist. I've never used it but I see no way in its help to change how particular variables are imported. I guess that you need some other solution. As I don't use SPSS or SPSS files at all I can only guess that you need to look at export options in SPSS and import options in Stata and find a match. Others on this list who do use SPSS should be able to add better advice.
>
> In short, this problem as you describe it cannot be fixed in Stata. You must import again.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Julian Emmler
>
>  I'm new to this forum so I don't yet know the best way to post my
> question, but I hope it will be understandable and I would be grateful for
> any requests for clarification. My problem with Stata concerns
> transforming household data for the South African labour market, which is
> only available in SPSS format, to Stata. I did this with several other
> datasets for the South African labour market using the "usespss" command in
> Stata, which worked just fine. However, with the last dataset I encountered a
> problem:
>  In the dataset, to identify a household, a 14-digit number called the
> Unique Household Identifier is used. However, if I transform the data
> from SPSS to Stata, the values of the household identifier are no longer their
> real values but are shortened, e.g. to 1.001e+13. Thus the Unique
> Household Identifier is not unique anymore. I tried several things, e.g.
> transforming the variable to a double variable and increasing the number of
> digits displayed. This helps in that the number is now
> displayed correctly in the data browser; however, the value didn't change.
> After searching the internet, I've come to the conclusion that this problem
> has something to do with the length of the variable, i.e. that 14 digits is
> too long to be handled by Stata. Another indicator for this is that with
> earlier datasets I had no problem because the Unique Household Identifier
> was 12 digits long. I wanted to ask whether you know any way to
> transfer the SPSS data to Stata correctly or a way to manipulate the
> data afterwards so that it attains its true values.
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/


