Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down at the end of May, and its replacement, **statalist.org**, is already up and running.


From: Sergiy Radyakin <serjradyakin@gmail.com>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: RE: SPSS to Stata - Variable with 14 digits not transformed correctly

Date: Fri, 2 Mar 2012 15:49:42 -0500

Dear Julian, Nick,

As the author of -usespss-, please let me comment on what it is doing.

1) Stata can work with 14-digit IDs stored as doubles:

***************************************************************
clear
set obs 2
generate double id = 12345678901234
replace id = 12345678901235 in 2
format id %21x
list
* the two IDs differ only in the last digit and stay distinct as doubles
assert id[1] != id[2]
***************************************************************

2) Some hints on working with long IDs are here: http://www.ats.ucla.edu/stat/stata/faq/longid.htm

3) The SPSS format does not provide the float/double storage types. Numeric values are stored differently from Stata (two storage types, one of them parametrized), but we can be confident that in your case of 14-digit numbers the equivalent of the double type is used (equivalent up to byte order and some other details related to missing values). In the process of conversion, -usespss- will "decompress" the data and try to determine its type. E.g., if it determines that the variable contains only zeroes and ones, it will use byte as Stata's storage type for this variable. -usespss- never rounds or truncates numeric values.

4) Provided #3 is implemented correctly (things happen, you know), any numeric value stored in the SPSS file should turn out as exactly the same numeric value in Stata (this is not true for strings). The only exception that comes to my mind now is extended missing values. In SPSS any value like 3 or -99999997 can be assigned the meaning of missing. In Stata this is not possible, and extended missing values have firmly fixed values that depend on the storage type. See http://www.stata.com/help.cgi?dta for the exact values. -usespss- replaces the original values (like the 3 and -99999997 above) with Stata's .a, .b, .c.

Date/time variables are converted as numeric values. The value will be the same, but it will look strange in Stata.
Bill Gould had an excellent entry in his blog about why this happens and what to do about it: http://blog.stata.com/2011/01/05/using-dates-and-times-from-other-software/

5) I allow for a remote possibility that the data file contains more data than SPSS itself is showing you, hence the records' IDs might be unique in SPSS but not unique in the file in general. To see whether this is the explanation, I would have to see the file.

6) If the file can't be shared, I would suggest you list just the IDs (about a dozen will do) without any further information and email them to me as plain text. This requires SPSS, of course.

7) If you have SPSS, then of course you can turn the ID variable into a string and proceed with the conversion, then -destring- it in Stata.

8) If anyone with SPSS can create an example dataset that exhibits the same problem, please email me a .sav file along with the content in plain text (or Stata) and comments on how the example was prepared (versions etc.).

9) Stata's "float" type is not a possible output of -usespss-. Let me know if you see this type after conversion.

Please let me know if I can be of any further assistance. If you decide to send the data, please zip it and mail it to sradyakin(%at%)worldbank.org, replacing (%at%) with @.

Best, Sergiy Radyakin

Economist, Research Department (DECRG)

The World Bank

On Fri, Mar 2, 2012 at 6:54 AM, Nick Cox <n.j.cox@durham.ac.uk> wrote:

> In this context values like 1.001e+13 are to be thought of as, strictly, sets of values which are all displayed in the same way with a particular format. Crucially, changing the display format does nothing to change what is stored; by definition, it only changes how that is displayed.
>
> 14-digit identifiers can only be held accurately in Stata as string variables or -double- variables. If your identifier variable is -float- instead, then you will have lost precision in importing to Stata and the only way to regain that precision is to read the data in again.
> Recasting from -float- to -double- does nothing useful as the extra details have been lost already.
>
> -usespss- is user-written (SSC). Its author is intermittently active on Statalist. I've never used it but I see no way in its help to change how particular variables are imported. I guess that you need some other solution. As I don't use SPSS or SPSS files at all I can only guess that you need to look at export options in SPSS and import options in Stata and find a match. Others on this list who do use SPSS should be able to add better advice.
>
> In short, this problem as you describe it cannot be fixed in Stata. You must import again.
>
> Nick
>
> n.j.cox@durham.ac.uk
>
> Julian Emmler
>
> I'm new to this forum so I don't yet know the best way to post my question, but I hope it will be understandable, and I would be grateful for any questions for clarification. My problem with Stata concerns transforming household data for the South African labour market, which is only available in SPSS format, to Stata. I did this with several datasets, also for the South African labour market, and used the "usespss" command in Stata, which worked just fine. However, with the last dataset I encountered a problem:
>
> In the dataset, a 14-digit number called the Unique Household Identifier is used to identify a household. However, if I transform the data from SPSS to Stata, the values of the household identifier are no longer its real values but are shortened, e.g. to 1.001e+13. Thus the Unique Household Identifier is not unique anymore. I tried several things, e.g. transforming the variable to a double variable and increasing the number of digits displayed. This helps in that the number is now displayed correctly in the data browser; however, the value didn't change. After searching the internet, I've come to the conclusion that this problem has something to do with the length of the variable, i.e.
> that 14 digits is too long to be handled by Stata. Another indicator for this is that with earlier datasets I had no problem because the Unique Household Identifier was 12 digits long. I wanted to ask now if you know any way to transfer the SPSS data to Stata correctly or a way to manipulate the data afterwards so it attains its true values.
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
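
The storage-type points raised in this thread can be checked directly in Stata. What follows is a minimal sketch, not code from any of the messages: the variable names (id_d, id_f, id_s, spss_dt, stata_dt) and the sample values are hypothetical. It compares the same 14-digit IDs held as double and as float, makes a string copy of the ID, and shows one way to shift a date-time value, assuming spss_dt holds an SPSS date-time expressed as seconds since 14 October 1582.

```stata
* Sketch only: variable names and sample values are made up for illustration.
clear
set obs 2

* Two 14-digit IDs that differ only in the last digit, stored as double.
generate double id_d = 12345678901234
replace id_d = 12345678901235 in 2

* The same values forced into a float, which keeps only about 7 significant digits.
generate float id_f = id_d

format id_d id_f %15.0f
list

* As doubles the IDs stay distinct; as floats they collapse to one value.
assert id_d[1] != id_d[2]
assert id_f[1] == id_f[2]

* isid exits with an error if the variable does not uniquely identify observations.
isid id_d

* A string copy of the ID is another way to carry long identifiers.
generate str14 id_s = string(id_d, "%14.0f")
assert id_s[1] == "12345678901234"
assert id_s[2] == "12345678901235"

* Assumption: spss_dt is an SPSS date-time (seconds since 14oct1582 00:00).
* Stata %tc values are milliseconds since 01jan1960 00:00, so rescale and shift.
generate double spss_dt = 0
generate double stata_dt = spss_dt*1000 + tc(14oct1582 00:00)
format stata_dt %tc
assert stata_dt == tc(14oct1582 00:00)
```

If the float assertion holds for an imported identifier, the precision is already gone and the data have to be read in again, as Nick notes; if the variable is a double, changing only the display format (for example to %15.0f) should show the full values.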

**References**:

- **st: SPSS to Stata - Variable with 14 digits not transformed correctly** *From:* Julian Emmler <ju.emmler@googlemail.com>
- **st: RE: SPSS to Stata - Variable with 14 digits not transformed correctly** *From:* Nick Cox <n.j.cox@durham.ac.uk>
