
Re: st: SPSS to Stata issues


From   Phil Schumm <pschumm@uchicago.edu>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: SPSS to Stata issues
Date   Mon, 15 Aug 2005 15:15:10 -0500

At 01:52 PM 8/15/2005 -0400, Eric Uslaner wrote:
I have a huge data set (actually the General Social Survey early release for 2004 that includes the entire GSS). The GSS data set is in SPSS and has almost 5000 variables, most of which are only asked in a few years. I am working with regular (intercooled) Stata 9 and do not have SE. I have truncated the data set in SPSS but the data set still has the same number of variables, most of which will be entirely missing data. Can't use Stat/Transfer easily since this would require looking at each variable and dropping the all-missing ones one by one. If I had Stata SE, I could use Nick Cox's dropmiss to get rid of variables with no valid cases. But my problem now is that I can't get the data into Stata at all (too many variables).

Something like the following should work:


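clear
* start with an empty file holding only the record-number key
tempvar recno
gen byte `recno' = .
tempfile mydata
save "`mydata'"

* get the names of all variables in foo.dta without loading the data
qui des using foo, varlist
loc varlist `r(varlist)'

tempvar mergevar
loc count 0
qui foreach var of loc varlist {
    loc short_list `short_list' `var'
    loc ++count
    * process a chunk after every 1000 variables, and at the end of the list
    if (`count' >= 1000) | ("`ferest()'" == "") {
        use `short_list' using foo, clear
        gen long `recno' = _n
        dropmiss
        merge `recno' using "`mydata'", sort _merge(`mergevar')
        drop `mergevar'
        save "`mydata'", replace
        loc count 0
        * reset the local (-macro drop short_list- would target a global)
        loc short_list
    }
}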


This will read variables from a file foo.dta (located in the working directory) 1000 at a time, and use -dropmiss- to drop those that are entirely missing. What you'll be left with is a file containing all of the original variables that have at least one non-missing value (assuming that such a file stays within Intercooled Stata's limit of 2,047 variables and fits in memory). Because `mydata' is a temporary file, save the result under a permanent name before the do-file ends. If this doesn't do exactly what you need, you should be able to modify it so that it does.
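
If -dropmiss- (a user-written command) is not installed, the same step can be approximated with official commands only. The following is a minimal sketch, not -dropmiss- itself: it builds a small toy dataset (the variable names x1, x2, x3 and s1 are made up for illustration) and drops every variable, numeric or string, whose values are all missing.

```stata
* Toy data (hypothetical names): x2 and s1 are entirely missing
clear
set obs 3
gen x1 = _n
gen x2 = .
gen str1 s1 = ""
gen x3 = cond(_n == 2, 5, .)

* Drop each variable for which -assert missing()- holds in every observation
foreach v of varlist _all {
    capture assert missing(`v')
    if _rc == 0 {
        drop `v'
    }
}
describe, short
```

Inside the loop in the post, the -foreach- block here could stand in for the call to -dropmiss-; the record-number variable is never missing, so it would be kept.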


-- Phil

P.S. A similar technique may be used to read in a dataset which will not fit within the available memory, but will after it is compressed (just replace the call to -dropmiss- with a call to -compress-).
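
As a rough illustration of the variant described in the P.S., here is a sketch that runs on a small toy file rather than on foo.dta. The toy variables v1-v5, the temporary file `src' and the chunk size of 2 are assumptions chosen only to keep the example short; the -merge- call keeps the Stata 9 syntax used in the post.

```stata
* Toy stand-in for the source file (hypothetical; the post reads foo.dta)
clear
set obs 4
forvalues i = 1/5 {
    gen double v`i' = _n + `i'
}
tempfile src
save "`src'"

* Seed file: zero observations, only the record-number key
clear
tempvar recno
gen long `recno' = .
tempfile mydata
save "`mydata'"

qui des using "`src'", varlist
loc varlist `r(varlist)'

tempvar mergevar
loc chunk 2          // assumed chunk size; the post uses 1000
loc count 0
loc short_list
qui foreach var of loc varlist {
    loc short_list `short_list' `var'
    loc ++count
    if (`count' >= `chunk') | ("`ferest()'" == "") {
        use `short_list' using "`src'", clear
        gen long `recno' = _n
        compress     // shrink storage types instead of dropping variables
        merge `recno' using "`mydata'", sort _merge(`mergevar')
        drop `mergevar'
        save "`mydata'", replace
        loc count 0
        loc short_list
    }
}
drop `recno'
describe
```

In this toy case the variables are created as doubles but hold small integers, so -compress- should store them as bytes in the combined file.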