Statalist



st: Converting SAS code into Stata code


From   "Hugh Colaco" <hmjc66@gmail.com>
To   statalist <statalist@hsphsun2.harvard.edu>
Subject   st: Converting SAS code into Stata code
Date   Wed, 10 Dec 2008 08:34:35 -0500

Dear Statalisters,

I was given some SAS code and need to translate it into Stata; my
dataset is in Stata. I have attempted the translation, but would
appreciate it if someone would check it. I don't fully understand the
files that the author of the SAS code creates at the beginning of
the code, but the bottom line is that the data cover the years
2002-2007. I have the same variables listed below for all these years,
each year in a separate file. In my Stata translation below, I have
used the 2002 data (original02.dta) as an example, but I will do the
same for the other years as well. Each file is very big (300MB on
average), so I'd rather treat each one separately. I am using Stata 10.
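
A minimal sketch of how the year-by-year processing could be wrapped in a loop. Only original02.dta is named above; the names original03.dta through original07.dta and the output names are assumptions that follow the same pattern. The block uses Stata's default line delimiter, not #delimit ;.

```stata
* Sketch only: file names other than original02.dta are assumed.
foreach yy in 02 03 04 05 06 07 {
    * Forward slashes are used because a backslash directly before a
    * local macro (\`yy') would stop the macro from being expanded.
    use "C:/original`yy'.dta", clear

    * ... the cleaning steps for a single year go here ...

    save "original`yy'_clean100k", replace
}
```

Each pass loads one file with clear, so only one year is held in memory at a time.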


SAS code


libname tmp1 'c:\original';

data tr1; set tmp1.original1;
data tr22; set tmp1.original2;
data tr33; set tmp1.original3;

data tmp1.original0207;
set tmp1.original0203  tmp1.original04  tmp1.original05  tmp1.original06;

/* create v2 variable & recode largest values*/

data original; set tr1 tr22 tr33;
if v1='5MM+' then v1='5000000';
if v1='1MM+' then v1='1000000';

/* remove v1 under 100k */
data original;set original;
v2=input(v1,8.);
if v2>=100000;
run;

data original; set original;
proc sort nodupkey; by v3 v4 v5 v6 v7;

/* remove canceled */
data canceled (keep= v8 v9 v10); set original;
if v8='C';

data canceled (drop=v8); set canceled;
rename v9=v4;
x=1;
run;

proc sort data=canceled; by v10 v4;
proc sort data=original; by v10 v4;

data original; merge original canceled; by v10 v4;
if x=1 then delete; if v8='C' then delete;

/* remove corrected */
data corrected (keep= v8 v9 v10); set original;
if v8='W';

data corrected (drop=v8); set corrected;
rename v9=v4;
x=1;
run;

proc sort data=corrected; by v10 v4;

data original; merge original corrected; by v10 v4;
if x=1 then delete;
run;

/* remove price values */

data original; set original;
if v11 = 'N';
run;


/* (create a file with the cleaned original data)*/
data tmp1.original_clean100k; set original; run;
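
As I read the "remove canceled" step above, it collects the (v10, v9) pairs from the 'C' records, renames v9 to v4 so that each pair points at the record being canceled, merges that list back on v10 v4, and deletes both the matched records and the 'C' records themselves. The sketch below reproduces that logic in Stata 10 merge syntax on toy data. All values are invented, and the code "T" for an ordinary record is a placeholder, not a value from the real data.

```stata
* Toy data, invented for illustration only.
clear
input v10 v4 v9 str1 v8
1 101 .   "T"
1 102 101 "C"
2 201 .   "T"
end
tempfile all cancels
sort v10 v4
save "`all'"

* Keys of the records that a cancel points at: (v10, v9) becomes (v10, v4).
keep if v8 == "C"
keep v9 v10
rename v9 v4
gen x = 1
sort v10 v4
save "`cancels'"

use "`all'", clear
merge v10 v4 using "`cancels'"
* Drop the canceled originals (x==1) and the cancel records themselves.
drop if x == 1 | v8 == "C"
* Remove the helper variables so that a later merge can recreate them.
drop x _merge
list
```

On this toy data only the v10==2 record should remain. The final drop matters when a second merge follows: merge stops if _merge already exists, and without the update option the master's (missing) x is not overwritten by the x in the using file.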







Equivalent Stata code


#delimit;

use "C:\original02.dta", clear;

replace v1="5000000" if v1=="5MM+";

replace v1="1000000" if v1=="1MM+";

destring v1, gen(v2);

/* SAS drops missing v2 here; Stata treats missing as larger than any number */
keep if v2>=100000 & !missing(v2);

sort v3 v4 v5 v6 v7;

duplicates drop v3 v4 v5 v6 v7, force;

save temp, replace;



keep if v8=="C";

keep v9 v10;

rename v9 v4;

gen x=1;

sort v10 v4;

save temp1, replace;



use temp, clear;

sort v10 v4;

merge v10 v4 using temp1;

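drop if x==1 | v8=="C";

/* save the cancel-cleaned data so that the corrected step starts from it */
drop x _merge;

save temp, replace;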

keep if v8=="W";

keep v9 v10;

rename v9 v4;

gen x=1;

sort v10 v4;

save temp2, replace;



use temp, clear;

sort v10 v4;

merge v10 v4 using temp2;

drop if x==1;

keep if v11 == "N";

save original02_clean100k, replace;
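
One difference between the two languages that affects the v2 filter, sketched on invented values: SAS's input() returns missing for text it cannot read, and SAS treats missing as smaller than any number, so "if v2>=100000" removes those rows. Stata treats numeric missing as larger than any number, so the same comparison keeps them unless missing values are excluded explicitly. destring also does not create v2 when v1 contains non-numeric characters unless force is specified. The value "n/a" is an assumption for illustration; the real v1 may not contain anything like it.

```stata
* Toy values, invented for illustration only.
clear
input str8 v1
"5MM+"
"250000"
"50000"
"n/a"
end
replace v1 = "5000000" if v1 == "5MM+"
* force turns non-numeric strings such as "n/a" into missing
* instead of leaving v2 ungenerated.
destring v1, gen(v2) force
* Without the !missing() condition the "n/a" row would be kept.
keep if v2 >= 100000 & !missing(v2)
list
```

With these four rows, the two that should survive are 5000000 and 250000.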



Thanks in advance,
-- 
Hugh