



Re: st: converting a SAS program to Stata code for use with the HCUP NIS


From   Rebecca Pope <rebecca.a.pope@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: converting a SAS program to Stata code for use with the HCUP NIS
Date   Mon, 17 Jun 2013 16:23:35 -0500

Alex,
Sorry for coming late to this discussion; hopefully this will still
help. Fair warning: this is a long post.

I don't know where you got the SAS code you posted, but the program
won't run as reported. I made notes where I changed syntax. It also
gives you just the set-up; there is no analysis advice. That's not a
serious drawback though because Austin has given you everything you
really need. That said, as far as I can tell, there are some
differences between what appears in the posted SAS code and what is in
Austin's linked post. If you are trying to adhere to HCUP's protocol,
you might want to consider that.

*** SAS to Stata code translation, where possible ***
LIBNAME In "location of NIS file";

The closest Stata approximation I can think of is -cd "location of NIS file"-.

This is not strictly necessary, but it can make subsequent programming
more concise if you have all or most of your files in the same
directory.

DATA Diabetes ; <= create a dataset in the work directory called
"Diabetes". I changed ":" to ";"
SET In.nis_2001_core;
IF dxccs1=50 ; <= keep only records where the CCS code of the primary
diagnosis is 50 (diabetes with complications)
dischgs = 1; <= create a variable called dischgs and set it equal to 1
for all observations. I added the ";"
RUN; <= do all that stuff

Stata version, assuming that you already have the data in Stata format
(note that the Stata code here and below calls the flag dschgs rather
than dischgs):
use nis_2001_core if dxccs1==50, clear
gen dschgs = 1

The third block of code goes back and grabs records from the hospitals
dataset. It then sets these records to have a 0 weight and 0 value for
all variables other than the hospital ID and stratum. Some notes: (1)
Using 2 datasets in a SET statement without a BY statement implicitly
appends the second dataset to the first. (2) You can't name variables
in SAS (or Stata for that matter) with hyphens, so there can be no
variable nis-stratum. Let's assume it is the nis_stratum variable
referred to later.

DATA Combined ;
SET Diabetes <= I deleted the ;
in.nis_2001_hospital (in=inhosp keep=hospid nis_stratum); <= in=inhosp
creates a temporary flag, inhosp, that equals 1 for records read from
in.nis_2001_hospital, so you can test where a record came from. I
added ");"
IF inhosp THEN DO; <= if the record came from the hospital-level data
discwt=0; <= set weight to 0
died = 0; <= all of these statements need ;'s
dischgs = 0;
los = 0;
totchg = 0;
END;
RUN;

Remember, you've still got the Diabetes data in memory in Stata. Now,
I'm going to drop that -gen dschgs = 1- line from earlier, because at
this point we can use -append-'s generate() option instead (it needs
a new variable name, so dschgs must not already exist). With the
-generate()- option, Stata creates a variable equal to 0 for
observations from the master dataset (Diabetes) and 1 for observations
from the using dataset (nis_2001_hospital). We can just recode that.

append using nis_2001_hospital, keep(hospid nis_stratum) generate(dschgs)
recode dschgs (0=1) (1=0)
foreach var of varlist discwt died los totchg {
 replace `var' = 0 if dschgs==0
}

The -if- condition above specifies that values for each variable be
set to 0 for appended observations from the hospital dataset but not
the original diabetes dataset.
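
To see what -append, generate()- does before running it on the real
files, here is a minimal sketch on toy data. Everything in it (the
values, the tempfile name hosp, the flag name fromhosp) is made up for
illustration and is not part of the NIS.

```stata
* Toy data only: two "discharge" records from hospital 1
clear
input hospid los
1 3
1 5
end

* Build a tiny stand-in for the hospital file (hospitals 1 and 2)
tempfile hosp
preserve
clear
input hospid
1
2
end
save `hosp'
restore

* fromhosp is 0 for the records already in memory, 1 for appended ones
append using `hosp', generate(fromhosp)
list
```

The appended rows have los missing until you set it to 0, which is what
the -foreach- loop above does for the real variables.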

The fourth bit is just writing a fixed-format text file of the data
that you just created. In the NIS data, negative values are codes for
specific types of missing values. So, in essence the IF statements
here are all recoding specific missing values to generic missing
values.

DATA _NULL_; <= don't create a SAS dataset, changed from -NULL_
SET Combined ; <= use the combined dataset
FILEREF ; <= this is gibberish without a file specification which
should have been declared earlier
IF los < 0 THEN los = . ;
IF died < 0 THEN died = . ;
IF totchg < 0 THEN totchg = . ;
PUT nis_stratum 1-4 hospid 6-10 died 12 los 14-17 dischgs 19 totchg
21-27 +1 discwt ; <= write text to file
RUN;

Stata version:
I suspect you'll need to use the various -file- commands (see [P]
file). Others may know how to get -outfile- to write to specific
places or how to use -file- for this purpose, but I do not and I don't
want to give you incorrect information.
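
For what it is worth, here is one possible sketch of the -file-
approach. Treat it as a starting point under stated assumptions, not a
tested replacement for the SAS step. It assumes that all of the
variables are numeric, that the output file name (diabetes_fixed.txt)
is a placeholder you will change, and that %10.4f is an acceptable
format for discwt (the SAS code does not specify one). The column
positions are copied from the PUT statement above. The /// line
continuations only work in a do-file.

```stata
* Sketch only: writes one fixed-format line per observation
tempname fh
file open `fh' using "diabetes_fixed.txt", write text replace
forvalues i = 1/`=_N' {
    file write `fh' %4.0f (nis_stratum[`i'])        /// columns 1-4
        _column(6)  %5.0f (hospid[`i'])             /// columns 6-10
        _column(12) %1.0f (died[`i'])               /// column 12
        _column(14) %4.0f (los[`i'])                /// columns 14-17
        _column(19) %1.0f (dschgs[`i'])             /// column 19
        _column(21) %7.0f (totchg[`i'])             /// columns 21-27
        _skip(1)    %10.4f (discwt[`i']) _n
}
file close `fh'
```

Looping over observations like this is slow on large files, and any
value wider than its format will push the later columns to the right,
so check a few lines of the output against the layout you expect.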

To just duplicate the missing recodes:
mvdecode los died totchg, mv(-999999/-1 = .)

I'm going off of memory with that range. Make sure that -999999 is
consistent with the coding for the total charges (i.e. look at the
codebook from HCUP). Probably the highest number you'll see is -7, but
-1 keeps you safe. Summing up, the SAS code posted is just
illustrating how to set up your data, not performing any sort of
analysis.
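
If you would rather not depend on remembering the lower end of that
range, a sketch that mirrors the SAS IF statements directly is below.
It treats every negative value as missing, which is the same assumption
the SAS code makes. A range as wide as -999999/-1 may also run into
Stata's limit on the number of elements in a numlist (see -help
limits-), and this version sidesteps that question.

```stata
* Set any negative (HCUP missing-value) code to Stata's generic missing
foreach var of varlist los died totchg {
    replace `var' = . if `var' < 0
}
```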
*** End of the "translation" part ***

*** Concise version of Stata code to create sample data by HCUP method ***
cd "location of NIS file"
use nis_2001_core if dxccs1==50, clear

append using nis_2001_hospital, keep(hospid nis_stratum) generate(dschgs)
recode dschgs (0=1) (1=0)
foreach var of varlist discwt died los totchg {
   replace `var' = 0 if dschgs==0
}
mvdecode los died totchg, mv(-999999/-1 = .)
*** end ***

This will give you a dataset that has all sampled discharges with a
primary diagnosis for diabetes with complications. For these "real"
records, you have the hospital-specific weight for each discharge
(discwt). These records are flagged by having dschgs = 1. Appended to
this, you have 1 record for every hospital sampled by the NIS with
discwt = 0 instead of its true weight and flagged by dschgs = 0.
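
A few quick checks along these lines can confirm that the combined
file looks the way the paragraph above describes. This is only a
sketch; it assumes the concise code above ran without error and that
dxccs1 is still in the dataset.

```stata
* How many real discharges (1) and appended hospital records (0)?
tabulate dschgs

* Appended hospital records should carry zeros and no diagnosis code
assert discwt == 0 & died == 0 & los == 0 & totchg == 0 if dschgs == 0
assert missing(dxccs1) if dschgs == 0

* Real records should all be diabetes with complications
assert dxccs1 == 50 if dschgs == 1
```

-assert- stops with an error if any observation fails the condition, so
a silent run means the checks passed.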

This differs from the post that Austin referenced
(http://www.stata.com/statalist/archive/2007-11/msg00810.html). In
Austin's code, you keep the weights for the observations outside the
subpopulation & so get a correct population total ("better" column).
With HCUP's method, you'll wind up with the subpopulation size and
population size being the same. However, you will get the same test
statistics.

Here is some code that can be appended to the code in Austin's post to
see the differences.
*** begin ***
webuse nhanes2f, clear
preserve   // create fake "master" set of PSUs, like nis_2001_hospital
duplicates drop stratid psuid, force
tempfile tmp2
save `tmp2'
restore  // end fake master code
keep if highlead==1
append using `tmp2', keep(stratid psuid) generate(islead)
recode islead (0=1) (1=0)
foreach var of varlist heartatk female weight diabetes finalwgt highlead {
  replace `var' = 0 if islead==0
}
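// real analysis, not illustrated in SAS code
svyset psuid [pw=finalwgt], strata(stratid)
svy, subpop(highlead): logit heartatk female weight diabetes
est sto hcup
// esttab is part of the user-written estout package (ssc install estout);
// correct, approx, and better are the estimates stored by Austin's code
esttab correct approx better hcup, mti nogaps sca(N_pop N_subpop F)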
*** end ***
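
Applied to the NIS file built earlier, the same pattern might look
like the sketch below. It assumes hospid identifies the sampled
hospitals (the PSUs), nis_stratum the strata, and discwt the discharge
weight, as in the set-up code above. The choice of -mean- and of los
and totchg as outcomes is only for illustration.

```stata
* Declare the design: hospitals within NIS strata, discharge weights
svyset hospid [pw=discwt], strata(nis_stratum)

* dschgs is 1 for real diabetes discharges and 0 for the appended
* zero-weight hospital records, so it can serve as the subpopulation flag
svy, subpop(dschgs): mean los totchg
```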

Hope this helps,
Rebecca
