st: RE: output svymean

From	Lee Sieswerda <[email protected]>
To	"'[email protected]'" <[email protected]>
Subject	st: RE: output svymean
Date	Wed, 23 Jul 2003 19:00:14 -0400
I am not familiar with SAS. As Brian Sayer suggested, in Stata I would use
the -post- or -file- commands to get an output dataset after -svytab-.
Here's an example using -post- where I extract what I need from the
estimates and variance matrices left behind by -svytab-. Then I calculate
the confidence intervals based on the formula on page 85 of the [SVY]
manual. I've included two samples: one using the subpop() option and one
without. It will probably wrap badly after I send it, so you may have to
manually unwrap some long lines.
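
For reference, the interval the code below computes can be written out as follows. This is only a restatement of what the do-file does (a logit-transformed interval); I am assuming it matches the manual's formula, and the symbols are my own shorthand: p-hat is the estimated proportion, s its standard error, and t the value returned by invttail(df, 0.025).

```latex
% Logit-transformed confidence interval, as implemented in the do-file below.
% \hat{p}: estimated proportion; s: its standard error;
% t = t_{0.975,\,df}: the value returned by invttail(df, 0.025).
\begin{align*}
  f &= \ln\!\left(\frac{\hat{p}}{1-\hat{p}}\right) \\
  f_{\mathrm{lo}} &= f - \frac{t\, s}{\hat{p}\,(1-\hat{p})}, &
  f_{\mathrm{up}} &= f + \frac{t\, s}{\hat{p}\,(1-\hat{p})} \\
  p_{\mathrm{lo}} &= \frac{e^{f_{\mathrm{lo}}}}{1+e^{f_{\mathrm{lo}}}}, &
  p_{\mathrm{up}} &= \frac{e^{f_{\mathrm{up}}}}{1+e^{f_{\mathrm{up}}}}
\end{align*}
```

Because the interval is built on the logit scale and then transformed back, the limits always stay between 0 and 1.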

Best,
Lee Sieswerda, Epidemiologist
Thunder Bay District Health Unit
[email protected]


**********************************
capture log close
cd d:\Data\nwosdus\2001
log using cage.log, replace
set more off

tempname memhold
tempfile cage

postfile `memhold' region grade sex percent stderr lowci upci N using cage, replace

gen dum = 1

* First, calculate for all grades and regions... 
	quietly svytab cage_2 dum, per se ci
	
	local df = e(df_r)
	local N = e(N)
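	matrix myV = e(V)
	* SE is taken from cell 1; with a two-level cage_2 the two cell
	* proportions sum to 1, so it should equal the SE of cell 2 used below
	local se = sqrt(myV[1,1])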
	
	matrix myb = e(b)
	local est = myb[1,2]
	
	* Upper Confidence Limit
	local up = log(`est'/(1-`est')) + ((invttail(`df',0.025)*`se')/(`est'*(1-`est')))
	local upci = exp(`up')/(1+ exp(`up'))

	* Lower Confidence Limit
	local lo = log(`est'/(1-`est')) - ((invttail(`df',0.025)*`se')/(`est'*(1-`est')))
	local lowci = exp(`lo')/(1+ exp(`lo'))

	post `memhold' (0) (0) (0) (100*`est') (100*`se') (100*`lowci') (100*`upci') (`N')

* Then calculate by region...
* Subregions for subpop command already generated in crdata1.do 

forvalues r = 1(1)3 {

		quietly svytab cage_2 dum, per se ci subpop(subreg`r')
		local df = e(df_r)
		local N = e(N_sub)
		matrix myV = e(V)
		local se = sqrt(myV[1,1])
		
		matrix myb = e(b)
		local est = myb[1,2]
		

		* Upper Confidence Limit
		local up = log(`est'/(1-`est')) + ((invttail(`df',0.025)*`se')/(`est'*(1-`est')))
		local upci = exp(`up')/(1+ exp(`up'))
		
		* Lower Confidence Limit
		local lo = log(`est'/(1-`est')) - ((invttail(`df',0.025)*`se')/(`est'*(1-`est')))
		local lowci = exp(`lo')/(1+ exp(`lo'))

		post `memhold' (`r') (0) (0) (100*`est') (100*`se') (100*`lowci') (100*`upci') (`N')
}

postclose `memhold'

set more on
log close
******************************************
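
The same idea can probably be stretched to cover every level of a single variable, which is closer to what the original question asks for. What follows is an untested sketch, not something taken from the do-file above. It assumes the survey design has already been set, that -svytab- leaves the row values behind in e(Row) alongside e(b) and e(V), and that a variable called sex exists; the output file name oneway and the matrix names are made up for illustration.

```stata
* Hypothetical sketch: one-way percentages for every level of a variable.
* Assumes e(Row) holds the row values in the same order as the cells of e(b).
tempname memhold
postfile `memhold' value percent stderr using oneway, replace

capture drop dum
gen byte dum = 1

* A one-column "crosstab" gives one-way proportions, as in the example above
quietly svytab sex dum, se

matrix myb = e(b)
matrix myV = e(V)
matrix myR = e(Row)
* vec() stacks the values into a column whether myR is a row or a column
matrix myR = vec(myR)

* One posted observation per level: value, percent, standard error
local k = colsof(myb)
forvalues i = 1/`k' {
	post `memhold' (myR[`i',1]) (100*myb[1,`i']) (100*sqrt(myV[`i',`i']))
}

postclose `memhold'
```

Each level's standard error comes from its own diagonal element of e(V), so this does not rely on the variable having only two levels. The confidence limits could be added inside the loop in the same way as in the do-file above.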


-----Original Message-----
From: Rita Luk [mailto:[email protected]] 
Sent: Wednesday, July 23, 2003 9:39 AM
To: '[email protected]'
Subject: st: output svymean 


Hello all,
I am hoping that you can help me solve an issue that is causing me to pull
out what is left of my thinning hair. I normally use SAS, but am required to
use STATA for a current project. 
I am trying to create a database of results produced from the survey
tabulation procedure (svytab). I have included the code that I would use in
SAS proc freq below, so that someone who is familiar with both packages can
see what I am up to. However, because of the complex survey design I cannot
use SAS for this project. 
I plan on doing A LOT of single tabulations and cross tabulations and want
to produce an output dataset that might look like:

OBS  NAME1  VALUE1  NAME2  VALUE2  COUNT  PERCENT1  PERCENT2
1    sex    female  .      .       50     0.25      .
2    sex    male    .      .       150    0.75      .
3    sex    female  age    old     25     0.50      0.33
4    sex    female  age    young   25     0.50      0.20
5    sex    male    age    old     50     0.33      0.67
6    sex    male    age    young   100    0.67      0.80

As you can see, this dataset would result from running two different
tabulations. The first two observations would come from a single variable
tabulation of sex. The subsequent 4 observations (3-6) would have resulted
from a cross tabulation of the dichotomous variables age and sex. Ideally I
want my program/macro to be able to handle any categorical variables
automatically regardless of the number of categories.

In stata by using several e() and matrix commands, I can get the percents,
and with a little data manipulation the counts and names, but I cannot get
the values in a variable form. In addition, I can only do this for the
crosstabs (svytab), but not for the single variable frequencies because the
svyprop command does not seem to give me saved estimates. Furthermore, the
amount of coding needed just to get the percents and counts seems excessive.
There must be a quicker way. I am doing something like:

    * this gives me the row percentages in variable form
    * (unfortunately attached to my raw database)
    svytab sex age, row
    matrix mrowpct=e(b)'
    svmat mrowpct, name(rowpct)

    * this gives me the col percentages in variable form
    * (unfortunately attached to my raw database)
    svytab sex age, col
    matrix mcolpct=e(b)'
    svmat mcolpct, name(colpct)

and that is just to get TWO of the variables in my output dataset!!! There
has got to be a simpler way, since in SAS just one option will give a full
output dataset of everything I need; all one needs to do (as demonstrated
below) is massage the output to look the way I want. Stata must have an
analogous feature!

I am not asking for someone to code this for me. I need to learn how to use
this program. I just need some hints as to how to do this, commands etc.
(i.e., is there a simple way to get output datasets other than piecing
together a whole bunch of matrices?).

Thanks for any help you can provide,
Charles

For those that want to see what I would do in SAS, to get an idea of what I
am doing here: I would first create the macro (I think stata users call
these programs?)

    /**FOR SINGLE VARIABLE TABLES USE CODE=1, FOR CROSSTABS USE CODE=2**/
    %macro outdata (var1, var2,code);
    %if &code=1 %then %do;
    proc freq data=mydata;
    tables &var1 / out=predata;
    run;
    %end;
    %if &code=2 %then %do;
    proc freq data=mydata;
    tables &var1*&var2 / out=predata outpct;
    run;
    %end;
    data predata;
    set predata;
    rename &var1=VALUE1 &var2=VALUE2 pct_row=PERCENT1 pct_col=PERCENT2;
    NAME1="&var1"; NAME2="&var2";
    run;
    data final;
    set final predata;
    run;
    %mend;

Then run the macro on my data:

    %outdata (sex, ,1);
    %outdata (sex,age,2);

This program will work regardless of missing values, categories, or number
of categories to produce the output that I want. In fact almost all of the
output that I want is created simply by adding the "out=" option to the
tables statement of the proc freq. The rest of the program is simply
changing the names. Essentially I am hoping that stata has the equivalent of
the "OUT=" option for its svy commands, but I cannot seem to find it.

Thanks,
Charles
_______________________________________________
J. Charles Victor BSc, MSc, PhD (candidate)
Department of Public Health Sciences
University of Toronto
Toronto, Ontario



*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
*