.-
help for ^optbud^                                              (STB-58: sxd2)
.-

Optimal sampling design for 2-stage studies with fixed budget
-------------------------------------------------------------

    ^optbud^ depvar [indepvars] [^if^ exp] [^in^ range] [^, first(^varlist^)^
            ^prev(^vecname^)^ ^b(^#^)^ ^c1(^#^)^ ^c2(^#^)^ ^var(^#^)^ ^coding(^#^)^]


Options
-------

^first(^varlist^)^ specifies the first stage variables.

^prev(^vecname^)^ specifies the vector of prevalences for each stratum formed
    by the different levels of the dependent variable and the first stage
    covariates. (See Description on how to enter the vector.)

^b(^#^)^ specifies the available budget.

^c1(^#^)^ specifies the cost per study subject at the first stage.

^c2(^#^)^ specifies the cost per study subject at the second stage.

^var(^#^)^ specifies the position in the logistic regression model of the
    covariate whose coefficient variance is to be minimized (i.e. optimized).
    For example, in the simple model Y = b0 + b1X1 + b2X2, if we want to
    minimize the variance of the coefficient of X1, then ^var^ = 2 (the
    constant b0 occupies position 1).

^coding(^#^)^ is a logical flag; the default of 0 (FALSE) means that prior to
    calling the ^optbud^ function, you have to run the ^coding^ function
    (see help @coding@ for details) to create the vector ^grp_yz^, which
    contains the distinct groups (strata) formed by the different levels of
    the response (Y) and the first stage covariates (Z). If you have not run
    ^coding^ and you call the ^optbud^ function with ^coding(1)^, the ^grp_yz^
    vector will be created within the ^optbud^ function, but it is imperative
    that the vector vecname of prevalences is provided to ^optbud^ in the
    correct order! For this reason, we strongly suggest that any call to
    ^optbud^ is preceded by a call to ^coding^.


Description
-----------

The ^optbud^ function calculates the total number of study observations and
the second-stage sampling fractions that will maximize precision subject to
an available budget.
The user must also supply the unit costs of observations at the first and
second stages and the vector of prevalences in each of the strata defined by
the different levels of the dependent variable and the first stage
covariates. Before running the ^optbud^ function you should run the ^coding^
function (see help @coding@) to see in which order you must supply the
vector of prevalences.


Examples
--------

This is an example based on the CASS data (see Reilly, 1996) where the
association between various predictors and operative mortality is of
interest. The objective is to design a two-stage study where only operative
mortality, sex and a categorical weight variable will be measured at the
first stage and all other covariates (weight, age, angina, chf, lvedp,
surgery) determined for a sub-sample of subjects at the second stage. A pilot
sample of 118 second-stage subjects is available, the prevalences of the
various first-stage strata are known, and the budget is assumed to be 10,000.

The following commands will calculate the optimal sampling design to
minimize the variance of the LVEDP coefficient subject to a total cost of
10,000, if the cost per first-stage observation = 2 and the cost per
second-stage observation = 15.

 . ^use wtpilot^
 . ^coding mort sex wtcat^
   [get the different strata formed by levels of the dependent variable
    (mort) and the first stage covariates (sex, wtcat)]

 # enter the prevalences in the order suggested
 # by the @coding@ function

 . ^matrix prev=(0.02,.134,.670,.054,.05,.047,.001,.004,.013,.002,.003,.002)'^
 . ^optbud mort sex-surg, first(sex wtcat) prev(prev) var(7) b(10000) c1(2)^
       ^c2(15)^


Author
------

Marie Reilly, Dept. of Epidemiology & Public Health and
Agus Salim, Dept. of Statistics,
University College Cork,
Cork, Ireland
marie.reilly@@ucc.ie


Also see
--------

    STB:  STB-58: sxd2
 Manual:  [R] logit, glm
On-line:  help for @logit@, @glm@, @coding@, @optfixn@, @optprec@
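

Note on the budget constraint
-----------------------------

The fixed-budget design described above can be read as a constraint linking
the first-stage sample size to the second-stage sampling fractions. The
sketch below assumes the two unit costs are additive, so that every subject
incurs c1 and each second-stage subject incurs a further c2. The symbols n,
\pi_j and \rho_j are notation introduced here for illustration, and the exact
formulation used inside ^optbud^ may differ.

```latex
% Assumed notation (not taken from the help file):
%   n       = number of first-stage subjects
%   \pi_j   = prevalence of stratum j (the entries of prev)
%   \rho_j  = second-stage sampling fraction in stratum j
%   B, c_1, c_2 = the values passed to b(), c1(), c2()
n\,c_1 \;+\; n\,c_2 \sum_{j} \pi_j\,\rho_j \;\le\; B

% Bounds implied for the example values B = 10000, c_1 = 2, c_2 = 15:
%   no second-stage sampling (all \rho_j = 0):
n \;\le\; \frac{10000}{2} = 5000
%   complete second-stage sampling (all \rho_j = 1):
n \;\le\; \frac{10000}{2 + 15} \approx 588
```

Under this assumption, any design for the example budget has a first-stage
size between roughly 588 and 5000, with an expected second-stage size of
n times the sum of \pi_j \rho_j over the strata.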