
st: Survey statistics, sampling methods

From   Jen McCormick
Subject   st: Survey statistics, sampling methods
Date   Thu, 30 Aug 2007 13:27:46 -0700

Hi -

I want to thank everyone who provided a response to the question I posted to the list last week. All were very useful.

I have another set of questions (probably pretty simple) about analyzing survey data with the svyset command. My main concern is that I am not correctly mapping the steps we took in our sampling onto the terms used by svyset.

We think we want to set things up like this (based on the readings I have found on the svyset command in the archives and manual):

svyset university [pw=pweight], strata(prim_sampling_unit) fpc(<?>) || pre_svy_dept, fpc(dept_ratio)

Where university = variable with the codes for each university in our sample, pw = our probability weight of 0.061 (or 6/98), prim_sampling_unit = variable with the codes for each of the primary sampling strata we used, pre_svy_dept = variable with codes for each of the departments selected as our secondary strata, and dept_ratio = 0.24 (or 5/21).

Given the method (described below), are we setting the correct parameters for the svyset command?

I have included a lot of detail on our methodology, so I apologize for the length of this message. If you have the patience to read it and provide any insight whatsoever, it will be much appreciated.

My colleague and I conducted a national survey to determine the attitudes of life scientists toward the ethical and societal implications of their research. We sent 2000 surveys to life scientists at 7 different research universities. We received 855 surveys back. In addition, about 10% of the surveys never reached the intended recipient (no contact), so our response rate is about 50%.
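As a rough check on that response-rate figure, here is a minimal Python sketch. It assumes that "10% no contact" means 10% of the 2000 mailed surveys are dropped from the denominator; that reading is an assumption, not something the figures above settle.

```python
# Rough check of the response-rate arithmetic.
# Assumption: the ~10% no-contact surveys are removed from the denominator.
mailed = 2000
returned = 855
no_contact_rate = 0.10  # approximate figure quoted above

reachable = mailed * (1 - no_contact_rate)  # surveys presumed delivered
raw_rate = returned / mailed                # no-contacts left in the denominator
adjusted_rate = returned / reachable        # no-contacts excluded

print(f"raw response rate:      {raw_rate:.3f}")
print(f"adjusted response rate: {adjusted_rate:.3f}")
```

Under that assumption the adjusted rate comes out a little under one half, which is where the "about 50%" figure comes from.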

We used multi-phase sampling. Our target population is all life science researchers at US research universities. Our survey population (or sampling frame) is the top 98 NIH-funded research universities in 2004 (the list is publicly available online). We categorized each of these universities into one of 8 strata:

Stratum/Category                                  # of universities
medical school/bioethics presence/public          13
medical school/bioethics presence/private         13
medical school/no bioethics presence/public       45
medical school/no bioethics presence/private      19
no medical school/bioethics presence/public        0
no medical school/bioethics presence/private       0
no medical school/no bioethics presence/public     4
no medical school/no bioethics presence/private    3

We randomly selected one university from each of the 6 categories that contain universities. Our home institution was our 7th institution. We are thinking that universities are our primary sampling units (the 6 we selected in our sampling). We also think that the probability weight we want to use (the pw) is 6/98. (Or do we need to approximate the total number of research universities in the US?)

We used departments as our secondary sampling unit. We categorized all the life science-related departments at our institution as either basic science or clinical and then randomly selected 3 from basic science and 2 from clinical, for a total of 5 departments (secondary strata?) from which we pulled individual researchers. Across the 7 different institutions there are on average about 21 departments that would fall into our definition of life science-related departments.

We are not quite certain what our finite-population correction factors are for the university strata and for the department strata, but we think these are 1/13, 1/13, 1/45, 1/19, 1/4, 1/3 and 5/21, respectively. Are we correct in thinking we need to make use of these ratios?
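For reference, the ratios above can be reproduced from the stratum counts with a short Python sketch. It assumes simple random selection of one university per non-empty stratum and treats 5 of roughly 21 departments as a single second-stage rate, ignoring the basic/clinical split; the short stratum labels are ours. Since probability weights are conventionally the inverse of the selection probability, the sketch also prints that inverse through the department stage. The final draw of individuals is not included.

```python
from fractions import Fraction

# Universities per non-empty stratum in the sampling frame (from the table above).
frame_counts = {
    "med/bioethics/public": 13,
    "med/bioethics/private": 13,
    "med/no bioethics/public": 45,
    "med/no bioethics/private": 19,
    "no med/no bioethics/public": 4,
    "no med/no bioethics/private": 3,
}
sampled_per_stratum = 1  # one university drawn at random per stratum

# Second stage: 5 departments out of about 21 (an average, so approximate).
dept_rate = Fraction(5, 21)

# First-stage sampling rate for each stratum.
univ_rates = {s: Fraction(sampled_per_stratum, n) for s, n in frame_counts.items()}

# Probability that a given department is reached, and its inverse.
joint_probs = {s: r * dept_rate for s, r in univ_rates.items()}
weights = {s: 1 / p for s, p in joint_probs.items()}

for s in frame_counts:
    print(f"{s:28s} univ rate={univ_rates[s]}  "
          f"joint={joint_probs[s]}  inverse={float(weights[s]):.1f}")
```

The point of the sketch is only that the first-stage rate differs by stratum, so a single value such as 6/98 cannot describe every university's selection probability.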

The unit we actually surveyed is the individual researcher (graduate students, postdoctoral fellows, research staff, and faculty). Sampling at this stage was done by position (i.e., we put all grad students from university 1 in one list and then randomly selected about 66, we put all postdocs from university 1 in one list and then randomly selected about 66, etc.). We selected about 250 individuals from each of the 6 universities (with a few minor exceptions) and 500 from Stanford. We also tried, as best we could, to get equal numbers from each of the four position categories. How do we incorporate this into our use of the svyset command (or do we not need to worry about it)?
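To make the last stage concrete, here is a minimal Python sketch of drawing about 66 people from each position list at one university. The roster sizes, names, and the helper `draw_by_position` are hypothetical and are not our actual lists or procedure. It also shows that the within-list selection rate depends on how long each list is.

```python
import random

def draw_by_position(roster, per_position=66, seed=None):
    """Draw a simple random sample, without replacement, from each position list.

    roster: dict mapping position name -> list of people (hypothetical data).
    If a list has fewer than per_position people, everyone in it is taken.
    """
    rng = random.Random(seed)
    sample = {}
    for position, people in roster.items():
        k = min(per_position, len(people))
        sample[position] = rng.sample(people, k)
    return sample

# Hypothetical roster for one university; the sizes are made up for illustration.
roster = {
    "grad student": [f"grad_{i}" for i in range(400)],
    "postdoc": [f"postdoc_{i}" for i in range(200)],
    "research staff": [f"staff_{i}" for i in range(150)],
    "faculty": [f"faculty_{i}" for i in range(50)],
}

sample = draw_by_position(roster, per_position=66, seed=2007)

# Within-list selection rate: people drawn divided by list length.
rates = {p: len(sample[p]) / len(roster[p]) for p in roster}
for position, rate in rates.items():
    print(f"{position:15s} drawn={len(sample[position]):3d}  rate={rate:.3f}")
```

Drawing equal numbers from lists of unequal length gives unequal selection rates across positions, which is the part of the design we are unsure how to describe to svyset.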

Again, we really appreciate any insight anyone might be able to provide.

