Steve Samuels <sjsamuels@gmail.com>

statalist@hsphsun2.harvard.edu

Re: st: Svy commands for cluster sampling

Sun, 12 Feb 2012 03:56:07 -0500

Dear Hajime:

Unfortunately, with simple random sampling of hospitals and clinics, you will be unable to estimate patient totals unless the facilities of each type are similar in size. To illustrate: suppose that there are a few larger hospitals (or clinics), but that most are small. A simple random sample will consist mostly of smaller facilities, and estimates of the total number of flu patients in the area will usually be too low. I suggest that you consult a survey statistician, who, with the aid of post-stratification techniques, might be able to overcome this problem.

To note for later studies: the standard design for studies of this kind is sampling with probability proportional to size (PPS). The "size" measure is something roughly proportional to the number of flu patients; for hospitals, for example, the number of beds.

For the rest, I suggest that you or your colleague study a good sampling text, such as:

Lohr, S. (2009). Sampling: Design and Analysis (2nd ed.). Boston, MA: Cengage Brooks/Cole.

Heeringa, S., West, B. T., & Berglund, P. A. (2010). Applied survey data analysis. Boca Raton, FL: Chapman & Hall/CRC.

The latter is not so easy to read but contains many Stata examples.

Best wishes,

Steve
sjsamuels@gmail.com

On Feb 9, 2012, at 2:02 AM, Hajime SATO wrote:

Dear Statalisters,

My colleague brought me questions on svy commands in Stata, which I could not answer quickly. I would appreciate it if any of you could provide your insights. Here are the questions:

---------------------

Suppose there are N hospitals and M clinics in the study area. From them, we randomly sampled n hospitals and m clinics, and asked them how many flu patients visited them in the past month, and what characteristics they had. Some hospitals reported that they had 5 patients, while some others had none. In the same way, some clinics had 3 patients, while some others had none.
In the situation described above, I think we can infer (1) the number of patients in the study area in the past month, using the sampling weights of n/N and m/M. In this case, can we use svy commands (or other Stata commands), and if so, how?

Next, we wish to know the mean age and overall sex ratio of all the flu patients in the study area. Can we (2) calculate the mean age and overall male ratio of all patients, using the mean age and male ratio of the patients who visited each hospital/clinic (and using the sampling weights for hospitals and clinics)? There are hospitals/clinics that had no patients. If we use the record of each patient, instead of the means (averaged values) of the patients' characteristics reported by each hospital/clinic, again, can we use svy commands (or other Stata commands), and if so, how?

Then, we wish to (3) describe the characteristics of flu patients, and test a set of hypotheses, such as a difference in age by sex. Every hospital has an id number Hi, while every clinic has an id number Mi. Can we use svy commands (or other Stata commands) to do, for example, a t-test to examine the difference in mean age by sex, and if so, how?

Each patient's record looks like the following:

---------------------------------------------------------------------
id   hosp/clin   h/c_no   age   sex   highest_fever   duration
---------------------------------------------------------------------
1    h           1        16    m     100 (F)         4 (days)
2    c           5        43    f     94              6
3    ...
---------------------------------------------------------------------

There are only data on flu cases (none about those not suffering from flu).

Looking forward to any suggestions.

----------------------------------------------------------
Hajime SATO, MD, MPH, DrPH, PhD
Director
Department of Health Policy and Technology Assessment
National Institute of Public Health
Japan
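
The reply's point about simple random sampling of facilities that differ greatly in size can be illustrated with a small calculation. What follows is a minimal sketch in Python (not Stata), and every number in it is a hypothetical assumption, not data from this thread: 95 small facilities with 2 flu patients each, 5 large ones with 200 each, and a simple random sample of 10 facilities. It lists how many large facilities a sample can contain, applies the expansion weight N/n to each sampled facility, and reports how often the resulting estimate falls below the true total.

```python
from math import comb

# Hypothetical population (illustrative numbers, not from the thread):
# 95 small facilities with 2 flu patients each, 5 large ones with 200 each.
N_SMALL, N_LARGE = 95, 5
PATIENTS_SMALL, PATIENTS_LARGE = 2, 200
N = N_SMALL + N_LARGE   # facilities in the study area
n = 10                  # facilities drawn by simple random sampling

true_total = N_SMALL * PATIENTS_SMALL + N_LARGE * PATIENTS_LARGE


def outcomes():
    """Yield (probability, estimated total) for each possible number k of
    large facilities in the sample (hypergeometric distribution)."""
    for k in range(min(n, N_LARGE) + 1):
        prob = comb(N_LARGE, k) * comb(N_SMALL, n - k) / comb(N, n)
        sample_sum = k * PATIENTS_LARGE + (n - k) * PATIENTS_SMALL
        # Each sampled facility stands for N/n facilities in the area.
        yield prob, (N / n) * sample_sum


results = list(outcomes())
expected_estimate = sum(p * est for p, est in results)
share_too_low = sum(p for p, est in results if est < true_total)

print(f"true total:               {true_total}")
print(f"mean of the estimator:    {expected_estimate:.1f}")
print(f"P(estimate < true total): {share_too_low:.3f}")
```

With these made-up numbers, roughly 58% of the possible samples contain no large facility at all and give an estimate of 200 against a true total of 1190, even though the estimator is on target when averaged over all possible samples. A sample that happens to include a large facility overshoots instead. This instability is what a PPS design, as suggested in the reply, is meant to reduce.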

