Re: st:appropriate test
Steve Samuels <email@example.com>
Tue, 12 Oct 2010 11:13:07 -0400
Rajaram responded to me privately with the following:
>I just want to know why we should consider round as a second stage
>strata as per your suggestion of survey setting.
>I would like to inform you that the sample frame used in the second
>stage is not households, it is the number of adult members in the
>households enumerated in the Census. We have listed all the members in
>the household and we used this list of members to select the
>In the data set the finite population correction (FPC) was not
>included. So, I just want to know how I should calculate the FPC at
>the first stage. Please, inform.
>I would like to inform you that we may first do a descriptive analysis
>and then we may like to do a multivariate analysis. We also want to
>know the OR, so we are doing the logistic regression as discussed in
>my earlier mail.
By definition, strata are groups from which samples are drawn
independently. For the PSUs that was true of the urban/rural place
strata. Within existing PSUs, independent samples of adults were taken
in each round. Therefore, stratification at that stage is by round.
This specification _might_ reduce standard errors somewhat.
The fpc will be one of two numbers: the number of villages in the
rural stratum in the district or the number of urban blocks in the
urban stratum. Those numbers should be available from the Census that
was used to plan the survey.
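As a rough sketch of how the first-stage fpc could be set up: the two counts below are placeholders, not Census figures, and I am assuming that -place- is coded 1 for rural and 2 for urban and that the variable names (psu, wt, place, round) are the ones from your earlier -svyset-.

```stata
* Placeholder frame counts (hypothetical): replace with the Census totals
* for the district before use
local N_villages = 400
local N_blocks   = 250

* First-stage fpc = number of PSUs in the population of each place stratum
* (assumes place == 1 is rural and place == 2 is urban)
generate fpc1 = `N_villages' if place == 1
replace  fpc1 = `N_blocks'   if place == 2

* Stage 1: PSUs within place strata, with the fpc
* Stage 2: individual adults (_n) within round strata
svyset psu [pweight=wt], strata(place) fpc(fpc1) || _n, strata(round)

* Check the declared design
svydescribe
```

No second-stage fpc is specified in this sketch; one could be added if the census listing counts are available in the data.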
Steven J. Samuels
18 Cantine's Island
Saugerties NY 12477
On Fri, Oct 8, 2010 at 2:26 PM, Steve Samuels <firstname.lastname@example.org> wrote:
> Rajaram Subramanian Potty
> I recommend that you add all the sampling stages to your design.
> Include fpcs, especially in the first stage, because you need all the
> help that you can get in reducing standard errors.
> something like:
> svyset psu [pweight=], strata(place) fpc() || _n, strata(round) fpc()
> One thing is unclear: the sampling frame you used to select males and
> females. If your sampling frame consisted of households, for example,
> then replace "_n" in the -svyset- statement above with the household
> id variable.
> Which analysis?
> As you describe your analysis, it is descriptive (or "enumerative"):
> you want to estimate prevalence rates in one district in 2002 and 2008,
> and their difference.
> For a descriptive analysis, significance testing is inappropriate.
> Why? If you had tested every adult in the district, you would never
> expect the 2002 and 2008 prevalence rates to be _exactly_ the same.
> (WG Cochran, (1977). Sampling techniques (3rd ed.). New York: Wiley.,
> p.39; WE Deming. (1966). Some theory of sampling. New York: Dover
> Publications, Chapter 7, p 247, "Distinction between enumerative and
> analytic studies").
> (There are descriptive studies where hypothesis testing is important,
> e.g. quality assurance sampling ( P Levy and S Lemeshow, Sampling of
> Populations, Wiley, 2008; p. 429), but your study doesn't seem to be
> one of them. )
> The question is therefore not "Are rates in the two years different?",
> but "How different are they?" Confidence intervals provide the answer.
> From a public health point of view, I consider 95% confidence to be
> too stringent. I'd recommend 90% or even 80%.
> -svy: tab- will provide a direct answer to the question: "What are
> the rates, and how different are they?" I don't find the odds ratios
> from -svy: logistic- to be informative unless transformed to rate
> differences; -svy: tab- is based on the logit transform, and does it
> for you.
> One other point: If you took equal numbers of people in each village
> and equal numbers in each urban block, your sample should be
> self-weighting, and your weighted prevalence rates and observed rates
> should be very similar. If so, it would simplify your tables to report
> the observed numerators, denominators, and rates, with the CIs from
> the weighted analysis.
> Steven J. Samuels
> sjsamuels@gmail. com
> 18 Cantine's Island
> Saugerties NY 12477
> Voice: 845-246-0774
> Fax: 206-202-4783
> On Fri, Oct 8, 2010 at 1:25 AM, Rajaram Subramanian Potty
> <rajara999@gmail. com> wrote:
>> I apologise that I did not give much information. In the year 2002,
>> a cross-sectional study was conducted to estimate the STI
>> prevalence in one of the districts. We have two strata, rural and
>> urban. From the rural areas, 10 villages (PSUs) were selected
>> systematically using PPS. 20 urban blocks (PSUs) were selected from
>> the list of urban blocks in the district using systematic selection.
>> We have conducted a complete census in these selected areas and prepared
>> a sampling frame for selecting the adult males and females aged 15-49.
>> The targeted samples of around 6600 were selected from this sample
>> frame. We have calculated the sample weights.
>> Again in the year 2008, we have repeated the survey in the same areas.
>> Conducted the census and selected the required number of 6600 adult
>> males and females in the same way as selected in the year, 2002. So,
>> the respondents selected are independent and it is not a follow-up
>> We wanted to test, over all whether the difference in STI prevalence
>> between the year 2002 and 2008 is significant or not. Also want to
>> examine the difference in some particular groups such as place of
>> residence (rural/Urban), sex, age etc. We are not interested in the
>> difference in the prevalence by PSUs,
>> Presently I am using the simple syntax of survey setting:
>> svyset psu [pweight=wt], strata(place)
>> svy: logistic syphilis round
>> The variable round indicates whether the survey is in the year 2002 or
>> 2008 and p-value from the logistic regression is used for checking
>> whether there is any significant difference.
>> Thanks and regards,
>> RAJARAM. S
>> On Thu, Oct 7, 2010 at 8:48 PM, Steve Samuels <sjsamuels@gmail. com> wrote:
>>> I agree with Ronan that more information is necessary: are you
>>> interested in estimating rates and changes just for the sampled PSUs,
>>> or for the population from which they are sampled? If you are
>>> interested in rates just for those PSUs, then create a combo PSU-round
>>> stratum variable, e.g. with:
>>> egen cstratum = group(area round)
>>> Then -svyset- a psu variable equal to the second stage sampling unit
>>> (ssu2) in the survey:
>>> svyset ssu2 [pweight= ], strata(cstratum)..
>>> If you want to estimate for the population from which the areas were sampled:
>>> svyset area [pweight=], strata(original stratum) || ssu2, strata(round)
>>> For descriptive estimates of prevalence rates and their differences, I
>>> recommend -svy: tab-, which uses a logit transformation for
>>> proportions to avoid CIs that extend below zero. You can add finite
>>> population corrections if these would make a difference.
>>> webuse nhanes2
>>> svy: tab sex diabetes, row ci se llwald
>>> matrix list e(b)
>>> lincom _b[p22] - _b[p12]
>>> But you have not given us enough details about the purpose of your
>>> study that I can be confident of these specifications: for example,
>>> whether you are confining your estimates to particular
>>> I don't agree with Ronan's recommendation of an event-time model. You
>>> have cross-sectional prevalence data, not a cohort. So you would need
>>> a "current status" (or "status quo") model: the information for
>>> each individual is their current age and whether or not they have the
>>> disease of interest; in other words, every individual is right-censored
>>> or left-censored. From this information it is possible to reconstruct
>>> a survival curve analogous to a current life table. I'd recommend a
>>> logistic model, instead. For such regression analyses, don't use the
>>> Steven J. Samuels
>>> sjsamuels@gmail. com
>>> 18 Cantine's Island
>>> Saugerties NY 12477
>>> Voice: 845-246-0774
>>> Fax: 206-202-4783
>>> On Wed, Oct 6, 2010 at 4:59 AM, Ronan Conroy <rconroy@rcsi. ie> wrote:
>>>> On 6 Oct 2010, at 07:42, Rajaram Subramanian Potty wrote:
>>>>> I have data from two rounds of survey conducted in the same areas
>>>>> (PSUs). But the individual are selected independently in both the
>>>>> rounds from these areas using the same statistical approaches. What
>>>>> would be the appropriate analysis that would be carried out to test
>>>>> the difference in some of the indicators between the two periods. For,
>>>>> example I want to test the difference in HIV prevalence between the
>>>>> two rounds. Is it appropriate to use the survey command by considering
>>>>> the PSUs are the same in both the rounds and setting the survey design
>>>>> according to our study. After that fitting svy: logistic to examine
>>>>> the difference in two rounds, is this correct way of testing the
>>>>> difference between the two rounds. Kindly suggest.
>>>> My first reaction would be that the most important thing needed here is a
>>>> sample weighting scheme that allows you to extrapolate from the sample to
>>>> the underlying population.
>>>> Are the areas PSUs or strata? In other words, were the areas selected at
>>>> random or deliberately chosen? This affects your analysis.
>>>> If you have presumed age of infection, you could consider using an
>>>> event-time model approach, using age as the time variable. This would allow
>>>> you to look at the shape of the hazard function. Even if you don't, the
>>>> hazard curve will show the cumulative prevalence by age (rather than the
>>>> incidence) but may still be of interest.
>>>> Ronán Conroy
>>>> Associate Professor
>>>> Division of Population Health Sciences
>>>> rconroy@rcsi. ie
>>>> Royal College of Surgeons in Ireland
>>>> Epidemiology Department,
>>>> Beaux Lane House, Dublin 2, Ireland
>>>> +353 (0)1 402 2431
>>>> +353 (0)87 799 97 95
>>>> +353 (0)1 402 2764 (Fax - remember them?)
>>>> http://rcsi. academia. edu/RonanConroy