
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


Re: st:appropriate test

From   Rajaram Subramanian Potty <>
Subject   Re: st:appropriate test
Date   Wed, 13 Oct 2010 10:37:58 +0530

Dear Samuels,
Thank you very much for your suggestions. I have the manuals other
than the survey manual, and those are for an older version of Stata. But
the help for svyset describes the FPC option as follows:
fpc(varname) requests a finite population correction for the variance
        estimates.  If varname has values less than or equal to 1, it is
        interpreted as a stratum sampling rate f_h = n_h/N_h, where n_h =
        number of units sampled from stratum h and N_h = total number of
        units in the population belonging to stratum h.  If varname has
        values greater than or equal to n_h, it is interpreted as containing
        N_h.  It is an error for varname to have values between 1 and n_h or
        to have a mixture of sampling rates and stratum sizes.

So, I may take the sampling rate which is mentioned above.
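
A minimal sketch of how such a sampling-rate variable might be built for the first stage, assuming the variable names psu, place and wt from the -svyset- line quoted later in this thread. N_h is a hypothetical variable that would have to be filled in from the Census counts of villages and urban blocks; it is not in the data set described here.

```stata
* Sketch only: psu, place and wt follow the thread; N_h is hypothetical.
* N_h must already hold, for every observation, the Census count of
* villages (rural) or urban blocks (urban) in that observation's stratum.

* Count the sampled PSUs in each stratum (n_h)
egen byte psu_tag = tag(place psu)
egen n_h = total(psu_tag), by(place)

* Stratum sampling rate f_h = n_h/N_h, which fpc() accepts when <= 1
generate double fpc1 = n_h / N_h
assert fpc1 > 0 & fpc1 <= 1

* First-stage design with the finite population correction added
svyset psu [pweight=wt], strata(place) fpc(fpc1)
svydescribe
```

Per the help text above, supplying N_h itself in fpc() should be equivalent, as long as rates and stratum sizes are not mixed in one variable.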

Thanks and regards,


On Tue, Oct 12, 2010 at 8:43 PM, Steve Samuels <> wrote:
> Rajaram responded to me privately with the following:
>>I just want to know why we should consider round as a second stage
>>strata as per your suggestion of survey setting.
>>I would like to inform you that the sample frame used in the second
>>stage is not households, it is the number of adult members in the
>>households enumerated in the Census. We have listed all the members in
>>the household and we used this list of members to select the
>>In the data set the finite population correction (FPC) was not
>>included. So, I just want to know how I should calculate the FPC at
>>the first stage. Please, inform.
>>I would like to inform you that we may first do a descriptive analysis
>>and then we may like to do a multivariate analysis. We also want to
>>know the OR, so we are doing the logistic regression as discussed in
>>my earlier mail.
> By definition, strata are groups from which samples are drawn
> independently.  For the PSUs that was true of the urban/rural place
> strata. Within existing PSUs, independent samples of adults were taken
> in each round. Therefore, stratification is by round at that stage.
>  This specification _might_  reduce standard errors somewhat.
> The fpc will be one of two numbers: the number of villages in the
> rural stratum in the district or the number of urban blocks in the
> urban stratum.  Those numbers should be available from the Census that
> was used to plan the survey.
> Steve
> Steven J. Samuels
> 18 Cantine's Island
> Saugerties NY 12477
> Voice: 845-246-0774
> Fax:    206-202-4783
> On Fri, Oct 8, 2010 at 2:26 PM, Steve Samuels <> wrote:
>> --
>>  Rajaram Subramanian Potty
>> I recommend that you add all the sampling stages to your design.
>> Include fpcs, especially in the first stage, because you need all the
>> help that you can get in reducing standard errors.
>> something like:
>> svyset psu [pweight=], strata(place) fpc() || _n, strata(round) fpc()
>> One thing is unclear: the sampling frame you used to select males and
>> females. If your sampling frame consisted of households, for example,
>> then replace "_n" in the -svyset- statement above with the household
>> id variable.
>> Which analysis?
>> As you describe your analysis, it is descriptive (or "enumerative"):
>> you want to estimate prevalence rates in one district in 2002 and 2008,
>> and their difference.
>> For a descriptive analysis, significance testing is inappropriate.
>> Why? If you had  tested every adult in the district, you would never
>> expect the 2002 and 2008 prevalence rates to be _exactly_ the same.
>> (WG Cochran, (1977). Sampling techniques (3rd ed.). New York: Wiley.,
>> p.39; WE Deming. (1966). Some theory of sampling. New York: Dover
>> Publications, Chapter 7, p 247, "Distinction between enumerative and
>> analytic studies").
>> (There are descriptive studies where hypothesis testing is important,
>> e.g. quality assurance sampling ( P Levy and S Lemeshow, Sampling of
>> Populations, Wiley, 2008; p. 429), but your study doesn't seem to be
>> one of them. )
>> The question is therefore not "Are rates in the two years different?",
>> but "How different are they?" Confidence intervals provide the answer.
>> From a public health point of view, I consider 95% confidence to be
>> too stringent. I'd recommend 90% or even 80%.
>> -svy: tab- will provide a direct answer to the question: "What are
>> the rates, and how different are they?" I don't find the odds ratios
>> from -svy: logistic- to be informative unless transformed to rate
>> differences; -svy: tab- is based on the logit transform, and does it
>> for you.
>> One other point: If you took equal numbers of people in each village
>> and equal numbers in each urban block, your sample should be
>> self-weighting, and your weighted prevalence rates and observed rates
>> should be very similar. If so, it would simplify your tables to report
>> the observed numerators, denominators, and rates, with the CIs from
>> the weighted analysis.
>> --Steve
>> Steven J. Samuels
>> sjsamuels@gmail. com
>> On Fri, Oct 8, 2010 at 1:25 AM, Rajaram Subramanian Potty
>> <rajara999@gmail. com> wrote:
>>> I apologise that I did not give much information. In the year 2002,
>>> a cross-sectional study was conducted to estimate the STI
>>> prevalence in one of the districts. We have two strata, rural and
>>> urban. From the rural areas, 10 villages (PSUs) were selected
>>> systematically using PPS. 20 urban blocks (PSUs) were selected from
>>> the list of urban blocks in the district using systematic selection.
>>> We conducted a complete census in these selected areas and prepared
>>> a sampling frame for selecting adult males and females aged 15-49.
>>> The targeted sample of around 6600 was selected from this sampling
>>> frame. We have calculated the sample weights.
>>> In the year 2008, we repeated the survey in the same areas. We
>>> conducted the census and selected the required number of 6600 adult
>>> males and females in the same way as in 2002.  So,
>>> the respondents were selected independently and it is not a follow-up
>>> study.
>>> We want to test whether the overall difference in STI prevalence
>>> between 2002 and 2008 is significant or not.  We also want to
>>> examine the difference in particular groups such as place of
>>> residence (rural/urban), sex, age, etc. We are not interested in the
>>> difference in prevalence by PSU.
>>> Presently I am using this simple survey-setting syntax:
>>> svyset psu [pweight=wt], strata(place)
>>> svy: logistic syphilis round
>>> The variable round indicates whether the survey is in the year 2002 or
>>> 2008 and p-value from the logistic regression is used for checking
>>> whether there is any significant difference.
>>> Thanks and regards,
>>> On Thu, Oct 7, 2010 at 8:48 PM, Steve Samuels <sjsamuels@gmail. com> wrote:
>>>> I agree with Ronan that more information is necessary: are you
>>>> interested in estimating rates and changes just for the sampled PSUs,
>>>> or for the population from which they are sampled?  If you are
>>>> interested in rates just for those PSUs, then create a combo PSU-round
>>>> stratum variable, e.g. with:
>>>> *********
>>>> egen cstratum = group(area round)
>>>> ********
>>>> Then -svyset-  a psu variable equal to the second stage sampling unit
>>>> (ssu2) in the survey:
>>>> ****************
>>>> svyset ssu2 [pweight= ], strata(cstratum)..
>>>> ****************
>>>> If you want to estimate for the population from which the areas were sampled:
>>>> ******************
>>>> svyset area [pweight=], strata(original stratum)  || ssu2, strata(round)
>>>> ********************
>>>> For descriptive estimates of prevalence rates and their differences, I
>>>> recommend -svy: tab-, which uses a logit transformation for
>>>> proportions to avoid CIs that extend below zero.  You can add finite
>>>> population corrections if these would make a difference.
>>>> ************************************
>>>> webuse nhanes2
>>>> svy: tab sex diabetes, row ci se llwald
>>>> matrix list e(b)
>>>> lincom _b[p22] - _b[p12]
>>>> *************************************
>>>> But you have not given us enough details about the purpose of your
>>>> study that I can be confident of these specifications:  for example,
>>>> whether you are confining your estimates to  particular
>>>> sub-populations.
>>>> I don't agree with Ronan's recommendation of an event-time model.  You
>>>> have cross-sectional prevalence data, not a cohort.  So you would need
>>>>  a "current status"  (or "status quo") model:  the information for
>>>> each individual is their current age and whether or not they have the
>>>> disease of interest; in other words, every individual is right-censored
>>>> or left-censored.  From this information it is possible to reconstruct
>>>> a  survival curve analogous to a current life table.  I'd recommend a
>>>> logistic model, instead.  For such regression analyses, don't use the
>>>> fpc's.
>>>> Steve
>>>> Steven J. Samuels
>>>> sjsamuels@gmail. com
>>>> On Wed, Oct 6, 2010 at 4:59 AM, Ronan Conroy <rconroy@rcsi. ie> wrote:
>>>>> On 6 Oct 2010, at 07:42, Rajaram Subramanian Potty wrote:
>>>>>> I have data from two rounds of a survey conducted in the same areas
>>>>>> (PSUs). But the individuals were selected independently in both
>>>>>> rounds from these areas using the same statistical approach. What
>>>>>> would be the appropriate analysis to test
>>>>>> the difference in some of the indicators between the two periods? For
>>>>>> example, I want to test the difference in HIV prevalence between the
>>>>>> two rounds. Is it appropriate to use the survey commands, treating
>>>>>> the PSUs as the same in both rounds and setting the survey design
>>>>>> according to our study, and then to fit svy: logistic to examine
>>>>>> the difference between the two rounds? Is this the correct way of
>>>>>> testing the difference? Kindly suggest.
>>>>> My first reaction would be that the most important thing needed here is a
>>>>> sample weighting scheme that allows you to extrapolate from the sample to
>>>>> the underlying population.
>>>>> Are the areas PSUs or strata? In other words, were the areas selected at
>>>>> random or deliberately chosen? This affects your analysis.
>>>>> If you have presumed age of infection, you could consider using an
>>>>> event-time model approach, using age as the time variable. This would allow
>>>>> you to look at the shape of the hazard function. Even if you don't, the
>>>>> hazard curve will show the cumulative prevalence by age (rather than the
>>>>> incidence) but may still be of interest.
>>>>> Ronán Conroy
>>>>> Associate Professor
>>>>> Division of Population Health Sciences
>>>>> =================================
>>>>> rconroy@rcsi. ie
>>>>> Royal College of Surgeons in Ireland
>>>>> Epidemiology Department,
>>>>> Beaux Lane House, Dublin 2, Ireland
>>>>> +353 (0)1 402 2431
>>>>> +353 (0)87 799 97 95
>>>>> +353 (0)1 402 2764 (Fax - remember them?)
>>>>> http://rcsi.academia.edu/RonanConroy