

From: Rajaram Subramanian Potty <rajara999@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st:appropriate test
Date: Wed, 13 Oct 2010 10:37:58 +0530

Dear Samuels,

Thank you very much for your suggestions. I have the manuals other than
the survey manual, and those are for an earlier version of Stata. But the
help for -svyset- describes the FPC as follows:

fpc(varname) requests a finite population correction for the variance
estimates. If varname has values less than or equal to 1, it is
interpreted as a stratum sampling rate f_h = n_h/N_h, where n_h = number
of units sampled from stratum h and N_h = total number of units in the
population belonging to stratum h. If varname has values greater than or
equal to n_h, it is interpreted as containing N_h. It is an error for
varname to have values between 1 and n_h or to have a mixture of
sampling rates and stratum sizes.

So, I may use the sampling rate described above.

Thanks and regards,

RAJARAM. S

On Tue, Oct 12, 2010 at 8:43 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
> Rajaram responded to me privately with the following:
>
>>I just want to know why we should consider round as a second stage
>>strata as per your suggestion of survey setting.
>>
>>
>>I would like to inform you that the sample frame used in the second
>>stage is not households, it is the number of adult members in the
>>households enumerated in the Census. We have listed all the members in
>>the household and we used this list of members to select the
>>respondents.
>>
>>In the data set the finite population correction (FPC) was not
>>included. So, I just want to know how I should calculate the FPC at
>>the first stage. Please, inform.
>>
>>I would like to inform you that we may first do a descriptive analysis
>>and then we may like to do a multivariate analysis. We also want to
>>know the OR, so we are doing the logistic regression as discussed in
>>my earlier mail.
>>
>
> By definition, strata are groups from which samples are drawn
> independently. For the PSUs that was true of the urban/rural place
> strata.
> Within existing PSUs, independent samples of adults were taken
> in each round. Therefore stratification is by round at that stage.
> This specification _might_ reduce standard errors somewhat.
>
> The fpc will be one of two numbers: the number of villages in the
> rural stratum in the district or the number of urban blocks in the
> urban stratum. Those numbers should be available from the Census that
> was used to plan the survey.
>
>
> Steve
>
> Steven J. Samuels
> sjsamuels@gmail.com
> 18 Cantine's Island
> Saugerties NY 12477
> USA
> Voice: 845-246-0774
> Fax: 206-202-4783
>
>
> On Fri, Oct 8, 2010 at 2:26 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
>> --
>>
>> Rajaram Subramanian Potty
>>
>> I recommend that you add all the sampling stages to your design.
>> Include fpcs, especially in the first stage, because you need all the
>> help that you can get in reducing standard errors.
>>
>> something like:
>> svyset psu [pweight=], strata(place) fpc() || _n, strata(round) fpc()
>>
>> One thing is unclear: the sampling frame you used to select males and
>> females. If your sampling frame consisted of households, for example,
>> then replace "_n" in the -svyset- statement above with the household
>> id variable.
>>
>> Which analysis?
>>
>> As you describe your analysis, it is descriptive (or "enumerative"):
>> you want to estimate prevalence rates in one district in 2003 and 200,
>> and their difference.
>>
>> For a descriptive analysis, significance testing is inappropriate.
>> Why? If you had tested every adult in the district, you would never
>> expect the 2002 and 2008 prevalence rates to be _exactly_ the same.
>> (WG Cochran (1977). Sampling techniques (3rd ed.). New York: Wiley,
>> p. 39; WE Deming (1966). Some theory of sampling. New York: Dover
>> Publications, Chapter 7, p. 247, "Distinction between enumerative and
>> analytic studies").
>>
>> (There are descriptive studies where hypothesis testing is important,
>> e.g.
>> quality assurance sampling (P Levy and S Lemeshow, Sampling of
>> Populations, Wiley, 2008; p. 429), but your study doesn't seem to be
>> one of them.)
>>
>> The question is therefore not "Are rates in the two years different?",
>> but "How different are they?" Confidence intervals provide the answer.
>> From a public health point of view, I consider 95% confidence to be
>> too stringent. I'd recommend 90% or even 80%.
>>
>> -svy: tab- will provide a direct answer to the question: "What are
>> the rates, and how different are they?" I don't find the odds ratios
>> from -svy: logistic- to be informative unless transformed to rate
>> differences; -svy: tab- is based on the logit transform, and does it
>> for you.
>>
>> One other point: If you took equal numbers of people in each village
>> and equal numbers in each urban block, your sample should be
>> self-weighting, and your weighted prevalence rates and observed rates
>> should be very similar. If so, it would simplify your tables to report
>> the observed numerators, denominators, and rates, with the CIs from
>> the weighted analysis.
>>
>> --Steve
>>
>> Steven J. Samuels
>> sjsamuels@gmail.com
>> 18 Cantine's Island
>> Saugerties NY 12477
>> USA
>> Voice: 845-246-0774
>> Fax: 206-202-4783
>>
>> On Fri, Oct 8, 2010 at 1:25 AM, Rajaram Subramanian Potty
>> <rajara999@gmail.com> wrote:
>>> I apologise that I did not give much information. In the year 2002,
>>> a cross-sectional study was conducted to estimate the STI
>>> prevalence in one of the districts. We have two strata, rural and
>>> urban. From the rural areas, 10 villages (PSUs) were selected
>>> systematically using PPS. 20 urban blocks (PSUs) were selected from
>>> the list of urban blocks in the district using systematic selection.
>>> We conducted a complete census in these selected areas and prepared
>>> a sampling frame for selecting the adult males and females aged 15-49.
>>> The targeted samples of around 6600 were selected from this sample
>>> frame. We have calculated the sample weights.
>>>
>>> Again in the year 2008, we repeated the survey in the same areas.
>>> We conducted the census and selected the required number of 6600 adult
>>> males and females in the same way as in the year 2002. So,
>>> the respondents selected are independent and it is not a follow-up
>>> study.
>>>
>>> We wanted to test, overall, whether the difference in STI prevalence
>>> between the years 2002 and 2008 is significant or not. We also want to
>>> examine the difference in some particular groups such as place of
>>> residence (rural/urban), sex, age etc. We are not interested in the
>>> difference in the prevalence by PSUs.
>>>
>>> Presently I am using the simple syntax of survey setting:
>>>
>>> svyset psu [pweight=wt], strata(place)
>>>
>>> svy: logistic syphilis round
>>>
>>> The variable round indicates whether the survey is in the year 2002 or
>>> 2008, and the p-value from the logistic regression is used for checking
>>> whether there is any significant difference.
>>>
>>> Thanks and regards,
>>>
>>> RAJARAM. S
>>>
>>>
>>> On Thu, Oct 7, 2010 at 8:48 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
>>>> I agree with Ronan that more information is necessary: are you
>>>> interested in estimating rates and changes just for the sampled PSUs,
>>>> or for the population from which they are sampled? If you are
>>>> interested in rates just for those PSUs, then create a combo PSU-round
>>>> stratum variable, e.g. with:
>>>>
>>>> *********
>>>> egen cstratum = group(area round)
>>>> ********
>>>>
>>>> Then -svyset- a psu variable equal to the second stage sampling unit
>>>> (ssu2) in the survey:
>>>>
>>>> ****************
>>>> svyset ssu2 [pweight= ], strata(cstratum)..
>>>> ****************
>>>>
>>>> If you want to estimate for the population from which the areas were sampled:
>>>>
>>>> ******************
>>>> svyset area [pweight=], strata(original stratum) || ssu2, strata(round)
>>>> ********************
>>>>
>>>> For descriptive estimates of prevalence rates and their differences, I
>>>> recommend -svy: tab-, which uses a logit transformation for
>>>> proportions to avoid CIs that extend below zero. You can add finite
>>>> population corrections if these would make a difference.
>>>> ************************************
>>>> webuse nhanes2
>>>> svy: tab sex diabetes, row ci se llwald
>>>> matrix list e(b)
>>>> lincom _b[p22] - _b[p12]
>>>> *************************************
>>>>
>>>> But you have not given us enough details about the purpose of your
>>>> study for me to be confident of these specifications: for example,
>>>> whether you are confining your estimates to particular
>>>> sub-populations.
>>>>
>>>> I don't agree with Ronan's recommendation of an event-time model. You
>>>> have cross-sectional prevalence data, not a cohort. So you would need
>>>> a "current status" (or "status quo") model: the information for
>>>> each individual is their current age and whether or not they have the
>>>> disease of interest; in other words, every individual is right-censored
>>>> or left-censored. From this information it is possible to reconstruct
>>>> a survival curve analogous to a current life table. I'd recommend a
>>>> logistic model, instead. For such regression analyses, don't use the
>>>> fpc's.
>>>>
>>>> Steve
>>>>
>>>> Steven J. Samuels
>>>> sjsamuels@gmail.com
>>>> 18 Cantine's Island
>>>> Saugerties NY 12477
>>>> USA
>>>> Voice: 845-246-0774
>>>> Fax: 206-202-4783
>>>>
>>>>
>>>> On Wed, Oct 6, 2010 at 4:59 AM, Ronan Conroy <rconroy@rcsi.ie> wrote:
>>>>> On 6 Oct 2010, at 07:42, Rajaram Subramanian Potty wrote:
>>>>>
>>>>>> I have data from two rounds of survey conducted in the same areas
>>>>>> (PSUs).
>>>>>> But the individuals are selected independently in both the
>>>>>> rounds from these areas using the same statistical approaches. What
>>>>>> would be the appropriate analysis to test
>>>>>> the difference in some of the indicators between the two periods? For
>>>>>> example, I want to test the difference in HIV prevalence between the
>>>>>> two rounds. Is it appropriate to use the survey command by considering
>>>>>> the PSUs to be the same in both the rounds and setting the survey design
>>>>>> according to our study? After that, fitting svy: logistic to examine
>>>>>> the difference in two rounds, is this the correct way of testing the
>>>>>> difference between the two rounds? Kindly suggest.
>>>>>
>>>>>
>>>>> My first reaction would be that the most important thing needed here is a
>>>>> sample weighting scheme that allows you to extrapolate from the sample to
>>>>> the underlying population.
>>>>>
>>>>> Are the areas PSUs or strata? In other words, were the areas selected at
>>>>> random or deliberately chosen? This affects your analysis.
>>>>>
>>>>> If you have presumed age of infection, you could consider using an
>>>>> event-time model approach, using age as the time variable. This would allow
>>>>> you to look at the shape of the hazard function. Even if you don't, the
>>>>> hazard curve will show the cumulative prevalence by age (rather than the
>>>>> incidence) but may still be of interest.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Ronán Conroy
>>>>> Associate Professor
>>>>> Division of Population Health Sciences
>>>>> =================================
>>>>>
>>>>> rconroy@rcsi.ie
>>>>> Royal College of Surgeons in Ireland
>>>>> Epidemiology Department,
>>>>> Beaux Lane House, Dublin 2, Ireland
>>>>> +353 (0)1 402 2431
>>>>> +353 (0)87 799 97 95
>>>>> +353 (0)1 402 2764 (Fax - remember them?)
>>>>> http://rcsi.academia.edu/RonanConroy
>>>>>
>>>>> Before printing, think about the environment
>>>>
>>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
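
The fpc() rule quoted from the -svyset- help at the top of this message can be sketched as arithmetic. The Python below is only an illustration of that rule, not Stata's implementation: the function names are made up for this note, the rejection of negative values is an added guard, and the stratum total of 200 villages is a hypothetical figure (the thread gives only the 10 sampled villages). It also assumes the standard result that a first-stage sampling rate f_h scales the corresponding variance term by (1 - f_h).

```python
def sampling_rate(fpc_value, n_h):
    """Turn an fpc() value into a sampling rate f_h = n_h / N_h.

    Follows the rule in the quoted help text:
      value <= 1    -> already a sampling rate
      value >= n_h  -> a stratum total N_h
      otherwise     -> an error
    """
    if fpc_value < 0:
        # Added guard for this sketch; not part of the quoted help text.
        raise ValueError("fpc value must not be negative")
    if fpc_value <= 1:
        return float(fpc_value)
    if fpc_value >= n_h:
        return n_h / fpc_value
    raise ValueError("fpc value lies between 1 and n_h")


def fpc_factor(fpc_value, n_h):
    """Return (1 - f_h), the multiplier applied to the variance term."""
    return 1.0 - sampling_rate(fpc_value, n_h)


# 10 villages sampled (from the thread) out of a HYPOTHETICAL 200 villages.
n_villages_sampled = 10
n_villages_total = 200  # placeholder; take the real count from the Census

# Supplying the total or the equivalent rate gives the same factor.
print(f"{fpc_factor(n_villages_total, n_villages_sampled):.2f}")  # prints 0.95
print(f"{fpc_factor(0.05, n_villages_sampled):.2f}")              # prints 0.95
```

With so small a sampling fraction the correction is modest; it matters more when the sampled PSUs are a large share of the stratum.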

