Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st:appropriate test

From	Rajaram Subramanian Potty <[email protected]>
To	[email protected]
Subject	Re: st:appropriate test
Date	Wed, 13 Oct 2010 11:34:35 +0530
Dear Samuels,
Thanks a lot for your suggestions.

Whether I use the number of units or the sampling rate (as indicated
in my earlier mail), I get almost the same answer.
My understanding is that the FPC is used if the sample size is more
than 5% of the total population. In our case, the sample size is less
than 2% of the total population.
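
As a rough check on that rule of thumb, here is a minimal sketch. It
assumes single-stage simple random sampling without replacement, where
the FPC multiplies the standard error by sqrt(1 - f), with f = n/N the
sampling fraction. That is an assumption made for illustration only: the
design discussed in this thread is two-stage, and the fractions below are
example values, not figures taken from the data.

```stata
* Sketch under an SRS-without-replacement assumption (not the actual
* two-stage design): the standard error is multiplied by sqrt(1 - f).
* The sampling fractions listed here are illustrative values only.
foreach f in 0.02 0.05 0.10 {
    display "f = " %4.2f `f' "   SE multiplier = " %6.4f sqrt(1 - `f')
}
```

Under that assumption, a sampling fraction of 2% shrinks the standard
error by only about 1%, which is why the correction is often ignored
when the fraction is small.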

I have included the FPC as you suggested. It reduces the standard
error, but only slightly. Below is a comparison of the results using
your suggested survey setting and the survey setting I used earlier.

svyset npsu [pweight=wt], strata(place) fpc(c2) || _n, strata(round) fpc(c3)
(suggested by you)

. svy:mean sti, over(round)
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =       2          Number of obs    =    9089
Number of PSUs   =      36          Population size  =    9089
Design df        =      34

1: round = 1
2: round = 2


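--------------------------------------------------------------
             |             Linearized
        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
sti          |
           1 |   .0316072   .0056724      .0200794     .043135
           2 |   .0257605   .0038524      .0179315    .0335896
--------------------------------------------------------------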


. svyset npsu [pweight=wt], strata (place)

pweight: wt
VCE: linearized
Single unit: missing
Strata 1: place
SU 1: npsu
FPC 1: <zero>

.
end of do-file

. svy:mean sti, over(round)
(running mean on estimation sample)

Survey: Mean estimation

Number of strata =       2          Number of obs    =    9089
Number of PSUs   =      36          Population size  =    9089
Design df        =      34

1: round = 1
2: round = 2

	
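--------------------------------------------------------------
             |             Linearized
        Over |       Mean   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
sti          |
           1 |   .0316072   .0057466      .0199286    .0432858
           2 |   .0257605   .0038841      .0178671     .033654
--------------------------------------------------------------
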
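The size of the reduction can be read off the two sets of results by
dividing each standard error obtained with the FPC by the one obtained
without it. This is only a sketch of the arithmetic; the four values are
copied from the two -svy: mean- outputs in this message.

```stata
* Ratio of standard errors: with FPC (first run) over without FPC (second run).
* The four values are copied from the two -svy: mean- outputs in this message.
display "round 1: " %6.4f .0056724/.0057466
display "round 2: " %6.4f .0038524/.0038841
```

Both ratios are a little under 1, that is, a reduction of roughly 1% in
each round.
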
With regards,

RAM

On Wed, Oct 13, 2010 at 10:37 AM, Rajaram Subramanian Potty
<[email protected]> wrote:
> Dear Samuels,
> Thank you very much for your suggestions. I have the manuals other
> than the survey manual, and those only for an earlier version of Stata.
> But the help for svyset describes the FPC as follows:
> fpc(varname) requests a finite population correction for the variance
>        estimates.  If varname has values less than or equal to 1, it is
>        interpreted as a stratum sampling rate f_h = n_h/N_h, where n_h =
>        number of units sampled from stratum h and N_h = total number of
>        units in the population belonging to stratum h.  If varname has
>        values greater than or equal to n_h, it is interpreted as containing
>        N_h.  It is an error for varname to have values between 1 and n_h or
>        to have a mixture of sampling rates and stratum sizes.
>
> So, I may take the sampling rate which is mentioned above.
>
> Thanks and regards,
>
> RAJARAM. S
>
>
> On Tue, Oct 12, 2010 at 8:43 PM, Steve Samuels <[email protected]> wrote:
>> Rajaram responded to me privately with the following:
>>
>>>I just want to know why we should consider round as a second stage
>>>strata as per your suggestion of survey setting.
>>>
>>>
>>>I would like to inform you that the sample frame used in the second
>>>stage is not households, it is the number of adult members in the
>>>households enumerated in the Census. We have listed all the members in
>>>the household and we used this list of members to select the
>>>respondents.
>>>
>>>In the data set the finite population correction (FPC) was not
>>>included. So, I just want to know how I should calculate the FPC at
>>>the first stage. Please, inform.
>>>
>>>I would like to inform you that we may first do a descriptive analysis
>>>and then we may like to do a multivariate analysis. We also want to
>>>know the OR, so we are doing the logistic regression as discussed in
>>>my earlier mail.
>>>
>>
>> By definition, strata are groups from which samples are drawn
>> independently.  For the PSUs that was true of the urban/rural place
>> strata. Within existing PSUs, independent samples of adults were taken
>> in each round. Therefore, stratification is by round at that stage.
>> This specification _might_ reduce standard errors somewhat.
>>
>> The fpc will be one of two numbers: the number of villages in the
>> rural stratum in the district or the number of urban blocks in the
>> urban stratum.  Those numbers should be available from the Census that
>> was used to plan the survey.
>>
>>
>> Steve
>>
>> Steven J. Samuels
>> [email protected]
>> 18 Cantine's Island
>> Saugerties NY 12477
>> USA
>> Voice: 845-246-0774
>> Fax:    206-202-4783
>>
>>
>> On Fri, Oct 8, 2010 at 2:26 PM, Steve Samuels <[email protected]> wrote:
>>> --
>>>
>>>  Rajaram Subramanian Potty
>>>
>>> I recommend that you add all the sampling stages to your design.
>>> Include fpcs, especially in the first stage, because you need all the
>>> help that you can get in reducing standard errors.
>>>
>>> something like:
>>> svyset psu [pweight=], strata(place) fpc() || _n, strata(round) fpc()
>>>
>>> One thing is unclear: the sampling frame you used to select males and
>>> females. If your sampling frame consisted of households, for example,
>>> then replace "_n" in the -svyset- statement above with the household
>>> id variable.
>>>
>>> Which analysis?
>>>
>>> As you describe your analysis, it is descriptive (or "enumerative"):
>>> you want to estimate prevalence rates in one district in 2002 and 2008,
>>> and their difference.
>>>
>>> For a descriptive analysis, significance testing is inappropriate.
>>> Why? If you had  tested every adult in the district, you would never
>>> expect the 2002 and 2008 prevalence rates to be _exactly_ the same.
>>> (WG Cochran, (1977). Sampling techniques (3rd ed.). New York: Wiley.,
>>> p.39; WE Deming. (1966). Some theory of sampling. New York: Dover
>>> Publications, Chapter 7, p 247, "Distinction between enumerative and
>>> analytic studies").
>>>
>>> (There are descriptive studies where hypothesis testing is important,
>>> e.g. quality assurance sampling ( P Levy and S Lemeshow, Sampling of
>>> Populations, Wiley, 2008; p. 429), but your study doesn't seem to be
>>> one of them. )
>>>
>>> The question is therefore not "Are rates in the two years different?",
>>> but "How different are they?" Confidence intervals provide the answer.
>>> From a public health point of view, I consider 95% confidence to be
>>> too stringent. I'd recommend 90% or even 80%.
>>>
>>>  -svy tab- will provide a direct answer to the question: " What are
>>> the rates, and how different are they." I don't find the odds ratios
>>> from -svy: logistic- to be informative unless transformed to rate
>>> differences; -svy: tab- is based on the logit transform, and does it
>>> for you.
>>>
>>> One other point: If you took equal numbers of people in each village
>>> and equal numbers in each urban block, your sample should be
>>> self-weighting, and your weighted prevalence rates and observed rates
>>> should be very similar. If so, it would simplify your tables to report
>>> the observed numerators, denominators, and rates, with the CIs from
>>> the weighted analysis.
>>>
>>> --Steve
>>>
>>> Steven J. Samuels
>>> sjsamuels@gmail. com
>>> 18 Cantine's Island
>>> Saugerties NY 12477
>>> USA
>>> Voice: 845-246-0774
>>> Fax:    206-202-4783
>>>
>>> On Fri, Oct 8, 2010 at 1:25 AM, Rajaram Subramanian Potty
>>> <rajara999@gmail. com> wrote:
>>>> I apologise that I did not give much information. In 2002, a
>>>> cross-sectional study was conducted to estimate the STI prevalence
>>>> in one of the districts. We have two strata, rural and urban. From
>>>> the rural areas, 10 villages (PSUs) were selected systematically
>>>> using PPS. 20 urban blocks (PSUs) were selected from the list of
>>>> urban blocks in the district using systematic selection. We
>>>> conducted a complete census in these selected areas and prepared a
>>>> sampling frame for selecting adult males and females aged 15-49.
>>>> The target sample of around 6600 was selected from this sampling
>>>> frame. We have calculated the sample weights.
>>>>
>>>> Again in the year 2008, we have repeated the survey in the same areas.
>>>> Conducted the census and selected the required number of 6600 adult
>>>> males and females in the same way as selected in the year, 2002.  So,
>>>> the respondents selected are independent and it is not a follow-up
>>>> study.
>>>>
>>>> We want to test whether the overall difference in STI prevalence
>>>> between 2002 and 2008 is significant. We also want to examine the
>>>> difference within particular groups, such as place of residence
>>>> (rural/urban), sex, and age. We are not interested in the
>>>> difference in prevalence by PSU.
>>>>
>>>> At present I am using this simple syntax for the survey setting:
>>>>
>>>> svyset psu [pweight=wt], strata(place)
>>>>
>>>> svy: logistic syphilis round
>>>>
>>>> The variable round indicates whether the survey is in the year 2002 or
>>>> 2008 and p-value from the logistic regression is used for checking
>>>> whether there is any significant difference.
>>>>
>>>> Thanks and regards,
>>>>
>>>> RAJARAM. S
>>>>
>>>>
>>>> On Thu, Oct 7, 2010 at 8:48 PM, Steve Samuels <sjsamuels@gmail. com> wrote:
>>>>> I agree with Ronan that more information is necessary: are you
>>>>> interested in estimating rates and changes just for the sampled PSUs,
>>>>> or for the population from which they are sampled?  If you are
>>>>> interested in rates just for those PSUs, then create a combo PSU-round
>>>>> stratum variable, e.g. with:
>>>>>
>>>>> *********
>>>>> egen cstratum = group(area round)
>>>>> ********
>>>>>
>>>>> Then -svyset-  a psu variable equal to the second stage sampling unit
>>>>> (ssu2) in the survey:
>>>>>
>>>>> ****************
>>>>> svyset ssu2 [pweight= ], strata(cstratum)..
>>>>> ****************
>>>>>
>>>>> If you want to estimate for the population from which the areas were sampled:
>>>>>
>>>>> ******************
>>>>> svyset area [pweight=], strata(original stratum)  || ssu2, strata(round)
>>>>> ********************
>>>>>
>>>>> For descriptive estimates of prevalence rates and their differences, I
>>>>> recommend -svy: tab-, which uses a logit transformation for
>>>>> proportions to avoid CIs that extend below zero.  You can add finite
>>>>> population corrections if these would make a difference.
>>>>> ************************************
>>>>> webuse nhanes2
>>>>> svy: tab sex diabetes, row ci se llwald
>>>>> matrix list e(b)
>>>>> lincom _b[p22] - _b[p12]
>>>>> *************************************
>>>>>
>>>>> But you have not given us enough details about the purpose of your
>>>>> study that I can be confident of these specifications:  for example,
>>>>> whether you are confining your estimates to  particular
>>>>> sub-populations.
>>>>>
>>>>> I don't agree with Ronan's recommendation of an event-time model.  You
>>>>> have cross-sectional prevalence data, not a cohort.  So you would need
>>>>>  a "current status"  (or "status quo") model:  the information for
>>>>> each individual is their current age and whether or not they have the
>>>>> disease of interest; in other words, every individual is right-censored
>>>>> or left-censored.  From this information it is possible to reconstruct
>>>>> a  survival curve analogous to a current life table.  I'd recommend a
>>>>> logistic model, instead.  For such regression analyses, don't use the
>>>>> fpc's.
>>>>>
>>>>> Steve
>>>>>
>>>>> Steven J. Samuels
>>>>> sjsamuels@gmail. com
>>>>> 18 Cantine's Island
>>>>> Saugerties NY 12477
>>>>> USA
>>>>> Voice: 845-246-0774
>>>>> Fax:    206-202-4783
>>>>>
>>>>>
>>>>> On Wed, Oct 6, 2010 at 4:59 AM, Ronan Conroy <rconroy@rcsi. ie> wrote:
>>>>>> On 6 Oct 2010, at 07:42, Rajaram Subramanian Potty wrote:
>>>>>>
>>>>>>> I have data from two rounds of a survey conducted in the same areas
>>>>>>> (PSUs). But the individuals were selected independently in both
>>>>>>> rounds from these areas using the same statistical approach. What
>>>>>>> would be the appropriate analysis to test the difference in some of
>>>>>>> the indicators between the two periods? For example, I want to test
>>>>>>> the difference in HIV prevalence between the two rounds. Is it
>>>>>>> appropriate to use the survey commands, treating the PSUs as the
>>>>>>> same in both rounds and setting the survey design according to our
>>>>>>> study, and then to fit svy: logistic to examine the difference
>>>>>>> between the two rounds? Kindly suggest.
>>>>>>
>>>>>>
>>>>>> My first reaction would be that the most important thing needed here is a
>>>>>> sample weighting scheme that allows you to extrapolate from the sample to
>>>>>> the underlying population.
>>>>>>
>>>>>> Are the areas PSUs or strata? In other words, were the areas selected at
>>>>>> random or deliberately chosen? This affects your analysis.
>>>>>>
>>>>>> If you have presumed age of infection, you could consider using an
>>>>>> event-time model approach, using age as the time variable. This would allow
>>>>>> you to look at the shape of the hazard function. Even if you don't, the
>>>>>> hazard curve will show the cumulative prevalence by age (rather than the
>>>>>> incidence) but may still be of interest.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Ronán Conroy
>>>>>> Associate Professor
>>>>>> Division of Population Health Sciences
>>>>>> =================================
>>>>>>
>>>>>> rconroy@rcsi. ie
>>>>>> Royal College of Surgeons in Ireland
>>>>>> Epidemiology Department,
>>>>>> Beaux Lane House, Dublin 2, Ireland
>>>>>> +353 (0)1 402 2431
>>>>>> +353 (0)87 799 97 95
>>>>>> +353 (0)1 402 2764 (Fax - remember them?)
>>>>>> http://rcsi. academia. edu/RonanConroy
>>>>>>
>>>>>
>>>
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
>>
>
