
Re: st: Question about svyset command

From	"Michael I. Lichter" <[email protected]>
To	[email protected]
Subject	Re: st: Question about svyset command
Date	Thu, 19 Feb 2009 12:38:19 -0500

Thomas,

You are dealing with two misconceptions. The first is one that Steven didn't mention, and the second is one that Steven mentioned but did not relate as directly to your situation as he could have.

1. A multi-stage sampling design is a design in which sampling takes place multiple times. E.g., if you sample 20% of the districts in a state and then sample 10% of the schools in each selected district, that's a two-stage sample. Let's say we conduct interviews of all of the students in all of the classrooms within each selected school. Children are nested within classrooms within schools within districts--that's four levels of nesting. But the design is still two-stage, despite the four levels of nesting.

*You* have a *single-stage* stratified cluster design. The fact that cases are nested within the clusters you select is what makes it a cluster design. Your telling Stata that the cases were selected at the "second stage" with 100% probability is the same as telling Stata that there is no second stage. That's why the estimates look the same whether you tell Stata about the "second stage" or not.
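A small simulated check can make this point concrete. The sketch below is only an illustration: the data are invented (2 strata, 5 sampled counties per stratum, 20 trials per county, an arbitrary outcome y), and only the variable names sitecode, strata, bwgt0, fpc1, su2, and fpc2 are borrowed from the original post. It declares the design once as single-stage and once with a "second stage" sampled at a rate of 1, then displays the two variance estimates side by side for comparison.

```stata
* Toy data, all values invented for illustration.
clear
set seed 2009
set obs 200
* stratum 1 or 2
generate strata = ceil(_n/100)
* county (PSU) 1 to 10, five per stratum
generate sitecode = ceil(_n/20)
* one identifier per trial
generate su2 = _n
* each sampled county stands for two counties
generate bwgt0 = 2
* stage-1 sampling rate of counties within a stratum
generate fpc1 = 0.5
* "stage-2" sampling rate of 1: every trial in a sampled county is kept
generate fpc2 = 1
* arbitrary outcome with some between-county variation
generate y = rnormal() + sitecode/10

* Single-stage declaration
svyset sitecode [pweight=bwgt0], strata(strata) fpc(fpc1)
svy: mean y
matrix V1 = e(V)

* Same design, with a "second stage" sampled at 100%
svyset sitecode [pweight=bwgt0], strata(strata) fpc(fpc1) || su2, fpc(fpc2)
svy: mean y
matrix V2 = e(V)

* Show the two variance estimates side by side
display "single-stage: " V1[1,1] "   with fpc2 = 1: " V2[1,1]
```

If the reasoning above holds, the second-stage term contributes nothing to the variance when its sampling rate is 1, so the two displayed values should agree.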

2. Also, Steven is definitely correct if your goal is not to generalize to the 10,500 trials you mentioned, but rather to this type of trial in general, wherever and whenever it takes place. In fact, what he says is more obviously applicable to your study than it is to most sample surveys of people. If the 10,500 trials are taken to be representative of some larger population of trials, then you are dealing with superpopulation parameters, as he said. From my limited reading, however, I think that the consensus on this topic among statisticians is less widespread than he suggests, and the consensus about what to do about it is almost nonexistent (except for what he said about the FPC). Korn and Graubard (1999) say, for example, that aside from ignoring the FPC there is no agreement on what to do in order to reduce the bias in estimates of superpopulation parameters from complex sample designs (p. 228).

On the other hand, if what you really want to know about is those 10,500 trials because there is something special about this specific population of trials, then Steven is wrong and you should use the FPC. With regard to the value of FPC1, it should either be the size of the stratum (the number of counties in it) -- and you didn't say anything about what your strata were -- or the number of counties selected from the stratum divided by the size of the stratum. I suggest the former because you're less likely to make an error in its calculation.
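The two ways of supplying FPC1 could be sketched as follows. This is an illustration under assumptions, not a recipe for the CJSSC file: ncounties_pop is a hypothetical variable holding, for every observation, the number of counties in that observation's stratum among the 75 counties, and the helper variables onepercounty, ncounties_samp, and rate1 are likewise made up for the example.

```stata
* Assumption: ncounties_pop (hypothetical) = number of counties in the
* stratum, counted over the full population of 75 counties.

* Option 1 (the one suggested above): pass the stratum size directly.
svyset sitecode [pweight=bwgt0], strata(strata) fpc(ncounties_pop)

* Option 2: pass the stratum sampling rate instead.
* Flag one row per sampled county, then count sampled counties per stratum.
egen byte onepercounty = tag(strata sitecode)
egen ncounties_samp = total(onepercounty), by(strata)
generate double rate1 = ncounties_samp / ncounties_pop
svyset sitecode [pweight=bwgt0], strata(strata) fpc(rate1)
```

Stata treats fpc() values less than or equal to 1 as sampling rates and larger values as population counts of sampling units, which is why either variable can be supplied.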


Michael

Steven Samuels wrote:

Thomas,
1. The finite population corrections should affect only standard errors and confidence intervals, not point estimates such as means or proportions.
2. fpc's should be employed only for descriptive analyses (proportions, means). These analyses describe the specific finite population that you sampled: tort, contract, and real property trials in the 75 counties.
If the purpose of your model is analytic: to develop predictions, estimate odds ratios, compare proportions, or otherwise test hypotheses, you should *omit* the finite population corrections. The reasoning is interesting (Cochran, 1977, p. 39): It is seldom of scientific interest to ask if a null hypothesis (e.g. that two proportions are equal) is exactly true in a finite population. Except by a very rare chance, a null hypothesis will never be true. You would discover this by enumerating the entire population. This leads to the adoption of a "superpopulation" viewpoint, which is taken by almost all statisticians these days. See also Deming (1966), pp. 247-261, "Distinction between enumerative and analytic studies"; Korn and Graubard (1999), p. 227.
In other words, you should use one -svyset- for describing the target population and another for the logistic regression.
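As a rough sketch of what "one -svyset- for each purpose" might look like, using the variable names from the original post: the outcome ptmotion and the predictors x1 and x2 are hypothetical placeholders, since the post does not name the model variables.

```stata
* Descriptive analysis of the finite population: keep the FPC.
svyset sitecode [pweight=bwgt0], strata(strata) fpc(fpc1)
svy: proportion ptmotion

* Analytic model (superpopulation view): same design, FPC omitted.
svyset sitecode [pweight=bwgt0], strata(strata)
svy: logistic ptmotion x1 x2
```

Because -svyset- settings persist until changed, the design has to be re-declared before switching between the two kinds of analysis.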
Two questions came to mind:
1. If a trial had >1 plaintiff or >1 defendant, would that not increase the probability of a post trial motion? How are you going to account for that?
2. For descriptive analyses, counties selected with certainty need special treatment. Look up the "singleunit" option for -svyset-.
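One possible way to follow up on the second question is sketched below, again with the variable names from the original post. Whether singleunit(certainty) is the appropriate setting depends on how the certainty counties are coded in the strata variable, which the post does not describe, so treat the choice as an assumption.

```stata
* Declare the design and list the strata with the number of sampled
* counties in each, to spot strata that contain a single sampled county.
svyset sitecode [pweight=bwgt0], strata(strata) fpc(fpc1)
svydescribe

* singleunit(certainty) treats strata with a single sampled unit as
* certainty units, which contribute nothing to the standard errors.
svyset sitecode [pweight=bwgt0], strata(strata) fpc(fpc1) singleunit(certainty)
```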
Good luck!

-Steve

References
Cochran, W. G. (1977). Sampling techniques (3rd ed.). New York: Wiley.
Deming, W. E. (1966). Some theory of sampling. New York: Dover Publications.
Korn, E. L., & Graubard, B. I. (1999). Analysis of health surveys (Wiley series in probability and statistics). New York: Wiley.
On Feb 19, 2009, at 12:04 AM, [email protected] wrote:
I'm a beginner Stata user and have a question about the svyset command in Stata that I hope someone can help me with.
For some background, I'm working on a logistic regression model that examines the likelihood of either a plaintiff or defendant filing a post trial motion. The database I'm working with is the Civil Justice Survey of State Courts (CJSSC). The CJSSC provides case level data for all trials concluded in a sample of 46 of the nation's 75 most populous counties in 2005. Data are collected on about 8,000 trials in these 46 counties, which are weighted to represent about 10,500 trials concluded in the nation's 75 most populous counties. I understand that one of the nice features of Stata is that it allows you to take into account the sampling structure of a dataset when doing logistic regression modeling. Here is the Stata code that I used to take into account the sampling structure of these civil trial data:
svyset sitecode [pweight=bwgt0], strata(strata) fpc(fpc1) || su2, fpc(fpc2)
Where
sitecode = County where the civil trial took place
bwgt0 = Weights to weight the data from 46 to the 75 most populous counties
strata = Strata where the counties are located. The dataset has 5 strata
fpc1 = The probability of a county appearing in the sample. For example, a county with a weight of 2 would have a 50% probability of appearing in the sample
su2 = Unique identifier that identifies the trials that occurred in each of the 46 counties
fpc2 = 1 for all 8,000 trials disposed in the 46 counties. I gave fpc2 a value of 1 because I wanted to tell Stata that the trials had a 100% probability of showing up in these 46 counties.
I think that I got the part of this programming that deals with the first level of the sample design correct. It's the second level that I'm having some problems with. At the second level of the sample design, I'm trying to correct for the fact that I have data for every civil trial concluded in the 46 counties. Basically, I want to tell Stata that part of this sample is actually a census of all trials concluded in the 46 counties in 2005. I understand Stata has a finite population correction command that takes into account the census-like format of these data. The logistic regression results were the same irrespective of whether I used the 1st or 2nd stages in the sample design. I think this is telling me that Stata is not correcting for the census-like aspect of this sample. Can anyone give me some guidance as to whether I'm correctly taking into account the sampling structure of these data? In particular, I would like to know whether I'm using the fpc2 factor correctly. Any assistance you could give on this matter would be very much appreciated.
Thanks
Thomas Cohen


*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/


--
Michael I. Lichter, Ph.D.
Research Assistant Professor & NRSA Fellow
UB Department of Family Medicine / Primary Care Research Institute
UB Clinical Center, 462 Grider Street, Buffalo, NY 14215
Office: CC 125 / Phone: 716-898-4751 / E-Mail: [email protected]


