
From: Stas Kolenikov <skolenik@gmail.com>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: Question about svyset command

Date: Thu, 19 Feb 2009 12:24:27 -0600

Adding to the previous comments: In all likelihood, your results are only generalizable to those most populous counties, as they are probably large metropolitan areas. You need to think very carefully about the population to which the results generalize. Your superpopulation, if you can think of one, would be all potential trials in these and similar large counties. I would imagine that in a county of 3,000 people in Idaho, people won't be suing each other as furiously as somewhere in New Jersey or California, as there is plenty of land to live on... but that's something for you to clarify.

Hence, just like Michael, I would disagree with Steven about ignoring the fpc so happily. It would affect your standard errors, correctly showing that you sampled more than half of your total finite population (46 of the 75 counties). If you had all of your population, you would have a census logistic regression, which would be just some sort of line describing where your 0s and 1s are.

Now, if you had a census regression, what would standard errors stand for? On one hand, you've got all possible observations, so there is no uncertainty left: the sampling/randomization/design variance is zero. But if you are thinking about the social process that has created those observations (trials), then you can still think about model variances that should be on the scale of 1/N, and to get these, you would need to ignore the fpc. Your design specification thus depends on which variance you want to estimate.

With census regression, you are saying, "There is a line of best fit, and I am prepared to find out it does not fit the data perfectly, but if my goal is to get as close to that line of best fit as possible, then my sample logistic regression is the answer". That line of best fit is a well-defined population concept; whether it makes substantive sense or not is certainly open to interpretation.
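To make the choice between the two variances concrete, here is a minimal Stata sketch of the two design specifications. It reuses the variable names from the original post (sitecode, bwgt0, strata, fpc1); the outcome posttrial and the covariates x1 and x2 are hypothetical placeholders, and the second stage is left out for simplicity. Treat it as an illustration under those assumptions, not as the definitive specification for the CJSSC.

```stata
* Hypothetical names (not in the original post): posttrial, x1, x2

* (a) Design-based variance for the finite population of 75 counties:
*     keep the first-stage fpc, so the large sampling fraction of
*     counties reduces the standard errors
svyset sitecode [pweight=bwgt0], strata(strata) fpc(fpc1)
svy: logit posttrial x1 x2
estimates store with_fpc

* (b) Model (superpopulation) variance: same PSUs, weights and strata,
*     but no fpc
svyset sitecode [pweight=bwgt0], strata(strata)
svy: logit posttrial x1 x2
estimates store no_fpc

* Show coefficients and standard errors side by side
estimates table with_fpc no_fpc, b se
```

Since both runs use the same weights, the coefficients should agree and only the standard errors should differ; the standard errors in (a) are expected to be the smaller ones.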
With a superpopulation model, you are saying, "I know perfectly well that these and only these factors affect the probability of observing that post-trial motion, and they enter the logistic equation linearly, and all that." Your results will only be as good as your model, and you are putting a lot of trust in correct specification there.

On Wed, Feb 18, 2009 at 11:04 PM, <thomashcohen@aol.com> wrote:
> I'm a beginner Stata user and have a question about the svyset command in
> Stata that I hope someone can help me with.
>
> For some background, I'm engaged in a logistic regression model that
> examines the likelihood of either a plaintiff or defendant filing a post
> trial motion. The database I'm working with is the Civil Justice Survey of
> State Courts (CJSSC). The CJSSC provides case level data for all tort,
> contract, and real property trials conclude in a sample of 46 of the
> nation's 75 most populous counties in 2005. Data are collected on about
> 8,000 trials in these 46 counties which are weighted to represent about
> 10,500 trials concluded in the nation's 75 most populous counties. I
> understand that one of the nice features of Stata is that it allows you to
> take into account the sampling structure of a dataset when doing logistic
> regression modeling. Here is the Stata code that I used to take in account
> the sampling structure of these civil trial data:
>
> svyset sitecode [pweight=bwgt0], strata(strata) fpc(fpc1) || su2, fpc(fpc2)
>
> Where
> Sitecode = County where the civil trial took place
> Bwgt0 = Weights to weight the data from 46 to the 75 most populous counties
> Strata = Strata where the counties are located. The dataset has 5 strata
> fpc1 = The probability of a county appearing in the sample. For example, a
> county with a weight of 2 would have a 50% probability of appearing in the
> sample
> su2 = Unique identifier that identifies the trials that occurred in each of
> the 46 counties
> Fpc2 = 1 for all 8,000 trials disposed in the 46 counties. I gave fpc2 a
> value of 1 because I wanted to tell Stata that the trials had a 100%
> probability of showing up in these 46 counties.
> I think that I got the part of this programming that deals with the first
> level of the sample design correct. It's the second level that I'm having
> some problems with. At the second level of the sample design, I'm trying to
> correct for the fact that I have data for every civil trial concluded in the
> 46 counties. Basically, I want to tell Stata that part of this sample is
> actually a census of all trials concluded in the 46 counties in 2005. I
> understand Stata has a finite population correction command that takes into
> account the census like format of these data. The logistic regression
> results were the same irrespective of whether I used the 1st or 2nd stages
> in the sample design. I think this is telling me that Stata is not
> correcting for the census like aspect of this sample. Can anyone give me
> some guidance as to whether I'm correctly taking into account the sampling
> structure of these data. In particular, I would like to know whether I'm
> using the fpc2 factor correctly. Any assistance you could give on this
> matter would be very much appreciated.
> Thanks
> Thomas Cohen
>
>
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.

**Follow-Ups**: **Re: st: Question about svyset command**, *From:* Steven Samuels <sjhsamuels@earthlink.net>

**References**: **st: Question about svyset command**, *From:* thomashcohen@aol.com

