
From: Steven Samuels <sjhsamuels@earthlink.net>

To: statalist@hsphsun2.harvard.edu

Subject: Re: st: Question about svyset command

Date: Thu, 19 Feb 2009 17:50:01 -0500

--

-Steve

On Feb 19, 2009, at 3:43 PM, Michael I. Lichter wrote:

I agree with Stas about the vital importance of defining the target population. Steven, however, is making me more confused about the difference between inferences about finite populations vs. those about superpopulations. I'll use my own study as an example.

I'm analyzing results from a survey of physicians regarding health information technology (HIT) adoption. The survey was stratified and a couple of the strata had large sampling fractions (like 1/3 and 1/8). My target population is all primary care physicians delivering patient care during a specific interval in time--and the interval in time is meaningful, because I expect HIT adoption levels to be different (higher) today than they were back when the data were collected. The target population and the (list) frame population are undoubtedly different for a variety of reasons, including the inherent near-impossibility of maintaining a complete and accurate list of any large population. Still, I think I'm interested in a finite population of actual physicians practicing at a specific point in time, not a theoretical, infinite superpopulation. Am I right?

I want to know about (a) current adoption rates by stratum (estimating proportion & variance), (b) differences in adoption rates across strata at this particular point in time (e.g., using chi-square), (c) the general relationship between various predictors (or covariates) and adoption (e.g., using logistic regression). Are the first two finite population objectives and the last a superpopulation-related objective, so that variances should be estimated one way for the first two and a different way for the third?

Thanks.

Michael

Steven Samuels wrote:

--

Thomas could generalize to the entire US in 2005. According to http://www.icpsr.umich.edu/cocoon/NACJD/STUDY/23862.xml he is omitting from his data 45 strata that covered the rest of the country.

I actually agree with Stas. I do think that there are uses for regression and comparisons with fpc's in descriptive studies.
I once analyzed Behavioral Risk Factor Surveillance System (BRFSS) data in California, and characterized historical changes in smoking prevalence with a regression line. It fit pretty well.

I also would favor logistic regression and log-linear modeling as smoothing techniques to economically describe a population. Confidence intervals (with fpc's) for differences between two proportions can also be informative; one might want to know "How different were the proportions in that population at that time?".

In my experience, though, most investigators who do regressions do not intend their analyses to be descriptive only. Until Thomas tells us the purpose of his study, we will not really know what to advise.

-Steve

On Feb 19, 2009, at 1:24 PM, Stas Kolenikov wrote:

Adding to the previous comments:

In all likelihood, your results are only generalizable to those most populous counties, as they are probably large metropolitan areas. You would need to think very carefully about what the population is to which the results are generalizable. Your superpopulation, if you can think of one, would be all potential trials in these and similar large counties. I would imagine that in a 3000 people county in Idaho, people won't be suing each other as furiously as somewhere in New Jersey or California, as there is plenty of land to live on... but that's something for you to clarify.

Hence, just like Michael, I would disagree with Steven about ignoring fpc so happily. They would affect your standard errors, correctly showing that you got more than half of your total finite population. If you had all of your population, you would have a census logistic regression, which would be just some sort of the line saying where your 0s and 1s are.

Now, if you had a census regression, what would standard errors stand for? On one hand, you've got all possible observations, so there is no uncertainty left -- the sampling/randomization/design variance is zero.
But if you are thinking about the social process that has created those observations (trials), then you can still think about model variances that should be on the scale of 1/N -- and to get these, you would need to ignore fpc.

Your design specification thus depends on which variance you want to estimate. With census regression, you are saying, "There is a line of best fit, and I am prepared to find out it does not fit the data perfectly, but if my goal is to get as close to that line of best fit as possible, then my sample logistic regression is the answer". That line of best fit is a well defined population concept; whether it makes a substantive sense or not -- that's certainly open to interpretation. With a superpopulation model, you are saying, "I know perfectly well that these and only these factors affect the probability of observing that post-trial motion, and they enter the logistic equation linearly, and all that." Your results will only be as good as your model, and you are putting a lot of trust in correct specification there.

On Wed, Feb 18, 2009 at 11:04 PM, <thomashcohen@aol.com> wrote:

I'm a beginner Stata user and have a question about the svyset command in Stata that I hope someone can help me with.

For some background, I'm engaged in a logistic regression model that examines the likelihood of either a plaintiff or defendant filing a posttrial motion. The database I'm working with is the Civil Justice Survey of State Courts (CJSSC). The CJSSC provides case level data for all tort, contract, and real property trials concluded in a sample of 46 of the nation's 75 most populous counties in 2005. Data are collected on about 8,000 trials in these 46 counties which are weighted to represent about 10,500 trials concluded in the nation's 75 most populous counties. I understand that one of the nice features of Stata is that it allows you to take into account the sampling structure of a dataset when doing logistic regression modeling.
Here is the Stata code that I used to take into account the sampling structure of these civil trial data:

    svyset sitecode [pweight=bwgt0], strata(strata) fpc(fpc1) || su2, fpc(fpc2)

Where

    Sitecode = County where the civil trial took place
    Bwgt0 = Weights to weight the data from 46 to the 75 most populous counties
    Strata = Strata where the counties are located. The dataset has 5 strata
    fpc1 = The probability of a county appearing in the sample. For example, a county with a weight of 2 would have a 50% probability of appearing in the sample
    su2 = Unique identifier that identifies the trials that occurred in each of the 46 counties
    Fpc2 = 1 for all 8,000 trials disposed in the 46 counties. I gave fpc2 a value of 1 because I wanted to tell Stata that the trials had a 100% probability of showing up in these 46 counties.

I think that I got the part of this programming that deals with the first level of the sample design correct. It's the second level that I'm having some problems with. At the second level of the sample design, I'm trying to correct for the fact that I have data for every civil trial concluded in the 46 counties. Basically, I want to tell Stata that part of this sample is actually a census of all trials concluded in the 46 counties in 2005. I understand Stata has a finite population correction command that takes into account the census like format of these data. The logistic regression results were the same irrespective of whether I used the 1st or 2nd stages in the sample design. I think this is telling me that Stata is not correcting for the census like aspect of this sample. Can anyone give me some guidance as to whether I'm correctly taking into account the sampling structure of these data? In particular, I would like to know whether I'm using the fpc2 factor correctly. Any assistance you could give on this matter would be very much appreciated.
Thanks

Thomas Cohen

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/

--
Stas Kolenikov, also found at http://stas.kolenikov.name

Small print: I use this email account for mailing lists only.

--
Michael I. Lichter, Ph.D.

Research Assistant Professor & NRSA Fellow

UB Department of Family Medicine / Primary Care Research Institute

UB Clinical Center, 462 Grider Street, Buffalo, NY 14215

Office: CC 125 / Phone: 716-898-4751 / E-Mail: mlichter@buffalo.edu
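For readers following the thread, the effect of the finite population correction (fpc) debated above can be sketched numerically. This is a minimal Python illustration, not Stata's implementation. It assumes simple random sampling without replacement within a single stratum, and the function name `fpc_standard_error`, the sample size of 200, and the 40% proportion are hypothetical values chosen for illustration. The sampling fractions 1/8 and 1/3 are the ones Michael mentions, and a fraction of 1 corresponds to the census case Stas describes.

```python
import math


def fpc_standard_error(p, n, sampling_fraction):
    """Standard error of a sample proportion under simple random sampling
    without replacement, computed without and with the fpc.

    Returns a tuple (se_without_fpc, se_with_fpc).
    """
    if not 0.0 < sampling_fraction <= 1.0:
        raise ValueError("sampling_fraction must be in (0, 1]")
    if n < 2:
        raise ValueError("n must be at least 2")
    # Usual variance estimate for a proportion, ignoring the fpc.
    var_no_fpc = p * (1.0 - p) / (n - 1)
    # The fpc multiplies the variance by (1 - f), where f = n / N.
    var_fpc = (1.0 - sampling_fraction) * var_no_fpc
    return math.sqrt(var_no_fpc), math.sqrt(var_fpc)


if __name__ == "__main__":
    # Hypothetical stratum: 200 respondents, 40% adoption.
    for f in (1 / 8, 1 / 3, 1.0):
        se_no_fpc, se_fpc = fpc_standard_error(0.4, 200, f)
        print(f"f = {f:.3f}: SE without fpc = {se_no_fpc:.4f}, "
              f"SE with fpc = {se_fpc:.4f}")
```

Under these assumptions the fpc shrinks the standard error by a factor of sqrt(1 - f): about 0.935 for f = 1/8 and about 0.816 for f = 1/3. At f = 1 the design-based standard error is exactly zero, while the value computed without the fpc stays positive. That second value is the kind of model-based quantity Stas contrasts with the design variance.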


**References**:

- **st: Question about svyset command**, *From:* thomashcohen@aol.com
- **Re: st: Question about svyset command**, *From:* Stas Kolenikov <skolenik@gmail.com>
- **Re: st: Question about svyset command**, *From:* Steven Samuels <sjhsamuels@earthlink.net>
- **Re: st: Question about svyset command**, *From:* "Michael I. Lichter" <mlichter@buffalo.edu>

