Re: st: Dealing with survey data when the entire population is also in the dataset

Sun, 26 Jul 2009 18:25:38 -0400

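-----
sysuse auto,clear
* draw the stratified sample: 50% of domestic, 75% of foreign
sample 50 if (foreign == 0)
sample 75 if (foreign == 1)
* pweight = 1/(sampling fraction); wt must be created before it can be replaced
gen wt = 1/.75 if (foreign == 1)
replace wt = 1/.5 if (foreign == 0)
gen sample = 1
gen stratum = foreign
tempfile sample
save `sample'
* reload the full population and append the sample to it
sysuse auto,clear
append using `sample'
* population records: weight 1, their own stratum, sample = 0
replace wt = 1 if missing(sample)
replace stratum = 2 if missing(sample)
replace sample = 0 if missing(sample)
svyset [pw=wt], strata(stratum)
----

Austin Nichols wrote: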

Margo Schlanger <margo.schlanger@gmail.com> :

I think Michael I. Lichter means for you to -append- your sample and population in step 2 below. Then you can run -hotelling- or the equivalent linear discriminant model (with robust SEs) to compare means for a bunch of variables observed in both. I.e.

. reg sample x* [pw=wt]

in step 2b, not tabulate, with or without svy: and chi2.

On Fri, Jul 24, 2009 at 11:24 PM, Michael I. Lichter <MLichter@buffalo.edu> wrote:

Margo,

1. select your sample and save it in a new dataset, and then in the new dataset:
   a. define your stratum variable -stratavar- as you described
   b. define your pweight as you described, wt = 1/(sampling fraction) for each stratum
2. combine the full original dataset with the new one, but with stratavar = 1 for the new dataset and wt = 1 and with a new variable sample = 0 for the original and =1 for the sample, and then
   a. -svyset [pw=wt], strata(stratavar)-
   b. do your chi square test or whatever using svy commands, e.g., -svy: tab var1 sample-

Michael

Margo Schlanger wrote:

Hi --

I have a dataset in which the observation is a "case". I started with a complete census of the ~4000 relevant cases; each of them gets a line in my dataset. I have data filling a few variables about each of them. (When they were filed, where they were filed, the type of outcome, etc.) I randomly sampled them using 3 strata (for one stratum, the sampling probability was 1, for another about .5, and for a third, about .75). I end up with a sample of about 2000. I know much more about this sample.

Ok, my question:

1) How do I use the svyset command to describe this dataset? It would be easy if I just dropped all the non-sampled observations, but I don't want to do that, because of question 2:

2) How do I compare something about the sample to the entire population, just to demonstrate that my sample isn't very different from that entire population on any of the few variables I actually have comprehensive data about?

I could do this simply, if I didn't have to worry about weighting:

tabulate year sample, chi2

But I need the weights. In addition, I can't simply use weighting commands, because in the population (when sample == 0), everything should be weighted the same; the weights apply only to my sample (when sample == 1). And I can't (so far) use survey commands, because I don't know the answer to (1), above.

NOTE: Nearly all the variables I care about are categorical: year of filing, type of case. But it's easy enough to turn them into dummies, if that's useful.

Thanks for any help with this.

Margo Schlanger

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
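
For anyone following along, here is a minimal sketch of how steps 2a and 2b, plus Austin Nichols's regression suggestion, might look end to end. It uses auto.dta as a stand-in for the census. The seed, the tempfile name `samp`, and the choice of rep78, price, mpg and weight as the "variables observed for everyone" are assumptions made purely for illustration; they are not from the thread.

```stata
* Sketch only: auto.dta stands in for the full census of cases.
sysuse auto, clear
set seed 12345                     // arbitrary seed, for reproducibility

* Step 1: draw the stratified sample and attach weights
sample 50 if foreign == 0          // keep 50% of domestic cars
sample 75 if foreign == 1          // keep 75% of foreign cars
gen wt = cond(foreign == 1, 1/.75, 1/.5)   // pweight = 1/(sampling fraction)
gen sample = 1
gen stratum = foreign
tempfile samp                      // hypothetical tempfile name
save `samp'

* Step 2: stack the full population on top of the sample
sysuse auto, clear
append using `samp'
replace wt = 1 if missing(sample)
replace stratum = 2 if missing(sample)
replace sample = 0 if missing(sample)

* Step 2a: declare the design
svyset [pw=wt], strata(stratum)

* Step 2b: design-based test of association between a categorical
* variable and sample membership
svy: tabulate rep78 sample

* Alternative suggested by Austin Nichols: regress the sample indicator
* on the comparison variables, weighted, with robust standard errors,
* then test the coefficients jointly
regress sample price mpg weight [pw=wt], vce(robust)
test price mpg weight
```

With real data, year of filing or type of case would take the place of rep78 in the tabulation, and dummies for those categories would take the place of price, mpg and weight in the regression.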

-- 
Michael I. Lichter, Ph.D. <mlichter@buffalo.edu>
Research Assistant Professor & NRSA Fellow
UB Department of Family Medicine / Primary Care Research Institute
UB Clinical Center, 462 Grider Street, Buffalo, NY 14215
Office: CC 126 / Phone: 716-898-4751 / FAX: 716-898-3536

