From
Austin Nichols <austinnichols@gmail.com>

To
statalist@hsphsun2.harvard.edu

Subject: Re: st: Dealing with survey data when the entire population is also in the dataset

Date: Sat, 25 Jul 2009 23:56:46 -0400

Margo Schlanger<margo.schlanger@gmail.com> :
I think Michael I. Lichter means for you to -append- your sample and
population in step 2 below. Then you can run -hotelling- or the
equivalent linear discriminant model (with robust SEs) to compare
means for a bunch of variables observed in both. I.e.

. reg sample x* [pw=wt]

in step 2b, not tabulate, with or without svy: and chi2.

On Fri, Jul 24, 2009 at 11:24 PM, Michael I. Lichter<MLichter@buffalo.edu> wrote:
> Margo,
>
> 1. select your sample and save it in a new dataset, and then in the new
> dataset:
> a. define your stratum variable -stratavar- as you described
> b. define your pweight as you described, wt = 1/(sampling fraction) for each
> stratum
> 2. combine the full original dataset with the new one, but with stratavar =
> 1 for the new dataset and wt = 1 and with a new variable sample = 0 for the
> original and =1 for the sample, and then
> a. -svyset [pw=wt], strata(stratavar)-
> b. do your chi square test or whatever using svy commands, e.g., -svy: tab
> var1 sample-
>
> Michael
>
> Margo Schlanger wrote:
>>
>> Hi --
>>
>> I have a dataset in which the observation is a "case". I started with
>> a complete census of the ~4000 relevant cases; each of them gets a
>> line in my dataset. I have data filling a few variables about each of
>> them. (When they were filed, where they were filed, the type of
>> outcome, etc.)
>>
>> I randomly sampled them using 3 strata (for one strata, the sampling
>> probability was 1, for another about .5, and for a third, about .75).
>> I end up with a sample of about 2000. I know much more about this
>> sample.
>>
>> Ok, my question:
>>
>> 1) How do I use the svyset command to describe this dataset? It would
>> be easy if I just dropped all the non-sampled observations, but I
>> don't want to do that, because of question 2:
>>
>> 2) How do I compare something about the sample to the entire
>> population, just to demonstrate that my sample isn't very different
>> from that entire population on any of the few variables I actually
>> have comprehensive data about. I could do this simply, if I didn't
>> have to worry about weighting:
>>
>> tabulate year sample, chi2
>>
>> But I need the weights. In addition, I can't simply use weighting
>> commands, because in the population (when sample == 0), everything
>> should be weighted the same; the weights apply only to my sample (when
>> sample == 1). And I can't (so far) use survey commands, because I
>> don't know the answer to (1), above.
>>
>> NOTE: Nearly all the variables I care about are categorical: year of
>> filing, type of case. But it's easy enough to turn them into dummies,
>> if that's useful.
>>
>>
>> Thanks for any help with this.
>>
>> Margo Schlanger
>>
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
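
The append-then-regress approach suggested in the reply could be sketched
roughly as follows. This is only an illustration, not code from the thread:
the file name census.dta and the variables insample (a 0/1 flag marking
sampled cases), stratum, year and casetype are all hypothetical names, and
the sampling fraction is assumed to be recoverable as the share of sampled
cases within each stratum.

```stata
* Hypothetical names throughout: census.dta, insample, stratum, year, casetype
use census, clear

* Realized sampling fraction per stratum (share of cases that were sampled)
bysort stratum: egen double sampfrac = mean(insample)

* Step 1: save the sampled cases, with wt = 1/(sampling fraction)
preserve
keep if insample == 1
generate double wt = 1/sampfrac
generate byte sample = 1
tempfile samp
save `samp'
restore

* Step 2: full population gets wt = 1 and sample = 0, then stack the two
generate double wt = 1
generate byte sample = 0
append using `samp'

* Step 2b: linear discriminant regression; pweights give robust SEs
regress sample i.year i.casetype [pw=wt]

* Joint test that the compared variables do not predict sample membership
testparm i.year i.casetype
```

If the joint test does not reject, the weighted sample looks similar to the
population on those variables. With dummies already created, the reply's
own form, -reg sample x* [pw=wt]-, would serve the same purpose.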

