
Re: st: Dealing with survey data when the entire population is also in the dataset

From   "Michael I. Lichter" <MLichter@Buffalo.EDU>
Subject   Re: st: Dealing with survey data when the entire population is also in the dataset
Date   Fri, 24 Jul 2009 23:24:10 -0400


1. select your sample and save it in a new dataset, and then in the new dataset:
a. define your stratum variable -stratavar- as you described
b. define your pweight as you described, wt = 1/(sampling fraction) for each stratum
2. combine the full original dataset with the new one, but with stratavar = 1 for the new dataset and wt = 1 and with a new variable sample = 0 for the original and =1 for the sample, and then
a. -svyset [pw=wt], strata(stratavar)-
b. do your chi square test or whatever using svy commands, e.g., -svy: tab var1 sample-
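
For concreteness, here is a rough sketch of the two steps as a do-file. It is only one reading of the recipe, not a tested solution for the real data: the toy census, the variable names caseid, year, p, u and N_h, and the tempfile names are all made up for illustration. It also assumes that step 2 means the full-population copy gets a single stratum with wt = 1, and it codes that stratum 0 (rather than 1) so it cannot collide with the sample strata 1-3.

```stata
* Hypothetical toy census: 4,000 cases in three strata (not the real data)
clear
set seed 12345
set obs 4000
generate long caseid = _n
generate byte stratavar = 1 + (caseid > 1000) + (caseid > 2500)
generate int year = 1990 + floor(10*runiform())
tempfile pop samp
save `pop'

* Step 1: draw the stratified sample and compute the pweights
* (sampling probabilities 1, .5, .75 as described in the question)
generate double p = cond(stratavar == 1, 1, cond(stratavar == 2, 0.5, 0.75))
bysort stratavar: generate long N_h = _N        // stratum size in the census
generate double u = runiform()
keep if u < p
bysort stratavar: generate double wt = N_h/_N   // 1/(realized sampling fraction)
generate byte sample = 1
save `samp'

* Step 2: stack the full census (sample = 0, wt = 1, one stratum) on the sample
use `pop', clear
generate byte sample = 0
generate double wt = 1
replace stratavar = 0      // assumption: a separate stratum code for the census copy
append using `samp'

svyset [pw=wt], strata(stratavar)
svy: tabulate year sample
```

Note that every sampled case appears twice in the stacked file, once as a census row and once as a sample row; that follows from the approach itself, not from this particular sketch.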


Margo Schlanger wrote:
Hi --

I have a dataset in which the observation is a "case".  I started with
a complete census of the ~4000 relevant cases; each of them gets a
line in my dataset.  I have data filling a few variables about each of
them.  (When they were filed, where they were filed, the type of
outcome, etc.)

I randomly sampled them using 3 strata (for one stratum, the sampling
probability was 1, for another about .5, and for a third, about .75).
I end up with a sample of about 2000.  I know much more about this

Ok, my question:

1) How do I use the svyset command to describe this dataset?  It would
be easy if I just dropped all the non-sampled observations, but I
don't want to do that, because of question 2:

2) How do I compare something about the sample to the entire
population, just to demonstrate that my sample isn't very different
from that entire population on any of the few variables I actually
have comprehensive data about. I could do this simply, if I didn't
have to worry about weighting:

tabulate year sample, chi2

But I need the weights.  In addition, I can't simply use weighting
commands, because in the population (when sample == 0), everything
should be weighted the same; the weights apply only to my sample (when
sample == 1).  And I can't (so far) use survey commands, because I
don't know the answer to (1), above.

NOTE: Nearly all the variables I care about are categorical:  year of
filing, type of case.  But it's easy enough to turn them into dummies,
if that's useful.

Thanks for any help with this.

Margo Schlanger

Professor of Law
University of Michigan Law School
Director, Civil Rights Litigation Clearinghouse


Michael I. Lichter, Ph.D.
Research Assistant Professor & NRSA Fellow
UB Department of Family Medicine / Primary Care Research Institute
UB Clinical Center, 462 Grider Street, Buffalo, NY 14215
Office: CC 126 / Phone: 716-898-4751 / FAX: 716-898-3536
