Statalist



Re: st: Dealing with survey data when the entire population is also in the dataset


From   Austin Nichols <austinnichols@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Dealing with survey data when the entire population is also in the dataset
Date   Sat, 25 Jul 2009 23:56:46 -0400

Margo Schlanger <margo.schlanger@gmail.com>:
I think Michael I. Lichter means for you to -append- your sample and
population in step 2 below.  Then you can run -hotelling- or the
equivalent linear discriminant model (with robust SEs) to compare
means for a bunch of variables observed in both.  That is, in step 2b, run
.  reg sample x* [pw=wt]
rather than -tabulate- with chi2, whether or not you use the svy: prefix.
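
A minimal sketch of that append-then-regress workflow, in case it helps.
The file name population.dta, the indicator -selected- (1 if the case was
drawn into the sample), and the variables -year- and -casetype- are
assumptions for illustration; -stratavar-, -wt-, and -sample- follow the
names used in the quoted steps.  The factor-variable notation assumes
Stata 11 or later; with an older Stata, dummies named x* would do the same
job.

```stata
* Sketch only. Hypothetical names: population.dta, selected, year, casetype.
use population, clear

* Sampling fraction per stratum = share of that stratum's cases selected
egen double frac = mean(selected), by(stratavar)
gen double wt = 1/frac
keep if selected == 1
gen byte sample = 1
tempfile samp
save `samp'

* Reload the full census: every case counts once
use population, clear
gen byte sample = 0
gen double wt = 1
append using `samp'

* Regress the sample indicator on the variables observed for every case;
* pweights give robust standard errors automatically
regress sample i.year i.casetype [pw=wt]

* Joint test that the weighted sample matches the population on these variables
testparm i.year i.casetype
```

A large p-value on the joint test would mean no evidence that the weighted
sample differs from the population on those variables.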

On Fri, Jul 24, 2009 at 11:24 PM, Michael I. Lichter
<MLichter@buffalo.edu> wrote:
> Margo,
>
> 1. select your sample and save it in a new dataset, and then in the new
> dataset:
> a. define your stratum variable -stratavar- as you described
> b. define your pweight as you described, wt = 1/(sampling fraction) for each
> stratum
> 2. combine the full original dataset with the new one, but with stratavar =
> 1 for the new dataset and wt = 1 and with a new variable sample = 0 for the
> original and =1 for the sample, and then
> a. -svyset [pw=wt], strata(stratavar)-
> b. do your chi square test or whatever using svy commands, e.g., -svy: tab
> var1 sample-
>
> Michael
>
> Margo Schlanger wrote:
>>
>> Hi --
>>
>> I have a dataset in which the observation is a "case".  I started with
>> a complete census of the ~4000 relevant cases; each of them gets a
>> line in my dataset.  I have data filling a few variables about each of
>> them.  (When they were filed, where they were filed, the type of
>> outcome, etc.)
>>
>> I randomly sampled them using 3 strata (for one stratum, the sampling
>> probability was 1, for another about .5, and for a third, about .75).
>> I end up with a sample of about 2000.  I know much more about this
>> sample.
>>
>> Ok, my question:
>>
>> 1) How do I use the svyset command to describe this dataset?  It would
>> be easy if I just dropped all the non-sampled observations, but I
>> don't want to do that, because of question 2:
>>
>> 2) How do I compare something about the sample to the entire
>> population, just to demonstrate that my sample isn't very different
>> from that entire population on any of the few variables I actually
>> have comprehensive data about. I could do this simply, if I didn't
>> have to worry about weighting:
>>
>> tabulate year sample, chi2
>>
>> But I need the weights.  In addition, I can't simply use weighting
>> commands, because in the population (when sample == 0), everything
>> should be weighted the same; the weights apply only to my sample (when
>> sample == 1).  And I can't (so far) use survey commands, because I
>> don't know the answer to (1), above.
>>
>> NOTE: Nearly all the variables I care about are categorical:  year of
>> filing, type of case.  But it's easy enough to turn them into dummies,
>> if that's useful.
>>
>>
>> Thanks for any help with this.
>>
>> Margo Schlanger
>>
