



Re: st: Sample Weights


From   Stas Kolenikov <skolenik@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Sample Weights
Date   Tue, 9 Mar 2010 15:13:20 -0600

On Tue, Mar 9, 2010 at 12:35 PM, Jason Dean, Mr
<jason.dean@mail.mcgill.ca> wrote:

> I have a quick question. I currently have a 5% random sample of Canada. I
> also have 4 extra random samples of only the four largest cities (I
> have dropped duplicate observations between samples).
>
> What is the best strategy to include these extra samples and keep the
> sample representative of the country? I intend to condition on these cities
> with dummy variables in my regression. However, I would prefer to use sample
> weights, but I am not sure of the best way to go about creating them. Any
> suggestions would be greatly appreciated.


1. Keep the strata identifiers from the original data, say in a variable
named strata.

2. Identify the samples in, say, a variable named sample, so that 1 is your
microcensus and 2 through 5 are the extra samples.

3. Your new combined strata should be

egen new_strata = group( sample strata )

4. Your new PSUs should be the original PSUs. They should work as is, but
just to be safe,

egen new_PSU = group( sample strata old_PSU )

5. Now, the weight variables are tricky. If there are no weight
adjustments (and I doubt that there are none), the weights are inverse
probabilities of selection. If the 5% sample and the extra samples are
independent of one another (meaning that the design of the extra samples
does not rely on any information on which the 5% sample relies... I doubt
that, though), then

overall P[selection] = P[first] + P[second] - P[both]
                     = 1 - (1 - P[first]) * (1 - P[second])

where P[first] and P[second] are the probabilities of being selected in the
first and in the second sample, and independence gives
P[both] = P[first] * P[second].

So your weights, being the inverse of this probability, should become lower
in the combined sample (in those cities for which extra samples were
collected), as Michael indicated.
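
A minimal sketch of step 5 in Stata, assuming independent samples and no
weight adjustments. The variables p_main, p_extra, p_combined and new_weight
are hypothetical names introduced here for illustration; new_PSU and
new_strata are the variables created in steps 3 and 4.

```stata
* Hypothetical inputs (not part of the original data):
*   p_main  = probability of selection in the 5% sample
*   p_extra = probability of selection in the city's extra sample,
*             set to 0 for observations outside the four cities
* Assumes the two samples were drawn independently of one another.
generate double p_combined = 1 - (1 - p_main) * (1 - p_extra)
generate double new_weight = 1 / p_combined

* Declare the combined design with the variables from steps 3 and 4.
svyset new_PSU [pweight = new_weight], strata(new_strata)
```

As a purely illustrative number, with p_main = 0.05 and an assumed p_extra of
0.10, the combined probability is 1 - 0.95 * 0.90 = 0.145, so the weight falls
from 1/0.05 = 20 to about 6.9. Outside the four cities p_extra is 0 and the
weight stays at 20.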

-- 
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
