# Re: st: Sample Weights

From: Stas Kolenikov
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Sample Weights
Date: Tue, 9 Mar 2010 15:13:20 -0600

On Tue, Mar 9, 2010 at 12:35 PM, Jason Dean, Mr
<jason.dean@mail.mcgill.ca> wrote:

> I have a quick question. I currently have a 5% random sample of Canada. I
> also have 4 extra random samples of only the four largest cities (I
> have dropped duplicate observations between samples).
>
> What is the best strategy to include these extra samples and keep the
> sample representative of the country? I intend to condition on these cities
> with dummy variables in my regression. However, I would prefer to use sample
> weights, but I am not sure of the best way to go about creating them. Any
> suggestions would be greatly appreciated.

1. Keep the strata identifiers from the original data, say in a variable `strata`.

2. Identify the samples in a variable, say `sample`, so that 1 is your
microcensus (the 5% sample) and 2 through 5 are the extra samples.

3. Your new combined strata should be:

```stata
egen new_strata = group(sample strata)
```

4. Your new PSUs should be the original PSUs. They should work as is, but
just to be safe, make them unique across samples and strata:

```stata
egen new_PSU = group(sample strata old_PSU)
```

5. Now, the weight variables are tricky. If you don't have any weight
adjustments (and I doubt that), the weights are the inverse probabilities of
selection. If the 5% sample and the extra samples are independent of one
another (meaning that the information used to design the extra samples does
not rely on any of the pieces on which the 5% sample relies, which I doubt,
though), then

$$
P[\text{selected overall}] = P[\text{first}] + P[\text{second}] - P[\text{both}] = 1 - (1 - P[\text{first}])(1 - P[\text{second}])
$$

where the last step uses independence, P[both] = P[first] · P[second].

Since the weight is 1 / P[selected overall] and that probability goes up,
your weights should become lower in the joined sample for the cities in which
extra samples were collected, as Michael indicated.
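The five steps above can be put together in Stata roughly as follows. This is a minimal sketch, not a tested recipe: it assumes there are no weight adjustments and that the samples are independent (both of which the post doubts), and the toy data and the variable names `city`, `p1`, `p_extra`, `p_all`, and `new_wt` are hypothetical. `sample`, `strata`, `old_PSU`, `new_strata`, and `new_PSU` follow the names used in the steps, and `svyset` is Stata's standard command for declaring the design.

```stata
* Toy data (hypothetical): replace with your combined, de-duplicated file.
* city = 0 outside the four cities; p1 = selection probability in the 5% sample;
* p_extra = selection probability in the extra sample for that city (0 outside).
clear
input byte sample byte strata int old_PSU byte city double p1 double p_extra
1 1 101 0 0.05 0
1 1 102 1 0.05 0.10
2 1 201 1 0.05 0.10
3 2 301 2 0.05 0.02
end

* Steps 3 and 4: combined strata and PSUs, unique within sample and stratum
egen new_strata = group(sample strata)
egen new_PSU = group(sample strata old_PSU)

* Step 5: overall selection probability under independence,
* 1 - (1 - P[first]) * (1 - P[second]), and its inverse as the weight
generate double p_all = 1 - (1 - p1) * (1 - p_extra)
generate double new_wt = 1 / p_all

* Declare the combined design for svy: estimation commands
svyset new_PSU [pweight = new_wt], strata(new_strata)
```

With `p1` = 0.05 and `p_extra` = 0.10, the combined probability is 1 - 0.95 × 0.90 = 0.145, so the weight falls from 20 to about 6.9; outside the four cities `p_extra` is 0 and the weight stays at 1/0.05 = 20. In real data the probabilities would come from the survey documentation, or from the original weights as p = 1/w if those weights are unadjusted.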

