Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.

# Re: st: Sample Weights

From: Stas Kolenikov | To: statalist@hsphsun2.harvard.edu | Subject: Re: st: Sample Weights | Date: Wed, 10 Mar 2010 08:17:32 -0600

```
On Tue, Mar 9, 2010 at 11:47 PM, Michael Lichter <mlichter@fastmail.fm> wrote:

> Stas said:
>
>> overall P[ selection ] = P[ to be selected in the first sample ]
>>   + P[ to be selected in the second sample ] - P[ to be selected in both ]
>>   = 1 - (1-P[first])*(1-P[second])
>
> Is this correct even if the first sample is an SRS and the second is
> clustered? I can't show otherwise, but it doesn't feel right that every case
> in a specific urban area should have the same weight regardless of which
> sample it was drawn from.

Well, these are the fundamentals of survey weight computation.
A probability is a probability. It may be as simple as n/N for an SRS, or as
complex as stratification at one stage, PPS WOR (probability proportional to
size, without replacement) at another stage, and some screening on top of that
(the nearest-birthday randomization is difficult to characterize, but I have a
feeling that we are talking about real sampling here). But if independence is
the right assumption, then the formula above gives your overall P[selection].
In the end, the weights must generalize either sample to the same population
(that of the particular city). And if Jason did not tell you which sample a
given duplicate came from, you would have no way of deciding whether you want
"this" weight or "that" weight.

By the way, the duplicates provide a way to verify this formula: Jason can
check how many he got, and whether that number matches P[ selected twice ]
times the population of the city.

Frankly, I doubt that the samples are perfectly independent of one another.
But correcting for that means going deep into fiddling with each unit or
cluster. I doubt whether the relevant information and methodology are readily