Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: sampling query

From: Steven Samuels <[email protected]>
To: [email protected]
Subject: Re: st: sampling query
Date: Thu, 27 Jan 2011 16:54:54 -0500

Rich,

You must compute these probabilities for every member of the combined sample, not just those selected in 2+ cohorts. If possible, your reading should include Section 11.2 "Duplicate Listings; Overlapping Frames" of Leslie Kish, Survey Sampling, Wiley, 1965.
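
As a rough illustration of computing these probabilities for every sampled member, here is a minimal Python sketch of the combination rule quoted further down in this thread (1 minus the product over cohorts of 1 minus the cohort-level probability). It assumes that selection is independent across cohorts, which the thread does not confirm. The cohort names, sampling fractions, and unit IDs are all hypothetical.

```python
from math import prod


def combined_inclusion_prob(cohort_probs):
    """P(selected through at least one cohort) = 1 - prod_k (1 - p_k).

    Assumes independent selection across cohorts (an assumption made
    for this sketch, not something stated in the thread).
    """
    return 1.0 - prod(1.0 - p for p in cohort_probs)


# Hypothetical cohort sampling fractions n/N.
cohort_rate = {"A": 50 / 1000, "B": 30 / 300}  # 0.05 and 0.10

# Hypothetical cohort membership for every unit in the combined sample,
# including units that belong to only one cohort.
members = {
    "unit1": ["A"],
    "unit2": ["B"],
    "unit3": ["A", "B"],
}

results = {}
for unit, cohorts in members.items():
    probs = [cohort_rate[c] for c in cohorts]
    pi = combined_inclusion_prob(probs)
    results[unit] = {
        "pi": pi,                 # combined inclusion probability
        "weight": 1.0 / pi,       # probability weight
        "product": prod(probs),   # the report writer's approach, for comparison
        "sum": sum(probs),        # small-probability approximation
    }

for unit, r in results.items():
    print(unit, round(r["pi"], 4), round(r["weight"], 2))
```

For the hypothetical unit3, which belongs to both cohorts, the combined probability is 0.145: larger than either cohort rate and close to the sum 0.15. Multiplying the two rates would instead give 0.005 and a weight of 200, rather than about 6.9.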
Steve

Rich,

Look up "multiple frames". That's a more common term for samples in
which the ultimate unit can be reached through different trajectories
(say landline phone sample, cell phone sample, and area/personal visit
sample). The probabilities should be combined as

1 - Prob[ in the sample ] = product over k of ( 1 - Prob[ reach the unit
through the k-th frame ] )

which for small probabilities is approximately the sum of the selection
probabilities. You are totally right that the probability should go up
rather than down.

On Thu, Jan 27, 2011 at 10:37 AM, Richard Goldstein
<[email protected]> wrote:
all,

I have received a report in which the report writer was stuck with the
following design (already implemented before his involvement): a number
of "cohorts" were set up (22 of them in fact) and the definitions of
these cohorts were not mutually exclusive (i.e., there was some overlap
in membership so that a given observation could appear in more than 1
cohort); to calculate the probability weights, the report writer first
calculated the probability of inclusion for each cohort (simply as n/N
where n is sample size from cohort and N is population size of cohort).

For observations in more than one cohort, who were actually selected, he
then multiplied the inclusion probabilities of each cohort that the
observation was in. Since each inclusion probability is less than 1,
the combined inclusion probability is smaller than the individual
inclusion probabilities for the individual cohorts. And then, of course,
the weights are greater for these people (since the weight is just the
inverse of the inclusion probability).

However, since these observations are in more than one cohort, shouldn't
the combined probability be greater for them (rather than smaller)?

How should the combined inclusion probability be calculated?

Or am I just wrong and the writer of the report is correct?

Any references on dealing with overlapping "cohorts" would also be
greatly appreciated.

Rich

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.