# SV: SV: st: Survey - raking - calibration - post stratification - calculating weights

 From "Kristian Wraae" To Subject SV: SV: st: Survey - raking - calibration - post stratification - calculating weights Date Sat, 6 Dec 2008 18:30:49 +0100

```Ok, I'm not sure what I'm missing.

The 5000 men were selected randomly from a database containing all citizens
in Denmark. The only criteria were that the selected men were between 60 and
74 years of age. All had equal probability of being selected.

I guess it is simple random sampling then.

I guess that in my terminology the 5000 men are the background population.

What I'm interested in is how the 600 men can be transformed into the 5000.

What concerns me is the selection bias that arises after the 5000 men
have been selected.

Does it matter how the 5000 were selected?

Kristian

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Steven Samuels
Sent: Saturday, December 06, 2008 6:05 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: SV: st: Survey - raking - calibration - post stratification -
calculating weights

Kristian, I am still trying to understand how you selected the
initial 5,000 men.  There are many ways to draw a "representative"
sample: simple random sampling, systematic sampling, stratification
with simple random sampling within strata.... Probabilities of
selection could be equal or unequal. So please provide more details
of this step.

Thanks,

Steve

On Dec 6, 2008, at 11:40 AM, Kristian Wraae wrote:

> Hi Steve
>
> The 5000 men were randomly drawn using the Danish CPR register, which
> is a database containing all Danish citizens. Everybody is assigned a
> unique 10-digit number, making it possible to select people 100% at
> random.
>
> The number has the form DDMMYY-vxyz, with DDMMYY being the date of
> birth. The last digit, z, is even for women and odd for men.
>
> The 5000 men were selected in a way that they reflected the background
> population regarding age and zip code.
>
> They were aged 60 to 74 years at the date of data acquisition.
>
> So the information we have on the 5000 men is 100% accurate and they
> are a 100% match for the background population, but the only
> information was age and zip code.
>
> The questionnaire we mailed to the 5000 men contained a lot of
> information.
>
> For the 3750 who filled out the questionnaire we know all chronic
> diseases as ICD10 codes, all medication as ATC codes, partner status,
> level of education, job situation, housing, smoking habits, physical
> activity, height, weight, number of children, sexual problems.
>
> The study is a cross sectional study examining androgens and their
> relations to body composition, health status, life style, quality of
> life, sexual dysfunction, physical performance, genetics etc.
>
> You can see a PowerPoint file with the inclusion procedure here:
> www.euphonium.dk/Inclusion.ppt
>
> I sought equal numbers in each group in order not to end up with too
> few people amongst the eldest. I wanted a good representation in all
> age strata since we wanted to be able to make reference intervals for
> the different androgens for healthy 60-74 year olds, and because we
> mainly investigate associations and are not primarily interested in
> the distribution in the background population. But I'd like to be able
> to make some estimates. As an example, I'd like to know the prevalence
> of erectile dysfunction in the background population, or estimate the
> prevalence of hypogonadism or diabetes using the data from the 600
> very thoroughly examined men.
>
> If you can help me I'll be very grateful
>
> Best regards
> Kristian Wraae
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Steven
> Samuels
> Sent: Saturday, December 06, 2008 5:04 PM
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: Survey - raking - calibration - post stratification -
> calculating weights
>
>
> --
>
> Kristian, before I can answer your questions, I need details of how
> you selected the original 5,000 men.  Describe the target
> population;  the sampling "frame" (or list) from which you drew the
> sample of 5,000; what information you have about the population (age
> distribution, for example); what information about each man on the
> list was available; exactly how you selected the 5,000.   Also, what
> is the purpose of the study?  Why did you seek equal numbers of men
> in the 15 age groups at later stages?
>
>
> -Steve
> On Dec 6, 2008, at 5:32 AM, Kristian Wraae wrote:
>
>> Hello all
>>
>> I have a question regarding how to weight a data set.
>>
>> The data is from a population based cross sectional study.
>>
>> 5000 randomly selected men reflecting the background population were
>> mailed a questionnaire.
>>
>> 75% response rate. 3750 questionnaires filled out. We know the age
>> and the zip code for non-responders.
>>
>> The questionnaire contained several sociodemographic parameters s1,
>> s2, ..., sn
>>
>> Then 1845 men from the group that had filled out a questionnaire were
>> invited to take part in a scientific project. The men were randomly
>> selected with an equal number in each age group (15 age groups of
>> one-year intervals). So 123 men in each group.
>>
>> 946 men accepted a telephone call. 768 men never responded and 131
>> refused to be interviewed by telephone.
>>
>> 864 men of the 946 were then randomly contacted, with equal numbers
>> in each age group, and 697 men agreed to take part in the project.
>> 97 men later cancelled or never showed up.
>>
>> So 600 men were included for further studies.
>>
>> Now I would like to weight these 600 men so they reflect the
>> background population in order to estimate the distribution of
>> different measures in the background population (X1, X2, .... Xi)
>> based on measures amongst the 600 men (Y1, Y2, ..... Yi).
>>
>> How do I do this?
>>
>> As far as I can tell I need to compensate for the differences between
>> the 5000 and the 3750 and between the 3750 and the 600. Since the
>> 1845 were randomly selected and an equal number in each age group
>> were contacted, I assume that all men had an equal probability of
>> being included, so design weights are not really needed. Right?
>>
>> But how do I compute a pweight that takes the two steps into account?
>>
>>

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```
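The question at the bottom of the thread, how to compute a weight that accounts for both the questionnaire non-response and the later equal-allocation subsample, can be illustrated with a small weighting-class sketch. This is only one possible approach and not something the thread settles on. It assumes that, within each one-year age group, the men who drop out at each stage are missing at random. The thread gives no per-age-group counts, so every number below is hypothetical, as are the function and variable names, and only three age groups are shown for brevity.

```python
def stage_weights(mailed, responded, invited, included):
    """Per-age-group weight that scales the included men back to the mailed sample.

    Each argument maps an age group to a count at one stage of the study.
    The weight is the product of the inverse retention fractions at each stage.
    """
    weights = {}
    for age in mailed:
        response_adj = mailed[age] / responded[age]        # questionnaire non-response
        subsample_adj = responded[age] / invited[age]      # equal-allocation subsample
        participation_adj = invited[age] / included[age]   # agreed and showed up
        weights[age] = response_adj * subsample_adj * participation_adj
    return weights


# Hypothetical counts for three age groups (NOT taken from the study).
mailed = {60: 400, 67: 330, 74: 270}
responded = {60: 320, 67: 250, 74: 180}
invited = {60: 123, 67: 123, 74: 123}   # equal number invited per age group
included = {60: 45, 67: 40, 74: 35}

weights = stage_weights(mailed, responded, invited, included)

# Probability of being invited, given a returned questionnaire.
selection_prob = {age: invited[age] / responded[age] for age in responded}

# The weighted count of included men reproduces the mailed sample size.
weighted_total = sum(weights[age] * included[age] for age in included)

for age in sorted(weights):
    print(age, round(weights[age], 3), round(selection_prob[age], 3))
print("weighted total:", round(weighted_total, 1))
```

With these made-up counts, inviting the same number of men from age groups of different sizes gives different selection probabilities per group, which suggests that equal allocation by itself does not make a design weight unnecessary. The per-stage factors cancel to mailed/included within each age group, so the result could be attached to each of the 600 men as a pweight. If further variables from the questionnaire (for example education or smoking) also predict participation, raking or calibration on those margins would be a natural extension of the same idea.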