Statalist The Stata Listserver



st: Re: sampling problem


From   "Michael Blasnik" <[email protected]>
To   <[email protected]>
Subject   st: Re: sampling problem
Date   Wed, 13 Jun 2007 08:24:02 -0400

...
It isn't clear to me whether you define the 1985 health distribution from the raw health1985 variable or use wgt1985 to produce a weighted distribution. Here's an approach that may work for the simpler (unweighted) case, although it can be modified to use the wgt1985 weights to define the target population fractions:

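* wt_h85: unweighted share of 1985 observations in each health category
gen wt_h85 = .
* number of observations with a nonmissing health1985
qui count if health1985 < .
local popcount85 = r(N)
* levelsof is the current name of the older levels command
qui levelsof health2007, local(hcats)
foreach h of local hcats {
    * 1985 count in this category, attached to the matching 2007 rows
    qui count if health1985 == `h'
    qui replace wt_h85 = r(N)/`popcount85' if health2007 == `h'
}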
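
If the target shares should instead come from the weighted 1985 distribution, the loop above might be adapted roughly as follows. This is only a sketch and not part of the original reply: wt_h85w is a hypothetical variable name, the ten rows are the example data from the question below, and it assumes wgt1985 is the right weight to apply to the health1985 values.

```stata
* Sketch only: wt_h85w is a hypothetical name, not from the original post.
clear
input id income2007 health2007 health1985 double wgt1985
 1  10 1 1   65.38
 2  10 1 1  153.91
 3  20 1 1  458.34
 4  20 1 1  484.2
 5  40 2 1  906.1
 6  40 2 4  943.96
 7  60 4 5 1176.87
 8  60 4 5 1389.91
 9 100 5 5 1716.93
10 100 5 5 4067.68
end

* Total 1985 weight over observations with a nonmissing health1985
qui summarize wgt1985 if health1985 < ., meanonly
local popwgt85 = r(sum)

gen double wt_h85w = .
qui levelsof health2007, local(hcats)
foreach h of local hcats {
    * Sum of 1985 weights in this category; r(N) is 0 when it is empty
    qui summarize wgt1985 if health1985 == `h', meanonly
    qui replace wt_h85w = cond(r(N) > 0, r(sum)/`popwgt85', 0) if health2007 == `h'
}

* One weighted 1985 share per health2007 category
tabulate health2007, summarize(wt_h85w) means
```

The cond() guard sets the share to zero for a health2007 category that never occurs in health1985, without relying on what summarize returns for an empty selection.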


The wt_h85 variable will now hold, for each value of health2007, the proportion of matching values in health1985. Summed over the 2007 observations, these weights will not add to one, since they are normalized to the observed health1985 proportions rather than to the 2007 sample. You could normalize them for the 2007 data to use them directly as sampling fractions, or you could use them to generate weighted point estimates based on the full dataset, analogous to survey raking or post-stratification.
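
Both uses could be sketched along the following lines. This is an illustration under stated assumptions rather than a definitive implementation: p07, ps_wt, cf_mean and keep85 are hypothetical names, the 2007 sampling weights are ignored because they are not in the example data, and runiform() assumes Stata 10.1 or later.

```stata
* Sketch only: p07, ps_wt, cf_mean and keep85 are hypothetical names.
clear
input id income2007 health2007 health1985 double wgt1985
 1  10 1 1   65.38
 2  10 1 1  153.91
 3  20 1 1  458.34
 4  20 1 1  484.2
 5  40 2 1  906.1
 6  40 2 4  943.96
 7  60 4 5 1176.87
 8  60 4 5 1389.91
 9 100 5 5 1716.93
10 100 5 5 4067.68
end

* Recreate wt_h85 (unweighted 1985 shares) as in the loop above
gen double wt_h85 = .
qui count if health1985 < .
local popcount85 = r(N)
qui levelsof health2007, local(hcats)
foreach h of local hcats {
    qui count if health1985 == `h'
    qui replace wt_h85 = r(N)/`popcount85' if health2007 == `h'
}

* Observed 2007 share of each health2007 category
qui count if health2007 < .
local n07 = r(N)
bysort health2007: gen double p07 = _N/`n07' if health2007 < .

* Adjustment weight: target (1985) share over observed (2007) share
gen double ps_wt = wt_h85/p07

* Use 1: reweighted mean of income2007 on the full dataset
summarize income2007 [aweight = ps_wt]
scalar cf_mean = r(mean)

* Use 2: keep each row with probability proportional to ps_wt
qui summarize ps_wt, meanonly
set seed 12345
gen byte keep85 = runiform() < ps_wt/r(max)
```

On the ten example rows the reweighted mean of income2007 works out to 53.5, against an unweighted mean of 46. If 2007 sampling weights were available, p07 would presumably be computed from them and ps_wt multiplied by them before use.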

Michael Blasnik

----- Original Message -----
From: "join allfish" <[email protected]>
To: <[email protected]>
Sent: Wednesday, June 13, 2007 6:17 AM
Subject: st: sampling problem



I want to sample data on the basis of counterfactuals: what would the distribution of income in 2007 look like if individuals had the 1985 distribution of health?

So imagine I have the following data:

id  income2007  health2007  health1985  wgt1985
 1          10           1           1    65.38
 2          10           1           1   153.91
 3          20           1           1   458.34
 4          20           1           1   484.2
 5          40           2           1   906.1
 6          40           2           4   943.96
 7          60           4           5  1176.87
 8          60           4           5  1389.91
 9         100           5           5  1716.93
10         100           5           5  4067.68

where wgt1985 holds the sampling weights for the 1985 data (I also have sampling weights for the 2007 data). The order of the 1985 data makes no difference to the 2007 data; it is just pasted in to obtain the health distribution.
What I want to do is sample from the 2007 data so that the distribution of health in 2007 looks like that in 1985. That is, I want to draw individuals from 2007 according to the distribution of health in 1985: health=1 with prob=0.5, health=2 with prob=0, health=4 with prob=0.1 and health=5 with prob=0.4 (where the probabilities come from the unweighted health1985 distribution above). This should give me a hypothetical distribution of income in 2007 if the distribution of health were as in 1985.
I cannot see how to do this with the bsample command. Further, I am not sure how to incorporate the sampling weights to ensure that my samples correctly represent the population distributions.
Any help would be much appreciated.
Yours,
John