Statalist The Stata Listserver



Re: st: RE: sampling problem


From   "Ben Jann" <[email protected]>
To   [email protected]
Subject   Re: st: RE: sampling problem
Date   Wed, 13 Jun 2007 14:08:44 +0200

Do you really need sampling for this? My suggestion would be to work
with weights. Maybe have a look at:

DiNardo, John E., Nicole Fortin, and Thomas Lemieux (1996). Labor
Market Institutions and the Distribution of Wages, 1973-1992: A
Semiparametric Approach. Econometrica 64(5): 1001-1044.

ben
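
Ben's suggestion can be sketched, under stated assumptions, as DiNardo-Fortin-Lemieux style reweighting for a single discrete covariate. Instead of resampling, each 2007 observation keeps its own weight multiplied by P1985(h)/P2007(h), the ratio of the weighted shares of its health category in the two years. The sketch below is in Python rather than Stata, purely to show the arithmetic. The helper names weighted_shares and dfl_weights are hypothetical. It uses the ten rows from the question, takes the 1985 shares from wgt1985, and assumes 2007 weights of 1.0 because the question does not list them.

```python
from collections import defaultdict

def weighted_shares(values, weights):
    """Share of the total weight falling in each category of `values`."""
    totals = defaultdict(float)
    for v, w in zip(values, weights):
        totals[v] += w
    grand = sum(totals.values())
    return {v: t / grand for v, t in totals.items()}

def dfl_weights(h_now, w_now, h_base, w_base):
    """Counterfactual weights w_now * P_base(h) / P_now(h) for a discrete covariate h.

    Categories absent from the base year get weight 0 (health == 2 here).
    Assumes every current-year weight is positive, so P_now(h) > 0.
    """
    p_base = weighted_shares(h_base, w_base)
    p_now = weighted_shares(h_now, w_now)
    return [w * p_base.get(h, 0.0) / p_now[h] for h, w in zip(h_now, w_now)]

# The ten rows from the question.
income2007 = [10, 10, 20, 20, 40, 40, 60, 60, 100, 100]
health2007 = [1, 1, 1, 1, 2, 2, 4, 4, 5, 5]
health1985 = [1, 1, 1, 1, 1, 4, 5, 5, 5, 5]
wgt1985 = [65.38, 153.91, 458.34, 484.2, 906.1,
           943.96, 1176.87, 1389.91, 1716.93, 4067.68]
# HYPOTHETICAL: the question does not list the 2007 weights, so use 1.0 for all.
wgt2007 = [1.0] * len(income2007)

cf = dfl_weights(health2007, wgt2007, health1985, wgt1985)
cf_mean = sum(w * y for w, y in zip(cf, income2007)) / sum(cf)
print(round(cf_mean, 2))  # prints 81.21
```

The 1985 shares here are computed from wgt1985 in the posted rows, so they differ from the probabilities quoted in the question. With real data only wgt2007 changes. A health category with a positive 1985 share but no 2007 observations cannot be reweighted at all, which is a common-support problem rather than a coding one.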

On 6/13/07, join allfish <[email protected]> wrote:
Dear Nick,
Thanks for this suggestion; I did think of doing this. The problem is that I
have other variables, which are far more complicated and have many more
values, that I also want to use for the counterfactuals. I was hoping
there might be a program that could help, or at least some shortcut I
could use.
Thanks,
John
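
Nick's per-category bsample calls (quoted below) can in principle be generalized to a variable with many values by looping over its categories. What follows is a hypothetical sketch in Python rather than Stata, with made-up helper names subsample_sizes and stratified_bootstrap. It turns target probabilities into integer subsample sizes that sum exactly to n (largest-remainder rounding, one assumed way to handle fractional sizes) and draws each subsample with replacement using random.Random.choices. The optional weight argument makes draws within a category proportional to a sampling weight. That is one possible way to bring in the 2007 weights, not an established recommendation.

```python
import random
from collections import Counter

def subsample_sizes(probs, n):
    """Split n draws across categories in proportion to probs.

    probs is normalized to sum to 1; largest-remainder rounding keeps the total exactly n.
    """
    total = sum(probs.values())
    raw = {k: p / total * n for k, p in probs.items()}
    sizes = {k: int(r) for k, r in raw.items()}
    shortfall = n - sum(sizes.values())
    # Give the leftover draws to the categories with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda k: raw[k] - sizes[k], reverse=True)
    for k in by_remainder[:shortfall]:
        sizes[k] += 1
    return sizes

def stratified_bootstrap(rows, key, probs, n, rng, weight=None):
    """Draw subsample_sizes(probs, n)[k] rows, with replacement, from rows with key(row) == k.

    If weight is given, draws within a category are proportional to weight(row)
    (e.g. a 2007 sampling weight); otherwise each row in the category is equally likely.
    """
    strata = {}
    for row in rows:
        strata.setdefault(key(row), []).append(row)
    sample = []
    for k, size in subsample_sizes(probs, n).items():
        if size == 0:
            continue  # e.g. health == 2, which has target probability 0
        if k not in strata:
            raise ValueError(f"no observations with category {k!r} to draw from")
        pool = strata[k]
        w = [weight(r) for r in pool] if weight is not None else None
        sample.extend(rng.choices(pool, weights=w, k=size))
    return sample

# 2007 rows from the question; target probabilities as stated in the question.
income2007 = [10, 10, 20, 20, 40, 40, 60, 60, 100, 100]
health2007 = [1, 1, 1, 1, 2, 2, 4, 4, 5, 5]
rows = [{"id": i + 1, "income2007": y, "health2007": h}
        for i, (y, h) in enumerate(zip(income2007, health2007))]

sample = stratified_bootstrap(rows, lambda r: r["health2007"],
                              {1: 0.4, 2: 0.0, 4: 0.1, 5: 0.5}, 1000,
                              random.Random(2007))
print(sorted(Counter(r["health2007"] for r in sample).items()))
# prints [(1, 400), (4, 100), (5, 500)]
```

Categories with probability 0 are skipped, as in Nick's reply. A category with a positive target probability but no 2007 observations raises an error, since there is nothing to draw from.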



>From: "Nick Cox" <[email protected]>
>Reply-To: [email protected]
>To: <[email protected]>
>Subject: st: RE: sampling problem
>Date: Wed, 13 Jun 2007 11:50:03 +0100
>
>Focusing on this (typos corrected)
>
>I want to draw individuals from 2007 according to the distribution
>of health in 1985 so I draw individuals
>with health=1 with prob=0.4,
>health=2 with prob=0,
>health=4 with prob=0.1
>and health=5 with prob=0.5
>(where the probabilities come from the health1985 distribution).
>
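>You can work out the subsample sizes from your desired total
>sample size. Suppose you want a sample of 1000: that means
>0.4 x 1000 = 400, 0.1 x 1000 = 100 and 0.5 x 1000 = 500 draws
>with health = 1, 4 and 5 respectively.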
>
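>* -bsample n if ...- draws n observations with replacement
>* from those meeting the condition
>use mydata
>bsample 400 if health2007 == 1
>save cfsample, replace
>
>use mydata, clear
>bsample 100 if health2007 == 4
>append using cfsample
>save cfsample, replace
>
>use mydata, clear
>bsample 500 if health2007 == 5
>append using cfsample
>save cfsample, replace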
>
>I would be happy to learn of a smarter solution. Naturally
>you need do nothing about categories that are not to be included
>in your sample (here health = 2). I can't comment on the statistical
>status of samples like this. Bootstrap experts may be able to help further.
>
>Nick
>[email protected]
>
>join allfish (a.k.a. John)
>
> > I want to sample data on the basis of counterfactuals, i.e. what
> > would the distribution of income in 2007 look like if individuals
> > had the distribution of health of 1985?
> >
> > So imagine I have the following data
> >
> > id  income2007  health2007  health1985  wgt1985
> >  1          10           1           1    65.38
> >  2          10           1           1   153.91
> >  3          20           1           1   458.34
> >  4          20           1           1   484.20
> >  5          40           2           1   906.10
> >  6          40           2           4   943.96
> >  7          60           4           5  1176.87
> >  8          60           4           5  1389.91
> >  9         100           5           5  1716.93
> > 10         100           5           5  4067.68
> >
> >
> > where wgt1985 is the sampling weight for the 1985 data (I also
> > have sampling weights for the 2007 data). The order of the 1985
> > data makes no difference to the 2007 data; it is just pasted in
> > to obtain the health distribution.
> > What I want to do is sample from the 2007 data to make the
> > distribution of health in 2007 look like that in 1985. So I want
> > to draw individuals from 2007 according to the distribution of
> > health in 1985, i.e. draw individuals with health=1 with
> > prob=0.4, health=2 with prob=0, health=4 with prob=0.1 and
> > health=5 with prob=0.5 (where the probabilities come from the
> > health1985 distribution). This should give me a hypothetical
> > distribution of income in 2007 if the distribution of health
> > were as in 1985.
> > I cannot see how to do this with the bsample command. Further, I
> > am not sure how to then incorporate the sampling weights to
> > ensure that my samples correctly represent the population
> > distributions.
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2024 StataCorp LLC