# Re: st: Randomly picking observations based on a certain condition

From: Andrew Dyck; To: statalist@hsphsun2.harvard.edu; Subject: Re: st: Randomly picking observations based on a certain condition; Date: Wed, 13 Apr 2011 15:16:36 -0700

```
If, after you consider the comments from Nick and J, you still wish to
proceed with your analysis as you initially stated it, I think the
following should work. Here I create some sample data with 50
observations and 5 groups (quintiles) to keep the dataset small. See if
this might work for your data.

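* sample data
clear
set obs 50
egen group = seq(), from(1) to(5)
* note: round(x, 2) rounds to the nearest multiple of 2,
* so adults takes the values 0, 2 or 4
gen adults = round( runiform()*5, 2 )

* random variable for sorting
gen r = runiform()

* create a cumulative sum of adults
* sorting randomly within the group.
* (cum_adults is an illustrative variable name)
bysort group (r): gen cum_adults = sum(adults)
drop r

* keep all obs below the cutoff
* (a cutoff of 10 is an illustrative value for this small dataset)
keep if cum_adults <= 10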
Good luck,
Andrew

On Wed, Apr 13, 2011 at 2:02 PM, Nikhil Srivastava
<nikhil.del85@gmail.com> wrote:
>
> I am not trying to actually sample households. As I wrote in my reply
> to Nick, I am trying to look at the effectiveness of a transfer program
> targeted to adults of a household which has a certain exclusion error.
> The exclusion error that we are assuming is that 1 percent of eligible
> participants within each expenditure quintile do not receive the
> benefits. In my sample within the first quintile 1 percent of the
> total adults comes to around 100. Thus for the first quintile I need
> to randomly assign non-beneficiary status to households so that the
> total number of adults for these households comes to 100. Similarly I
> have to pick randomly 1 percent of adults for each quintile and assign
> them non-beneficiary status. In my previous mail I used the number 100
> as an example. Thanks
>
> Nikhil
>
> On Wed, Apr 13, 2011 at 1:06 PM, Joerg Luedicke
> <joerg.luedicke@gmail.com> wrote:
> > On Wed, Apr 13, 2011 at 3:17 PM, Nikhil Srivastava
> > <nikhil.del85@gmail.com> wrote:
> >> Hi,
> >>
> >> I have a dataset at the household level which contains the expenditure
> >> details of a sample of households. The dataset also records the number
> >> of adults within each household. I have divided this dataset into 5
> >> quintiles based on the level of expenditure. Now I need to randomly
> >> select a set of observations within each quintile so that the sum of
> >> the adults for those observations comes to 100. Could somebody please
> >> help me in writing a code for this part?
> >>
> >> I would really appreciate any help in this regard. Thanks
> >
> > Do I understand correctly that you want to sample households, and that
> > within each quintile of household expenditure, the number of household
> > members among sampled households is supposed to add up to 100? Why
> > would you do that? Why not just take a random sample of households
> > or a stratified sample with respect to household size, if that is a
> > concern? That way, you would at least have a clear picture of the
> > population you are targeting, whereas in the other case, this picture
> > becomes pretty blurry, no?
> >
> > J.
```
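
For the exclusion-error setup Nikhil describes in the quoted message (flag households at random until they cover about 1 percent of the adults in each quintile), the same random-sort-and-running-sum idea could be adapted roughly as below. This is only a sketch on simulated data: the variable names `quintile`, `adults`, `total_adults`, `target`, `cum_adults` and `nonbenef`, the seed, and the simulated household sizes are all assumptions for illustration and do not come from the thread.

```stata
* Sketch only: simulated data, illustrative variable names
clear
set seed 2011
set obs 5000
egen quintile = seq(), from(1) to(5)
gen adults = 1 + floor(4*runiform())      // 1 to 4 adults per household

* total adults per quintile and the 1 percent target
bysort quintile: egen total_adults = total(adults)
gen target = 0.01 * total_adults

* random order within quintile, then a running sum of adults
gen double r = runiform()
bysort quintile (r): gen cum_adults = sum(adults)

* flag households for as long as the running sum stays within the target
gen byte nonbenef = cum_adults <= target
drop r

* compare flagged adults per quintile with the target
tabstat adults if nonbenef, by(quintile) statistics(sum)
tabstat target, by(quintile)
```

Because households are flagged whole, the flagged adults in a quintile will generally add up to slightly less than the target rather than hitting it exactly. Flagging a non-beneficiary indicator instead of using `keep` leaves the full dataset available for the later comparison of beneficiaries and non-beneficiaries.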