One way to do this is the following.
Suppose you want to sample only 100 ids (you can easily adapt the
method to fit your own sampling criteria):
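* number the rows within each id
sort id
by id: gen n = _n
* draw one random number per id (on its first row only)
gen tmp = uniform() if n == 1
* rank the ids by their random draw
sort n tmp
gen tmpx = .
by n: replace tmpx = _n
* flag the first row of the 100 ids with the smallest draws
gen sampleid = 1 if n == 1 & tmpx <= 100
* copy the flag to every row of a sampled id (missing sorts last)
gen xid = .
sort id sampleid
by id: replace xid = sampleid[1]
keep if xid == 1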
This should work.
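The same idea can probably be written more compactly. What follows is only a sketch, not tested on your data: it rebuilds the three-id example from your message, and the names u and pick, the seed, and the sample size of 2 are placeholders I made up for illustration. It also assumes no two ids draw exactly the same random number, since tied ids would share a rank. In newer Stata releases uniform() is called runiform().

```stata
* toy data: the three ids from the example in the question
clear
input str3 id x y var1 var2
"001" 3 4 5 3
"001" 3 5 7 6
"001" 4 5 2 3
"002" 2 4 1 5
"003" 1 2 9 11
"003" 1 3 6 2
"003" 1 4 9 5
"003" 2 3 10 2
"003" 2 4 7 4
"003" 3 4 6 12
end

* arbitrary seed, only so the draw can be reproduced
set seed 12345

* one random draw per id, stored on the first row of each id
sort id
by id: gen double u = uniform() if _n == 1

* copy that draw to every row of the same id
by id: replace u = u[1]

* rank the ids by their draw (1 = smallest) and keep the first 2
egen pick = group(u)
keep if pick <= 2
```

Because every row of an id carries the same draw, the final keep retains or drops all of that id's rows together. Change the 2 to 100 for the real data.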
jt
----- Original Message -----
From: "Chih-Mao Hsieh" <[email protected]>
To: <[email protected]>
Sent: Tuesday, November 25, 2003 8:37 PM
Subject: st: -Sample-, but each observation has irregular # of rows
> Hello all,
>
>
> I have a dataset from which I have to sample. For each observation
> ("ID" in table below), there are multiple rows. And the number of rows
> is irregular per ID. My problem: I want to be able to sample ID's from
> the dataset. Say the dataset has 3 ID's like the following:
>
> ID x y var1 var2
> 001 3 4 5 3
> 001 3 5 7 6
> 001 4 5 2 3
> 002 2 4 1 5
> 003 1 2 9 11
> 003 1 3 6 2
> 003 1 4 9 5
> 003 2 3 10 2
> 003 2 4 7 4
> 003 3 4 6 12
>
> If I sample 1 ID and get #001, I want to get all of #001's 3 rows. Of
> course, I am aware of the working man's solution: collapse by(id),
> sample, merge back to original, keep _merge==3. But this seems
> inefficient. I've searched the Statalist and "search sample" in STATA
> 8, but to no avail. Is there not an easier way to do what I need?
>
>
> Thanks in advance,
> Chihmao
>
