Statalist The Stata Listserver



st: Re: statalist-digest V4 #2506


From   evan roberts <[email protected]>
To   [email protected]
Subject   st: Re: statalist-digest V4 #2506
Date   Tue, 31 Oct 2006 08:37:35 -0600

David McClintick asked how he could draw ten samples of 5 observations from each of 91 datasets, and end up with 91 new datasets, each combining the ten samples as ten variables with five observations.

There may be more efficient code than the following, but it seems to do the job. The key steps for each sample are to rename the single variable to variable`i' (i=1..10) and to generate an id variable (=_n). The ten samples are then merged by id into one dataset, and the unmerged sample files are erased. At the end of this procedure your working directory holds 182 files: the original 1910-2000 files with one variable each, and 91 new files sample`x', each with 10 sequentially named variables, one per sample.

----
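set more off

forval x = 1910/2000 {
    * Draw ten independent samples of 5 observations from this dataset
    forval i = 1/10 {
        use `x', clear
        sample 5, count
        * x stands for the variable you are sampling; substitute its real name
        rename x x`i'
        gen id = _n
        sort id
        save sample`x'`i', replace
    }

    * Merge samples 2 to 10 onto sample 1 by id, erasing each file once merged
    * (in Stata 11 and later the merge syntax is: merge 1:1 id using filename)
    use sample`x'1, clear
    forval i = 2/10 {
        sort id
        merge id using sample`x'`i'
        drop _merge
        erase sample`x'`i'.dta
    }

    * Save the combined sample and remove the last intermediate file
    drop id
    save sample`x', replace
    erase sample`x'1.dta
}

set more on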
-----
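
As a quick sanity check once the loop has finished, something like the following could be run. This is only a sketch, not part of the procedure above: it assumes that sample1910.dta is in the working directory, that the original 1910 file held a single variable, and that it had at least 5 observations. _N is the number of observations and c(k) is the number of variables in the dataset in memory.

----
* Sketch: confirm one combined file has the expected shape
use sample1910, clear
describe
* expect five observations
assert _N == 5
* expect ten variables, one per sample
assert c(k) == 10
----

If either assert fails, the most likely cause is that the sampled variable was not renamed before merging, in which case the merge does not add a new variable for each sample.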
Evan Roberts
Minnesota Population Center


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of David
McClintick
Sent: Tuesday, October 31, 2006 1:31 AM
To: [email protected]
Subject: RE: st: Sample command question

After looking at that code, I realize that I have not been clear enough.
What I meant by "without replacement" was that I wanted each of the 10
samples of each dataset to be taken from the original dataset, meaning
that while no observation could be repeated within the sample, it could
conceivably be repeated across samples. With your suggestion, it returns
varying numbers of observations, which I believe is caused by the
aforementioned issue. Additionally, it seems to return all of the sample
observations in the first variable?