
RE: st: Bootstrapping new observations to add to an existing dataset

From	"Nick Cox" <[email protected]>
To	<[email protected]>
Subject	RE: st: Bootstrapping new observations to add to an existing dataset
Date	Mon, 22 Jun 2009 13:54:04 +0100

I don't think that bootstrapping in Stata purports to guarantee the same
data generating process as produced the data unless that process was one
of mutual independence or -bootstrap- options mimic what happened. 

To put it more plainly: -bootstrap- does not know anything you don't
tell it. 

Nick 
[email protected] 
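
A minimal sketch of what "telling it" can look like, using the
auto dataset shipped with Stata rather than Davide's data (an
assumption, since his variables are not shown). Here the resampling is
told that the data come in two strata, so each stratum is resampled
separately and keeps its original size. The -bootstrap- prefix accepts
strata() and cluster() options in the same spirit.

```stata
sysuse auto, clear
set seed 12345
* resample with replacement separately within each level of foreign,
* so that each stratum keeps its original number of observations
bsample, strata(foreign)
tabulate foreign
```

Without strata(), the draws would ignore the grouping and the stratum
sizes would vary from one resample to the next.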

Martin Weiss

Davide said that he wanted to keep "the same (unknown to me) data
generating process". All the advice so far has assumed that this means
that he just wants "more of the same observations". If that were the
case, he could also use a random frequency weight for every
observation, which would reduce the size of his dataset.

Davide could clarify whether he merely wants to duplicate observations
randomly or whether he really wants "new" observations...
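
One way to sketch the random frequency weight idea is the weight()
option of -bsample-, shown here on the auto dataset as a stand-in for
Davide's data (an assumption). The variable name fw is made up for the
illustration. The data stay as they are, and the weight variable records
how often each observation was drawn.

```stata
sysuse auto, clear
set seed 12345
* the weight variable must exist before bsample fills it in
generate fw = .
* the dataset is left unchanged; fw receives the sampling frequencies
bsample, weight(fw)
* use the frequencies in any command that accepts fweights
summarize price [fweight = fw]
```

Observations that were never drawn get a frequency of zero and so drop
out of the weighted command.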

Austin Nichols

Davide Cantoni <[email protected]> :
You don't say how many more obs you want--let's assume you want about
100 times as many:

expand 100

will do it, or

g u=round(uniform()*200)
expand u

for a random-sized sample about 100 times as big with the same DGP.
You could also

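* target size: exactly 100 times the original number of observations
loc n=_N*100
* replicate each obs a random number of times (about 500 on average)
g u=round(uniform()*1000)
expand u
drop u
* shuffle the rows, then keep only the first n of them
g u=uniform()
sort u
drop if _n>`n'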

for a sample 100 times as big but with random numbers of replications
of each obs.
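
The code above copies whole observations. What Davide literally
describes is different: drawing each variable separately from its own
empirical distribution. A minimal sketch of that, assuming the auto
dataset and three of its variables as stand-ins for var1, var2, ...
(the names N0 and src are made up for the illustration):

```stata
sysuse auto, clear
set seed 12345
local N0 = _N
* add 9 copies of every row; the originals stay in rows 1 to N0
expand 10
foreach v of varlist price mpg weight {
    * an independent random source row in 1..N0 for each variable
    generate long src = 1 + floor(runiform()*`N0')
    * overwrite only the added rows, reading from the original rows
    replace `v' = `v'[src] if _n > `N0'
    drop src
}
```

This keeps each variable's marginal distribution but breaks any
association between the resampled variables in the new rows, so it
reproduces the original data generating process only if those variables
are mutually independent. Variables not listed in the loop remain plain
copies. runiform() is the newer name for uniform().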

On Sun, Jun 21, 2009 at 11:58 PM, Davide Cantoni
<[email protected]> wrote:
>
> Hello, I am stuck while thinking about this issue and I would
> appreciate your suggestions. I have a dataset which I use for
> simulation purposes, to test whether my do-files run correctly. The
> issue is that this dataset is too short for many applications, as it
> has only 200 observations.
>
> What I want to do is expand this dataset to include more observations,
> but keeping the same (unknown to me) data generating process that
> created the first 200 observations. So I was thinking to proceed in a
> bootstrapping manner, by drawing the values for each one of the
> variables (var1, var2 etc etc) for the new observations from the
> empirical distributions of var1, var2,... in the first 200
> observations. Yet, I have no idea on how to implement this. I'm
> grateful for any idea. Thanks for your interest,

