
From: "Carlo Lazzaro" <carlo.lazzaro@tiscalinet.it>
To: <statalist@hsphsun2.harvard.edu>
Subject: Re: st: Bootstrapping new observations to add to an existing dataset
Date: Mon, 22 Jun 2009 17:34:46 +0200

Davide wrote: "I just don't want to create new observations by filling up the dataset with -invnorm(uniform())- because I want to preserve the properties of the variables: some are binary..."

Dear Davide,

if the properties of the variables are the only thing holding you back from creating new observations, you can use -invibeta()- for binary variables, drawing random values from a beta distribution with given parameters a and b (in Stata 9.2/SE: g A=invibeta(a,b,uniform())). Each beta draw is a continuous value in (0,1), so it acts as the probability of a 1, from which a 0/1 value can then be drawn (e.g. g A01=uniform()<A).

The same approach can be replicated for continuous or count variables with -invgammap()-, which gives gamma draws with shape a and scale b (in Stata 9.2/SE: g B=b*invgammap(a,uniform())). Gamma draws are continuous and non-negative, so count variables still need to be rounded.

Two interesting textbooks on fitting beta and gamma distributions (and on making use of their inverse distributions) are:

Spiegelhalter DJ, Abrams KR, Myles JP. Bayesian Approaches to Clinical Trials and Health-Care Evaluation. Chichester: Wiley, 2004.
Gelman A, Carlin JB, Stern HS, Rubin DB. Bayesian Data Analysis. 2nd ed. Boca Raton: Chapman & Hall/CRC, 2004.

Kind regards,
Carlo

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Davide Cantoni
Sent: Monday, 22 June 2009 14:59
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Bootstrapping new observations to add to an existing dataset

Thank you, gentlemen. To clarify: yes, in a sense I want more of the same observations. I just don't want to create new observations by filling up the dataset with -invnorm(uniform())- because I want to preserve the properties of the variables: some are binary, some are integers, and some are just any kind of number. The values of these variables are irrelevant for my regressions; their underlying distribution is not. I want binary variables to remain binary, etc. Since there are more than 300 variables in the dataset, I do not want to do this one by one, finding out whether var4 consists only of integers between 1 and 10, whether var231 is binary, etc. So one way to go is just to take more of the same observations.
E.g., obs 201 is the same as obs 133, obs 202 is the same as obs 78, and so on. Or, and that's the other thing I was thinking of, I could create obs 201 by drawing a new value of var1 from its existing distribution (given by obs 1-200), then drawing a new value from the distribution of var2, and so on...

Davide

2009/6/22 Martin Weiss <martin.weiss1@gmx.de>:
>
> <>
>
> Davide said that he wanted to keep "the same (unknown to me) data generating
> process". Every piece of advice so far has assumed that this means that he just
> wants "more of the same observations". If that were the case, he could also use a
> random frequency weight for every observation, which would keep his dataset
> at its current size instead of expanding it.
>
> Davide could clarify whether he merely wants to duplicate observations
> randomly or whether he really wants "new" observations...
>
> HTH
> Martin
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Austin Nichols
> Sent: Monday, 22 June 2009 14:33
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: Bootstrapping new observations to add to an existing
> dataset
>
> Davide Cantoni <davide.cantoni@gmail.com> :
> You don't say how many more obs you want; let's assume you want about
> 100 times as many:
>
>     expand 100
>
> will do it, or
>
>     g u=round(uniform()*200)
>     expand u
>
> for a random-sized sample about 100 times as big with the same DGP.
> You could also
>
>     loc n=_N*100                // target size: 100 times the original
>     g u=round(uniform()*1000)   // random number of copies, about 500 on average
>     expand u                    // u<1 is treated as 1, so every obs survives
>     drop u
>     g u=uniform()               // random sort key
>     sort u
>     drop if _n>`n'              // keep a random subset of the target size
>
> for a sample 100 times as big but with random numbers of replications
> of each obs.
>
> On Sun, Jun 21, 2009 at 11:58 PM, Davide Cantoni
> <davide.cantoni@gmail.com> wrote:
>>
>> Hello, I am stuck while thinking about this issue and I would
>> appreciate your suggestions. I have a dataset which I use for
>> simulation purposes, to test whether my do-files run correctly.
>> The issue is that this dataset is too short for many applications, as it
>> has only 200 observations.
>>
>> What I want to do is expand this dataset to include more observations,
>> while keeping the same (unknown to me) data generating process that
>> created the first 200 observations. So I was thinking of proceeding in a
>> bootstrapping manner, by drawing the values of each of the
>> variables (var1, var2, etc.) for the new observations from the
>> empirical distributions of var1, var2, ... in the first 200
>> observations. Yet I have no idea how to implement this. I'm
>> grateful for any idea. Thanks for your interest,
>>
>> Davide
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
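Davide's second idea (drawing each variable separately from its own empirical distribution in the original 200 observations) is not spelled out in code in the thread. Below is a minimal sketch of it, assuming Stata 10.1 or later for -runiform()- and -rnormal()- (in Stata 9.2, -uniform()- and -invnorm(uniform())- play the same roles). The toy variables var1-var3, the local macros N0 and target, the temporary variable pick, and the seed are all hypothetical stand-ins. Because each variable is drawn independently, every marginal distribution is preserved (a binary variable stays binary, an integer variable keeps its original set of values), but any association between variables is lost in the new rows, so regressions on the expanded data will show weaker relationships than the original data. The -expand- approach quoted above keeps whole rows together and therefore keeps the joint distribution.

```stata
* Sketch: append new observations whose values are drawn, variable by
* variable, from each variable's empirical distribution in the original rows.
* Toy data, macro names and sizes below are hypothetical.
clear all
set seed 12345

* hypothetical stand-in for the real 200-observation dataset
set obs 200
gen byte   var1 = runiform() < 0.3           // binary
gen int    var2 = 1 + floor(10*runiform())   // integers 1..10
gen double var3 = rnormal()                  // continuous

local N0     = _N      // number of original observations
local target = 1000    // hypothetical total size after expansion

unab vars : _all       // names of all variables to resample

set obs `target'       // appended observations start out missing

foreach v of local vars {
    tempvar pick
    * for each new observation, pick a random original row in 1..N0
    gen long `pick' = 1 + floor(`N0'*runiform()) if _n > `N0'
    * copy that row's value of `v'; the original rows are left untouched
    replace `v' = `v'[`pick'] if _n > `N0'
    drop `pick'
}
```

To apply it to real data, replace the toy-data lines with -use- of the actual dataset and set the local target to the desired number of observations; since values are copied rather than generated, string variables are resampled the same way.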

**References**: **Re: st: Bootstrapping new observations to add to an existing dataset**, *From:* Davide Cantoni <davide.cantoni@gmail.com>
