
From: "Carlo Lazzaro" <carlo.lazzaro@tin.it>
To: <statalist@hsphsun2.harvard.edu>
Subject: Fw: R: st: odd results after insample
Date: Wed, 30 Sep 2009 18:39:32 +0200

Dear Statalisters,

thanks to Brian Poi, some days ago I solved a problem in drawing random
samples from a given dataset with Stata 9.2/SE. I would like to share
Brian's kind reply with anyone who might be interested in the same topic.
I also take the chance to thank Martin Weiss once more for his invaluable
support along the way.

Kind Regards,
Carlo

-----Original Message-----
From: Brian P. Poi [mailto:bpoi@stata.com]
Sent: Monday, 28 September 2009 18:31
To: Carlo Lazzaro
Subject: Re: R: st: odd results after insample

> I take the chance to ask you whether Stata 9.2 SE (I don't know about other
> more recent releases) can be programmed to run -sample- repeatedly (and not
> just one time) for drawing, say, 10,000 random samples from a given dataset,

Yes, you could do

    . sysuse auto
    . sample
    . sample
    . sample

or put -sample- in a -forvalues- loop. But you'd have a hard time
convincing me that's the right thing to do.

Or, do you mean something like this:

    set seed 1
    sysuse auto
    gen mean = .
    quietly forvalues i = 1/74 {
        preserve
        sample 50
        summ mpg
        scalar mpgm = r(mean)
        restore
        replace mean = mpgm in `i'
    }
    su mean
    di %20.16f r(mean)

That is perfectly valid, as long as you keep in mind that -sample- samples
without replacement. On the other hand,

    sysuse auto, clear
    set seed 1
    bootstrap mu = r(mean), size(50) reps(74) saving(mybs, replace): summ mpg
    use mybs, clear
    summ mu
    di %20.16f r(mean)

will give you a slightly different answer because -bootstrap- samples with
replacement. Thus, the $64,000 question is whether you want to sample with
or without replacement.

*************************************************************************
   ___  ____  ____  ____  ____
  /__    /   ____/   /   ____/      Brian P. Poi, Ph.D.
 ___/   /   /___/   /   /___/       Senior Economist
                                    StataCorp LP
                                    4905 Lakeway Drive
                                    College Station, TX 77845
                                    bpoi@stata.com
*************************************************************************

On Mon, 28 Sep 2009, Carlo Lazzaro wrote:

> Dear Brian,
> thanks a lot for your kind reply. I was actually banging my head against
> the wall in trying to understand what went wrong with my code lines and
> you shed light on this.
> I take the chance to ask you whether Stata 9.2 SE (I don't know about other
> more recent releases) can be programmed to run -sample- repeatedly (and not
> just one time) for drawing, say, 10,000 random samples from a given dataset,
> no matter the underlying distribution: in fact, this is the need I am
> currently facing.
> I am very fond of -simulate- as far as my programming skills allow me to
> invoke it, but it requires Stata users to know (or to mimic) the underlying
> distribution of the population.
>
> Thanks a lot again for your kindness and for your time.
>
> Kind Regards,
> Carlo
>
> -----Original Message-----
> From: Brian P. Poi [mailto:bpoi@stata.com]
> Sent: Monday, 28 September 2009 16:07
> To: Carlo Lazzaro
> Subject: Re: st: odd results after insample
>
> Carlo,
>
> I don't think anyone on statalist actually answered the question of why
> your code doesn't produce 2000 observations like you expect. It had me
> stumped for a bit, so I just had to try the code myself to figure it out.
>
> Here's why. In the first part of your loop you randomly sort the data and
> summarize the first 20 observations. In the second part of your loop you
> try and store the mean and standard deviation in the `i'th observation,
> assuming that `i' runs from 1 to 2000 so that you will fill in the 1st
> observation, then the 2nd, and so on up to the 2000th. But that won't
> work, because in every iteration of the loop you change the order of your
> data. Therefore, you essentially are sticking the mean and s.d. into a
> random observation of your dataset. Given the luck of the draw, some
> observations of ln_g_20 are being filled in more than once, and others
> never do get filled in like you expect.
>
> Also, note that because you generate A for only 972 observations, your
> mean and s.d. will on average be computed using (972/2000)*20 = 9.72
> observations, not 20 observations.
>
> You could make your loop work with -preserve- and -restore, preserve-
> statements or perhaps with some contorted logic, but it's easier to just
> let -simulate- do it.
>
> On Sat, 26 Sep 2009, Carlo Lazzaro wrote:
>
>> Dear Statalisters,
>> as an alternative to -simulate-, I have written the following do file
>> (for Stata 9.2/SE) to draw 2000 random samples, 20 observations each,
>> from a normal distribution:
>>
>> drop _all
>> set more off
>> set obs 2000
>> obs was 0, now 2000
>> g double ln_g_20=.
>> g double ln_sd_g_20=.
>> set seed 999
>> qui gen A=5.37 + 1.19*invnorm(uniform()) in 1/972
>> qui forvalues i = 1(1)2000 {
>>     qui gen ln_20`i'=A
>>     qui generate random`i' = uniform()
>>     qui sort random`i'
>>     qui generate insample`i' = _n <= 20
>>     qui sum ln_20`i' if insample`i' == 1
>>     replace ln_g_20=r(mean) in `i'
>>     replace ln_sd_g_20=r(sd) in `i'
>>     drop ln_20`i'
>>     drop random`i'
>>     drop insample`i'
>> }
>> drop A
>>
>> However, as a result I have obtained 1271 observations instead of the
>> expected 2000.
>>
>> sum ln_g_20 ln_sd_g_20
>>
>>     Variable |       Obs        Mean    Std. Dev.       Min        Max
>> -------------+--------------------------------------------------------
>>      ln_g_20 |      1271    5.314033    .3800687    3.79247   6.587941
>>   ln_sd_g_20 |      1271    1.101084    .2835007   .0260279   2.161299
>>
>> Besides, results are even more puzzling when I increase the number of
>> samples (again 20 observations each), in that I get a different number of
>> observations for ln_g and ln_sd_g.
>>
>> Comments are gratefully acknowledged.
>>
>> Thanks a lot for your kindness and for your time.
>>
>> Kind Regards,
>> Carlo

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
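For anyone who prefers to keep the explicit loop from the original post, the -preserve-/-restore- route mentioned in the thread might look roughly like the sketch below. It is not code posted by either author. It assumes that A should be generated for all 2000 observations (the original filled only 972), it swaps the random sort for -sample 20, count-, and the scalar names m_i and sd_i are made up for the example.

```stata
* Sketch only: 2000 samples of 20 observations each, drawn without
* replacement, with results stored in a fixed row of the dataset.
drop _all
set more off
set seed 999
set obs 2000

* Assumption: A is filled for every observation, not just 1/972
generate double A = 5.37 + 1.19*invnorm(uniform())
generate double ln_g_20 = .
generate double ln_sd_g_20 = .

quietly forvalues i = 1/2000 {
    preserve
    sample 20, count              // keep exactly 20 random observations
    summarize A
    scalar m_i  = r(mean)         // hypothetical scalar names
    scalar sd_i = r(sd)
    restore                       // original data and sort order are back
    replace ln_g_20    = scalar(m_i)  in `i'
    replace ln_sd_g_20 = scalar(sd_i) in `i'
}

summarize ln_g_20 ln_sd_g_20
```

Because the results are written only after -restore-, row `i' refers to the same observation in every iteration, so each of the 2000 rows should be filled exactly once. Preserving the data 2000 times is slow, which is one more reason -simulate- or -bootstrap- may be the more comfortable choice.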


© Copyright 1996–2014 StataCorp LP