[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

From |
"Nick Cox" <n.j.cox@durham.ac.uk> |

To |
<statalist@hsphsun2.harvard.edu> |

Subject |
st: RE: RE: RE: RE: RE: unpaired regression |

Date |
Thu, 13 May 2004 15:44:44 +0100 |

There is indeed what strikes me as a very droll bug in your code. You are generating 1000 values of R^2 but writing them in (almost) random places, given your re-sorts- of the data. By chance you are overwriting some of them, and leaving the complementary fraction unchanged. This problem is (almost) a standard probability problem met in various forms. For example, some poor person is assigned to put 1000 letters in 1000 envelopes, and in despair puts them in randomly. You would expect the fraction of empty envelopes to be exp(-1) = .368 nearly, leaving you with a fraction of about .632 envelopes in which something has been put. In the letters case, this means (possibly) more than 1 letter; in our case, the -replace- can overwrite and the missing R^2 values are lost. The "(almost)" reflects the fact that -index- contains some non-missing values. But the number you get is evidently close to what the probabilistic analysis suggests. There is discussion of a programming problem in which a similar issue arises in Stata Journal 3(3), 270-277 (2003). One fix of the code follows. Nick n.j.cox@durham.ac.uk clear qui { set obs 10 gen a1 = uniform() gen a2 = uniform() gen r2 = . gen rn = . set obs 1000 gen index = _n in 1/10 gen where = _n forv i = 1/1000 { gen a2_`i' = . replace rn = uniform() in 1/10 sort rn replace a2_`i' = a2[index] sort index reg a1 a2_`i' replace r2 = e(r2) if where == `i' drop a2_`i' } } sum r2 Scott Merryman > Side Note: > > If I combine the above -forv- loop, so it looks like: > > clear > qui { > set obs 10 > gen a1 = uniform() > gen a2 = uniform() > gen r2 = . > gen rn = . > set obs 1000 > gen index = _n in 1/10 > forv i = 1/1000 { > gen a2_`i' = . > replace rn = uniform() in 1/10 > sort rn > replace a2_`i' = a2[index] > sort index > reg a1 a2_`i' > replace r2 = e(r2) in `i' > drop a2_`i' > } > } > > sum r2 > > it only produces around 630 R2 values rather than 1000. Is > there something > I am missing? * * For searches and help try: * http://www.stata.com/support/faqs/res/findit.html * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**Follow-Ups**:**st: RE: RE: RE: RE: RE: RE: unpaired regression***From:*"Scott Merryman" <smerryman@kc.rr.com>

- Prev by Date:
**st: Declaring time series** - Next by Date:
**st: RE: Declaring time series** - Previous by thread:
**st: Declaring time series** - Next by thread:
**st: RE: RE: RE: RE: RE: RE: unpaired regression** - Index(es):

© Copyright 1996–2017 StataCorp LLC | Terms of use | Privacy | Contact us | What's new | Site index |