# st: RE: RE: RE: RE: RE: unpaired regression

 From "Nick Cox" To Subject st: RE: RE: RE: RE: RE: unpaired regression Date Thu, 13 May 2004 15:44:44 +0100

```There is indeed what strikes me as a very droll bug

You are generating 1000 values of R^2 but
writing them in (almost) random places, given
your re-sorts- of the data. By chance you are
overwriting some of them, and leaving
the complementary fraction unchanged.

This problem is (almost) a standard
probability problem met in various forms.
For example, some poor person is assigned to
put 1000 letters in 1000 envelopes, and in
despair puts them in randomly. You would
expect the fraction of empty envelopes
to be exp(-1) = .368 nearly, leaving
you with a fraction of about .632 envelopes in
which something has been put. In the letters
case, this means (possibly) more than 1 letter;
in our case, the -replace- can overwrite and
the missing R^2 values are lost.

The "(almost)" reflects the fact that
-index- contains some non-missing values.
But the number you get is evidently
close to what the probabilistic analysis
suggests.

There is discussion of a programming
problem in which a similar issue arises
in Stata Journal 3(3), 270-277 (2003).

One fix of the code follows.

Nick
n.j.cox@durham.ac.uk

clear
qui {
set obs 10
gen a1 = uniform()
gen a2 = uniform()
gen r2 = .
gen rn = .
set obs 1000
gen index = _n  in 1/10
gen where = _n
forv i = 1/1000 {
gen a2_`i' = .
replace rn = uniform() in 1/10
sort rn
replace a2_`i' = a2[index]
sort index
reg a1 a2_`i'
replace r2 = e(r2) if where == `i'
drop  a2_`i'
}
}

sum r2

Scott Merryman

> Side Note:
>
> If I combine the above -forv- loop, so it looks like:
>
> clear
>  qui {
> 	set obs 10
>  	gen a1 = uniform()
>  	gen a2 = uniform()
>    	gen r2 = .
>  	gen rn = .
>       set obs 1000
>  	gen index = _n  in 1/10
>   	forv i = 1/1000 {
>   		gen a2_`i' = .
>   		replace rn = uniform() in 1/10
>   		sort rn
>             replace a2_`i' = a2[index]
>   		sort index
>             reg a1 a2_`i'
>   		replace r2 = e(r2) in `i'
>   		drop  a2_`i'
>   	}
>  }
>
>  sum r2
>
> it only produces around 630 R2 values rather than 1000.  Is
> there something
> I am missing?

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```