Stata The Stata listserver
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: RE: RE: RE: RE: RE: unpaired regression


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: RE: RE: RE: RE: unpaired regression
Date   Thu, 13 May 2004 15:44:44 +0100

There is indeed what strikes me as a very droll bug
in your code. 

You are generating 1000 values of R^2 but 
writing them in (almost) random places, given 
your re-sorts- of the data. By chance you are 
overwriting some of them, and leaving
the complementary fraction unchanged. 

This problem is (almost) a standard 
probability problem met in various forms. 
For example, some poor person is assigned to 
put 1000 letters in 1000 envelopes, and in 
despair puts them in randomly. You would 
expect the fraction of empty envelopes 
to be exp(-1) = .368 nearly, leaving 
you with a fraction of about .632 envelopes in 
which something has been put. In the letters
case, this means (possibly) more than 1 letter; 
in our case, the -replace- can overwrite and
the missing R^2 values are lost. 

The "(almost)" reflects the fact that 
-index- contains some non-missing values. 
But the number you get is evidently 
close to what the probabilistic analysis 
suggests. 

There is discussion of a programming 
problem in which a similar issue arises
in Stata Journal 3(3), 270-277 (2003). 

One fix of the code follows. 

Nick 
n.j.cox@durham.ac.uk 

clear
qui {
 	set obs 10
  	gen a1 = uniform()
  	gen a2 = uniform()
    	gen r2 = .
  	gen rn = .
       set obs 1000
  	gen index = _n  in 1/10
	gen where = _n 
   	forv i = 1/1000 {
   		gen a2_`i' = . 
   		replace rn = uniform() in 1/10
   		sort rn
             replace a2_`i' = a2[index] 
   		sort index 
             reg a1 a2_`i' 
   		replace r2 = e(r2) if where == `i' 
   		drop  a2_`i'
   	}
  }
 
sum r2

Scott Merryman

 
> Side Note:
> 
> If I combine the above -forv- loop, so it looks like:
> 
> clear
>  qui {
> 	set obs 10
>  	gen a1 = uniform()
>  	gen a2 = uniform()
>    	gen r2 = .
>  	gen rn = .
>       set obs 1000
>  	gen index = _n  in 1/10
>   	forv i = 1/1000 {
>   		gen a2_`i' = . 
>   		replace rn = uniform() in 1/10
>   		sort rn
>             replace a2_`i' = a2[index] 
>   		sort index 
>             reg a1 a2_`i' 
>   		replace r2 = e(r2) in `i'
>   		drop  a2_`i'
>   	}
>  }
> 
>  sum r2
>   
> it only produces around 630 R2 values rather than 1000.  Is 
> there something
> I am missing?

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index