# Re: st: AW: Simulating stepwise regression

From: John Antonakis
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: AW: Simulating stepwise regression
Date: Sat, 08 Aug 2009 11:28:22 +0200

```
Thanks; this is really excellent, and very kind of you to provide me with such a detailed program. It works great.

Thanks to you, and to Martin too. I have a nice combo of programs to play with.

John.

____________________________________________________

Prof. John Antonakis
Associate Dean Faculty of Business and Economics
University of Lausanne
Internef #618
CH-1015 Lausanne-Dorigny
Switzerland

Tel ++41 (0)21 692-3438
Fax ++41 (0)21 692-3305

Faculty page:
http://www.hec.unil.ch/people/jantonakis&cl=en

Personal page:
http://www.hec.unil.ch/jantonakis
____________________________________________________

On 07.08.2009 19:31, Tirthankar Chakravarty wrote:
```
```
Here is a way; maybe someone can suggest something less cumbersome.
Note that since sort order is not important, you can merge by an
identifier based on row number.

An "id" variable is created in each simulation file. Once the
simulations are done, the files are merged. Then a -reshape- and a
-collapse- get you where you want. I am assuming you want the mean
R^2 for each set of simulations; this can be changed.
****************************************
capture program drop sim
version 10
program define sim, rclass
    // one replication: pure-noise y and regressors, then backward elimination
    drop _all
    syntax , nreg(integer) nobs(integer)
    set obs `nobs'
    forv i=1/`nreg' {
        g x`i' = invnormal(uniform())
    }
    gen y = invnormal(uniform())
    stepwise, pr(.2): regress y x*
    return scalar r2d2 = e(r2)
end

// run each (nobs, nreg) combination and prepare its results file for merging
foreach nobs of numlist 1000 1500 2000 {
    forv nreg = 1(1)10 {
        simulate r2d2=r(r2d2), reps(10000) ///
            saving(sw_r2_`nobs'_`nreg'.dta, every(1) ///
            replace) seed(123): sim, nreg(`nreg') ///
            nobs(`nobs')
        use sw_r2_`nobs'_`nreg'.dta, clear
        g id=_n
        rename r2d2 r2_`nobs'_`nreg'
        sort id
        save sw_r2_`nobs'_`nreg'.dta, replace
    }
}

/* presenting the results */
// merge the files; the last simulation is the
// file in memory.
foreach nobs of numlist 1000 1500 2000 {
    forv nreg = 1(1)10 {
        if "`c(filename)'" != "sw_r2_`nobs'_`nreg'.dta" {
            // this checks if the last filename has been reached
            merge id using sw_r2_`nobs'_`nreg', ///
                _merge(identifier_`nobs'_`nreg') unique
            sort id
        }
        else {
            di in g "All done merging."
        }
    }
}
drop identifier*
// reshape the data to be cross-classified by no. of
// regressors and no. of observations
reshape long r2_1000_ r2_2000_ r2_1500_, i(id) j(numreg)
// get the means of the R^2 for each simulation
collapse (mean) r2*, by(numreg)
list, noobs
save simulations_merged_collapsed, replace
****************************************

T

On Fri, Aug 7, 2009 at 5:44 PM, John Antonakis <john.antonakis@unil.ch> wrote:
```
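One possibly less cumbersome route to a single results file is to stack the simulations in long form rather than merging them wide. The sketch below assumes the rclass program -sim- from the message above is already defined and that everything runs in one do-file (the tempfile is local to it). The names `n`, `k`, `stacked`, and `simulations_stacked`, the reduced reps(100), and seeding once with -set seed- instead of seed(123) on every call are illustrative choices, not part of the original posts.

```stata
version 10
tempfile stacked
local first 1
set seed 123                          // seed once, so each cell gets fresh draws
foreach nobs of numlist 1000 1500 2000 {
    forvalues nreg = 1/10 {
        quietly simulate r2d2=r(r2d2), reps(100): ///
            sim, nreg(`nreg') nobs(`nobs')
        generate int n = `nobs'       // sample size for this cell
        generate byte k = `nreg'      // number of candidate regressors
        if !`first' {
            append using `stacked'    // add the cells accumulated so far
        }
        quietly save `stacked', replace
        local first 0
    }
}
* rows: number of predictors; columns: sample size; cells: mean R^2
tabulate k n, summarize(r2d2) means
save simulations_stacked, replace
```

Because every replication keeps its own `n` and `k`, the same file also serves -collapse (mean) r2d2, by(k n)- or per-cell -kdensity- plots without any -reshape-.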
```
Thanks Tirthankar!

I see that separate files are stored for each simulation. How could one
combine those results in one file?

Also, how would one generate a table (sample size on the horizontal and
number of predictors on the vertical) with the simulated r-squares?
Best,
J.

On 07.08.2009 13:10, Tirthankar Chakravarty wrote:
```
```
You should probably use -simulate-. Here is what it might look like:

***********************************
capture program drop sim
version 10
program define sim, rclass
    drop _all
    syntax , nreg(integer ) nobs(integer )
    set obs `nobs'
    forv i=1/`nreg' {
        g x`i' = invnormal(uniform())
    }
    gen y = invnorm(uniform())
    stepwise, pr(.2): regress y x*
    qui indeplist
    return scalar r2d2 = e(r2)
end

/*
simulate for each of the regressor and
sample size combinations required.
10,000 replications.
*/
foreach nobs of numlist 1000 1500 2000 {
    forv nreg = 1(1)10 {
        simulate r2d2=r(r2d2), reps(10000) ///
            saving(sw_r2_`nobs'_`nreg'.dta, every(1) ///
            replace) seed(123): sim, nreg(`nreg') ///
            nobs(`nobs')
    }
}
use sw_r2_1000_5, clear
kdensity r2d2
***********************************************

On Fri, Aug 7, 2009 at 11:18 AM, John Antonakis <john.antonakis@unil.ch> wrote:

```
```
That's very helpful; thanks Martin.

To extend the below, how would I simulate the r-square? That is, I want to
run the simulation say 100 times, and then obtain the mean r-square from
each simulation. Thus, I can show, at a specific sample size (n=100) and
number of independent variables (k=5), what the r-square would be just by
chance alone.

As an extension, is there a way to vary the sample size (n from 50 to 1000,
in increments of 50) and the number of independent variables (k=1 to k=100
in increments of 1) in the simulation?

Best,
J.

On 07.08.2009 12:06, Martin Weiss wrote:

```
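A known benchmark may help when reading the simulated values of "r-square by chance alone". Assuming a model with an intercept, a normal y that is independent of the k regressors, and no selection step, the full-model R^2 follows a Beta distribution with mean k/(n-1). Since -stepwise, pr(.2)- performs backward elimination, and dropping regressors cannot raise R^2, the simulated stepwise means should not exceed this value.

```latex
R^2 \sim \mathrm{Beta}\!\left(\frac{k}{2},\; \frac{n-k-1}{2}\right),
\qquad
E\!\left[R^2\right] = \frac{k/2}{k/2 + (n-k-1)/2} = \frac{k}{n-1}.

% n = 100,  k = 5:   E[R^2] = 5/99   \approx 0.0505
% n = 1000, k = 10:  E[R^2] = 10/999 \approx 0.0100
```

The bound shrinks as n grows for fixed k, which is why the chance R^2 is most visible at small sample sizes with many candidate predictors.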
```<>
You could also -tokenize- the return from -indeplist- and have your
-program- return the regressors one by one...

*************
capt prog drop sim

version 10.1

program define sim, rclass
    drop _all
    set obs 100
    gen y = invnorm(uniform())
    gen x1 = invnorm(uniform())
    gen x2 = invnorm(uniform())
    gen x3 = invnorm(uniform())
    gen x4 = invnorm(uniform())
    gen x5 = invnorm(uniform())
    stepwise, pr(.2): regress y x1-x5
    qui indeplist
    tokenize "`r(X)'"
    ret loc one="`1'"
    ret loc two="`2'"
    ret loc three="`3'"
    ret loc four="`4'"
    ret loc five="`5'"
end

sim

ret li
*************

HTH
Martin

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of John
Antonakis
Sent: Friday, 7 August 2009 11:47
To: statalist@hsphsun2.harvard.edu
Subject: st: Simulating stepwise regression

Hi:

I would like to simulate the below. Note, I am no fan of stepwise--I just
want to demonstrate its evils.

However, I do not know

1. what to put in the place of "??"--that is, I want the program to
capture only the variables that were selected in the model as being
significant

2. how to simulate the r-square.

3. how to extend the simulation (a new program) such that I simulate from
n = 50 to n = 1000 (in increments of 50), crossed with independent variables
ranging from x1 to x100.

Regards,
John.

Here is the program:

set seed 123456

capture program drop sim
version 10.1
program define sim, eclass
drop _all

set obs 100

gen y = invnorm(uniform())
gen x1 = invnorm(uniform())
gen x2 = invnorm(uniform())
gen x3 = invnorm(uniform())
gen x4 = invnorm(uniform())
gen x5 = invnorm(uniform())

stepwise, pr(.2): regress y x1-x5
end

simulate ??? , reps(20) seed (123) : sim,

foreach v in ?? {
gen t_`v' = /*
*/_b_`v'/_se_`v'
gen p_`v' =/*
*/ 2*(1-normal(abs(t_`v')))
}

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

```
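For the first two questions in the original post, one option that avoids tracking which coefficients survive is to return summary scalars from each replication. The sketch below is only an illustration under assumptions: the program name `swcount` and the returned names `r2` and `nsel` are hypothetical, and it relies on e(df_m) after -regress- equalling the number of regressors that -stepwise- retained.

```stata
capture program drop swcount
program define swcount, rclass
    version 10
    syntax , nreg(integer) nobs(integer)
    drop _all
    set obs `nobs'
    generate y = invnormal(uniform())
    forvalues i = 1/`nreg' {
        generate x`i' = invnormal(uniform())   // pure-noise regressor
    }
    quietly stepwise, pr(.2): regress y x*     // backward elimination
    return scalar r2 = e(r2)                   // R^2 of the selected model
    return scalar nsel = e(df_m)               // regressors left in the model
end

simulate r2=r(r2) nsel=r(nsel), reps(100) seed(123): ///
    swcount, nreg(5) nobs(100)
summarize r2 nsel
```

Since y is unrelated to every regressor by construction, any positive `nsel` counts noise variables that were kept, and the mean of `r2` is the "chance" R^2 for that combination of n and k.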