** run the commands exhibited in:
**    resample: A command for randomly resampling a data set
** but using StataQuest, rather than Stata.
**
** NOTE: In Example 2, StataQuest creates a bootstrap distribution that
** differs just slightly from that created in Stata Version 5.0 (even
** with the command -version 4.0- in effect). Apparently, the command
** -sqswreg- in StataQuest is not quite identical to the command -sw reg-
** in Stata 5.0. The results shown in resample.tex are from Stata 5.0,
** not from StataQuest.
**
** John R. Gleason
** Syracuse University
** 430 Huntington Hall
** Syracuse, NY 13244-2340
** 73241.717@compuserve.com
** 10Mar97

* Example 1:

* load the data set law_sch.dta:
more
use law_sch, replace
describe

* correlate lsat and gpa:
more
corr lsat gpa

* generate a random sample with replacement:
more
resample lsat gpa
more
describe

* correlate lsat_ and gpa_:
more
corr lsat_ gpa_

* now, a complete bootstrap simulation. First, some preliminaries:
more
set seed 970211
set obs 600
gen r_boot = .

* then, the slow part:
more
loop, c(resample lsat gpa in 1/15; corr lsat_ gpa_ in 1/15; replace r_boot = _result[4] in I_) i(600)

* here's the bootstrap SE of r:
more
summ r_boot
more

* Example 2:

* load the data set mammal42.dta:
more
use mammal42, replace
describe

* consider a stepwise regression of y = pdox_sl on 8 carriers (X's):
more
sqswreg pdox_sl lbod lbr lgest swav life is_p expo ind, forward pr(.4) pe(.2)

* set up for a bootstrap test; first, the commands to be repeated:
more
global LOOP_CMD "resample pdox_sl in 1/42; sqswreg pdox_sl_ lbod lbr lgest swav life is_p expo ind in 1/42, forward pr(.4) pe(.2); replace r2 = _result(8) in I_; reg pdox_sl_ lbod lbr lgest swav life is_p expo ind in 1/42; replace R2 = _result(8) in I_"

* and then some more preliminaries:
more
gen r2 = .
gen R2 = .
set seed 970222
set obs 100

* Finally, the actual simulation:
more
loop, i(100)

* Here's the adjusted R^2 following stepwise selection:
more
summ r2

* Stepwise selection eliminates all X's about 1/3 of the time:
more
count if r2 == 0

* In the other 2/3 of the cases, it's generally larger than the adjusted R^2
* based on all 8 X's:
more
reg r2 R2 if r2
more

* Here is Figure 1:
set textsize 150

* left panel of Figure 1:
more
gr R2, bin(50) xlab ylab freq bor gap(4) xline(.64) /*
*/ b2("Adjusted R-squared without variable selection") /*
*/ /* saving(fig1a, replace) */

* right panel of Figure 1:
more
gr r2, bin(50) xlab(-.2,0,.2,.4,.6) ylab freq bor gap(4) xline(.64) /*
*/ b2("Adjusted R-squared after stepwise selection") /*
*/ /* saving(fig1b, replace) */

* and, the 2 panels combined:
more
gr using fig1a fig1b, margin(7) /* saving(fig1, replace) */