Statalist



Re: st: AW: Simulating stepwise regression


From   Tirthankar Chakravarty <tirthankar.chakravarty@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: AW: Simulating stepwise regression
Date   Fri, 7 Aug 2009 18:31:59 +0100

Here is one way; perhaps someone can suggest something less
cumbersome. Since sort order is not important, you can merge the
files on an identifier based on the row number.

An "id" variable is created in each simulation file. Once the
simulations are done, the files are merged, and a -reshape- followed
by a -collapse- gets you the table you want. I am assuming you want
the mean R^2 for each set of simulations; this can be changed.
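To make the merge/reshape/collapse steps concrete, here is a small sketch on made-up numbers (not simulation output). It assumes hypothetical toy files named toy_<nobs>_<nreg>.dta, written to the working directory in place of the real sw_r2_ files, with only two sample sizes and two regressor counts. It uses the same Stata 10 -merge varlist using- syntax as the code below; Stata 11 and later still accept it but prefer -merge 1:1 id using-.

```stata
* Toy stand-ins for the -simulate- files (made-up values, not results):
* 2 replications each, for nobs = 1000, 1500 and nreg = 1, 2.
* Writes hypothetical files toy_<nobs>_<nreg>.dta to the working directory.
foreach nobs of numlist 1000 1500 {
	forv nreg = 1/2 {
		clear
		set obs 2
		gen r2_`nobs'_`nreg' = `nreg' * _n / `nobs'   // arbitrary values
		gen id = _n                                  // row-number identifier
		sort id
		save toy_`nobs'_`nreg', replace
	}
}

* merge on the row-number id; row order within a file is irrelevant
* (old Stata 10 syntax; Stata 11+ prefers -merge 1:1 id using ...-)
use toy_1000_1, clear
foreach f in 1000_2 1500_1 1500_2 {
	merge id using toy_`f', unique
	drop _merge
	sort id
}

* wide -> long: the numeric suffix after each stub becomes numreg
reshape long r2_1000_ r2_1500_, i(id) j(numreg)

* mean R^2 per cell: rows = numreg, columns = sample size
collapse (mean) r2*, by(numreg)
list, noobs
```

After -collapse-, each row is one number of regressors and each r2_<nobs>_ variable is one sample size, which is the layout the full code below produces for the real simulations.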
****************************************
capture program drop sim
version 10
program define sim, rclass
	// one replication: regress pure-noise y on nreg pure-noise regressors
	drop _all
	syntax , nreg(integer) nobs(integer)
	set obs `nobs'
	forv i = 1/`nreg' {
		g x`i' = invnormal(uniform())
	}
	gen y = invnormal(uniform())
	// backward selection: remove terms with p >= .2
	stepwise, pr(.2): regress y x*
	return scalar r2d2 = e(r2)
end

foreach nobs of numlist 1000 1500 2000 {
	forv nreg = 1(1)10 {
		simulate r2d2=r(r2d2), reps(10000) ///
		 saving(sw_r2_`nobs'_`nreg'.dta, every(1) ///
		  replace) seed(123): sim, nreg(`nreg') ///
		   nobs(`nobs')
		use sw_r2_`nobs'_`nreg'.dta, clear
		g id=_n
		rename r2d2 r2_`nobs'_`nreg'
		sort id
		save sw_r2_`nobs'_`nreg'.dta, replace
	}
}

/* presenting the results */
// merge the files; the last simulation is the
// file in memory.
foreach nobs of numlist 1000 1500 2000 {
	forv nreg = 1(1)10 {
		if "`c(filename)'" != "sw_r2_`nobs'_`nreg'.dta" {
		// this checks if the last filename has been reached
			merge id using sw_r2_`nobs'_`nreg', ///
			 _merge(identifier_`nobs'_`nreg') unique
			sort id
		}
		else {
			di in g "All done merging."
		}
	}
}
drop identifier*
// reshape the data to be cross-classified by no. of
// regressors and no. of observations
reshape long r2_1000_ r2_1500_ r2_2000_, i(id) j(numreg)
// get the means of the R^2 for each simulation
collapse (mean) r2*, by(numreg)
list, noobs
save simulations_merged_collapsed, replace
****************************************
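If writing 30 separate files feels cumbersome, one alternative (a swapped-in technique, not the approach above) is to -post- every replication to a single results file with -postfile- and tabulate at the end. The sketch below assumes the same design (stepwise with pr(.2) on pure-noise data) and uses a deliberately small, hypothetical value of 50 replications per cell so it runs quickly. It ends with a two-way table: number of regressors down the side, sample size across the top.

```stata
* Sketch: every replication goes into one results file via -postfile-.
* reps = 50 is an arbitrary small value for a quick run; raise it for real use.
clear
set seed 123
local reps 50
tempname h
tempfile results
postfile `h' nobs nreg r2 using `results'
foreach nobs of numlist 1000 1500 2000 {
	forv nreg = 1/10 {
		forv r = 1/`reps' {
			// one replication: pure-noise y and nreg noise regressors
			drop _all
			quietly set obs `nobs'
			forv i = 1/`nreg' {
				gen x`i' = invnormal(uniform())
			}
			gen y = invnormal(uniform())
			// backward selection at pr(.2), as in -sim- above
			quietly stepwise, pr(.2): regress y x*
			post `h' (`nobs') (`nreg') (e(r2))
		}
	}
}
postclose `h'

* one row per (nreg, nobs) cell, then display as a two-way table
use `results', clear
collapse (mean) r2, by(nreg nobs)
tabdisp nreg nobs, cellvar(r2) format(%6.4f)
```

The trade-off is that -simulate- handles seeding and saving for you, whereas -postfile- needs an explicit replication loop; in exchange, the sample size and number of regressors are stored as variables from the start, so no merge or reshape is needed.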

T

On Fri, Aug 7, 2009 at 5:44 PM, John Antonakis <john.antonakis@unil.ch> wrote:
> Thanks Tirthankar!
>
> I see that separate files are stored for each simulation. How could one
> combine those results in one file?
>
> Also, how would one generate a table (sample size on the horizontal and
> number of predictors on the vertical) with the simulated r-squares?
> Best,
> J.
>
> ____________________________________________________
>
> Prof. John Antonakis
> Associate Dean Faculty of Business and Economics
> University of Lausanne
> Internef #618
> CH-1015 Lausanne-Dorigny
> Switzerland
>
> Tel ++41 (0)21 692-3438
> Fax ++41 (0)21 692-3305
>
> Faculty page:
> http://www.hec.unil.ch/people/jantonakis&cl=en
>
> Personal page:
> http://www.hec.unil.ch/jantonakis
> ____________________________________________________
>
>
>
> On 07.08.2009 13:10, Tirthankar Chakravarty wrote:
>>
>> You should probably use -simulate-. Here is what it might look like:
>>
>> ***********************************
>> capture program drop sim
>> version 10
>> program define sim, rclass
>>        drop _all
>>        syntax , nreg(integer ) nobs(integer )
>>        set obs `nobs'
>>        forv i=1/`nreg' {
>>                g x`i' = invnormal(uniform())
>>        }
>>        gen y = invnormal(uniform())
>>        stepwise, pr(.2): regress y x*
>>        return scalar r2d2 = e(r2)
>> end
>>
>> /*
>> simulate for each of the regressor and
>> sample size combinations required.
>> 10,000 replications.
>> */
>> foreach nobs of numlist 1000 1500 2000 {
>>        forv nreg = 1(1)10 {
>>                simulate r2d2=r(r2d2), reps(10000) ///
>>                 saving(sw_r2_`nobs'_`nreg'.dta, every(1) ///
>>                  replace) seed(123): sim, nreg(`nreg') ///
>>                   nobs(`nobs')
>>        }
>> }
>> use sw_r2_1000_5, clear
>> kdensity r2d2
>> ***********************************************
>>
>> On Fri, Aug 7, 2009 at 11:18 AM, John Antonakis <john.antonakis@unil.ch> wrote:
>>
>>>
>>> That's very helpful; thanks Martin.
>>>
>>> To extend the below, how would I simulate the r-square? That is, I want to
>>> run the simulation say 100 times, and then obtain the mean r-square from
>>> each simulation. Thus, I can show, at a specific sample size (n=100) and
>>> number of independent variables (k=5), what the r-square would be by
>>> chance alone.
>>>
>>> As an extension, is there a way to vary the sample size (n from 50 to 1000,
>>> in increments of 50) and the number of independent variables (k=1 to k=100,
>>> in increments of 1) in the simulation?
>>>
>>> Best,
>>> J.
>>>
>>>
>>>
>>>
>>> On 07.08.2009 12:06, Martin Weiss wrote:
>>>
>>>>
>>>> <>
>>>> You could also -tokenize- the return from -indeplist- and have your
>>>> -program- return the regressors one by one...
>>>>
>>>>
>>>> *************
>>>> capt prog drop sim
>>>>
>>>> version 10.1
>>>>
>>>> program define sim, rclass
>>>>  drop _all
>>>>       set obs 100
>>>>       gen y = invnorm(uniform())
>>>>       gen x1 = invnorm(uniform())
>>>>       gen x2 = invnorm(uniform())
>>>>       gen x3 = invnorm(uniform())
>>>>       gen x4 = invnorm(uniform())
>>>>       gen x5 = invnorm(uniform())
>>>>       stepwise, pr(.2): regress y x1-x5
>>>>       qui indeplist
>>>>       tokenize "`r(X)'"
>>>>       ret loc one="`1'"
>>>>       ret loc two="`2'"
>>>>       ret loc three="`3'"
>>>>       ret loc four="`4'"
>>>>       ret loc five="`5'"
>>>> end
>>>>
>>>> sim
>>>>
>>>> ret li
>>>> *************
>>>>
>>>>
>>>>
>>>> HTH
>>>> Martin
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: owner-statalist@hsphsun2.harvard.edu
>>>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of John
>>>> Antonakis
>>>> Sent: Friday, 7 August 2009 11:47
>>>> To: statalist@hsphsun2.harvard.edu
>>>> Subject: st: Simulating stepwise regression
>>>>
>>>> Hi:
>>>>
>>>> I would like to simulate the below. Note, I am no fan of stepwise; I
>>>> just want to demonstrate its evils.
>>>>
>>>> However, I do not know
>>>>
>>>> 1. what to put in the place of "??"--that is, I want the program to
>>>> capture only the variables that were selected in the model as being
>>>> significant
>>>>
>>>> 2. how to simulate the r-square.
>>>>
>>>> 3. how to extend the simulation (a new program) such that I simulate
>>>> from n = 50 to n = 1000 (in increments of 50), crossed with independent
>>>> variables ranging from x1 to x100.
>>>>
>>>> Regards,
>>>> John.
>>>>
>>>> Here is the program:
>>>>
>>>> set seed 123456
>>>>
>>>> capture program drop sim
>>>>  version 10.1
>>>> program define sim, eclass
>>>>       drop _all
>>>>
>>>> set obs 100
>>>>
>>>> gen y = invnorm(uniform())
>>>> gen x1 = invnorm(uniform())
>>>> gen x2 = invnorm(uniform())
>>>> gen x3 = invnorm(uniform())
>>>> gen x4 = invnorm(uniform())
>>>> gen x5 = invnorm(uniform())
>>>>
>>>> stepwise, pr(.2): regress y x1-x5
>>>>  end
>>>>
>>>> simulate ??? , reps(20) seed (123) : sim,
>>>>
>>>> foreach v in ?? {
>>>>  gen t_`v' = /*
>>>> */_b_`v'/_se_`v'
>>>>  gen p_`v' =/*
>>>> */ 2*(1-normal(abs(t_`v')))
>>>> }
>>>>
>>>>
>>>>
>>>>
>>>> *
>>>> *   For searches and help try:
>>>> *   http://www.stata.com/help.cgi?search
>>>> *   http://www.stata.com/support/statalist/faq
>>>> *   http://www.ats.ucla.edu/stat/stata/
>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>>
>>
>



-- 
To every ω-consistent recursive class κ of formulae there correspond
recursive class signs r, such that neither v Gen r nor Neg(v Gen r)
belongs to Flg(κ) (where v is the free variable of r).




© Copyright 1996–2014 StataCorp LP