George Murray <george.murray16@gmail.com>

statalist@hsphsun2.harvard.edu

Re: st: Computing the proportion of significant variables after running numerous regressions

Tue, 15 May 2012 17:22:12 +1000

Nick -- I ran the code you suggested initially, and once again it worked
wonderfully (although the model I am using has 3 independent variables,
so I just had to edit it slightly). And incidentally, the only reason my
code had 10 repetitions was to decrease the waiting time when running
your code! I am using 1000 repetitions in my actual analysis.

Thank you both for your input -- *very* helpful.

On Mon, May 14, 2012 at 8:43 PM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> I think it depends what George wants by way of standard errors. If you
> run -bootstrap: regress- the effect is not just to add the confidence
> intervals.
>
> Nick
> n.j.cox@durham.ac.uk
>
> Phil Clayton
>
> I don't see the problem Nick - I think your code reports the correct
> values. -bootstrap- reports the same beta coefficients as -regress-
> since these are the best (least biased) point estimates, and otherwise
> the estimates that your code extracts seem to come from the
> bootstrapping as desired.
>
> I completely agree that 10 repetitions is not enough - my example was
> only designed to demonstrate the use of -post- - but thanks for
> pointing it out.
>
> Phil
>
> On 14/05/2012, at 7:15 PM, Nick Cox wrote:
>
>> No, you (and I) need to be more circumspect. After -bootstrap:
>> regress- the results in memory are a mix of results for -bootstrap-
>> and for the last replication of -regress-. So, you need to separate
>> that out in your code.
>>
>> On Mon, May 14, 2012 at 9:52 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>>> You seem to be guessing that after -bootstrap: regress- there is a
>>> quantity left in memory called -_ci_bc_cons-. Not so. Also, each
>>> confidence interval is a pair of numbers, so you need to create two
>>> variables to hold it, not one. The trick to these calculations is to
>>> see what is left in memory after a command. By the way, 10
>>> replications would not be enough for most serious work.
>>>
>>> * load dataset
>>> sysuse auto, clear
>>>
>>> * set up temporary file for results
>>> tempfile results
>>> tempname postfile
>>> postfile `postfile' foreign _b_cons _se_cons _b_mpg _se_mpg _cons_ll ///
>>>     _cons_ul _b_ll _b_ul using "`results'"
>>>
>>> * run bootstrapped regression for each level of foreign
>>> set seed 1 // so that you can repeat your analysis
>>> levelsof foreign, local(levels)
>>> foreach level of local levels {
>>>     bootstrap, rep(10): regress price mpg if foreign==`level'
>>>     mat ci = e(ci_bc)
>>>     post `postfile' (`level') (_b[_cons]) (_se[_cons]) (_b[mpg]) ///
>>>         (_se[mpg]) (ci[1,2]) (ci[2,2]) (ci[1,1]) (ci[2,1])
>>> }
>>> postclose `postfile'
>>>
>>> * display results
>>> use "`results'", clear
>>> list
>>>
>>>
>>> On Mon, May 14, 2012 at 9:30 AM, George Murray
>>> <george.murray16@gmail.com> wrote:
>>>> Phil,
>>>>
>>>> Thank you so much for your help, this worked perfectly.
>>>>
>>>> I have one more query, however.
>>>>
>>>> I also need a vector of the bias-corrected confidence intervals (which
>>>> can be obtained with the -estat bootstrap- command). I replaced two of
>>>> the commands you suggested with these two commands as follows:
>>>>
>>>> -postfile `postfile' foreign _b_cons _se_cons _ci_bc_cons _b_mpg
>>>> _se_mpg using "`results'"- .............(all I did was add
>>>> "_ci_bc_cons")
>>>>
>>>> -post `postfile' (`level') (_b[_cons]) (_se[_cons]) (_ci_bc[_cons])
>>>> (_b[mpg]) (_se[mpg])- .............(all I did was add
>>>> "(_ci_bc[_cons])")
>>>>
>>>> and I also wrote -estat bootstrap- after the bootstrap, rep(10)... command
>>>>
>>>> However, I get the following error:
>>>>
>>>> _ci_bc not found
>>>> post: above message corresponds to expression 3, variable _ci_bc_cons
>>>> r(111);
>>>>
>>>> Does anyone know how to solve this problem?
>>>>
>>>> On Mon, May 14, 2012 at 12:05 AM, Phil Clayton
>>>> <philclayton@internode.on.net> wrote:
>>>>> George,
>>>>>
>>>>> There are various ways to do this.
>>>>> One is to use -post- after each bootstrapped regression to store the
>>>>> results of that regression in a "results" dataset, similar to a
>>>>> Monte Carlo simulation. You can then access the results dataset and
>>>>> manipulate it however you like.
>>>>>
>>>>> Here's a basic example that uses the auto dataset and loops over the
>>>>> levels of "foreign" (ie 0 and 1), runs a bootstrapped regression of
>>>>> price on mpg for each level, and displays the resulting coefficients
>>>>> and standard errors.
>>>>>
>>>>> --------- begin example ---------
>>>>> * load dataset
>>>>> sysuse auto, clear
>>>>>
>>>>> * set up temporary file for results
>>>>> tempfile results
>>>>> tempname postfile
>>>>> postfile `postfile' foreign _b_cons _se_cons _b_mpg _se_mpg using "`results'"
>>>>>
>>>>> * run bootstrapped regression for each level of foreign
>>>>> set seed 1 // so that you can repeat your analysis
>>>>> levelsof foreign, local(levels)
>>>>> foreach level of local levels {
>>>>>     bootstrap, rep(10): regress price mpg if foreign==`level'
>>>>>     post `postfile' (`level') (_b[_cons]) (_se[_cons]) (_b[mpg]) (_se[mpg])
>>>>> }
>>>>> postclose `postfile'
>>>>>
>>>>> * display results
>>>>> use "`results'", clear
>>>>> list
>>>>> --------- end example ---------
>>>>>
>>>>> Since you're running ~1000 models you may wish to change "foreach" to
>>>>> "qui foreach", and monitor the iterations using the _dots command
>>>>> (see Harrison DA. Stata tip 41: Monitoring loop iterations.
>>>>> Stata Journal 2007;7(1):140, available at
>>>>> http://www.stata-journal.com/article.html?article=pr0030)
>>>>>
>>>>> Phil
>>>>>
>>>>>
>>>>> On 13/05/2012, at 10:06 PM, George Murray wrote:
>>>>>
>>>>>> Dear Statalist,
>>>>>>
>>>>>> I am using the -foreach- command to run approximately 1000
>>>>>> (bootstrapped) regression models, however I require an efficient way
>>>>>> of calculating the proportion of the regression models which have a
>>>>>> statistically significant constant at the 5% level; and of the
>>>>>> constants which are statistically significant, the proportion which
>>>>>> are positive. Below each of the 1000 regressions I run, a table is
>>>>>> displayed with the following format:
>>>>>>
>>>>>> ------------------------------------------------------------------------------
>>>>>>              |   Observed               Bootstrap
>>>>>>           V0 |      Coef.       Bias    Std. Err.   [95% Conf. Interval]
>>>>>> -------------+----------------------------------------------------------------
>>>>>>           V1 |  .00968169  -.0000537   .00057051     .008721   .0111218  (BC)
>>>>>>           V2 | -.00110469   .0000782     .000691   -.0023101    .000459  (BC)
>>>>>>           V3 |  .00468313  -.0001562   .00084971    .0031954   .0064538  (BC)
>>>>>>        _cons | -.00076976   .0001811   .00176677   -.0044496   .0025584  (BC)
>>>>>> ------------------------------------------------------------------------------
>>>>>>
>>>>>> I would be *very* grateful if someone knew the commands which would
>>>>>> allow me to calculate this. In the past, I have used (a highly tedious
>>>>>> and embarrassing approach on) Excel where I filtered every Nth row,
>>>>>> and wrote a command to display 1 if the coefficient lies within the
>>>>>> confidence interval, and 0 if not. This time, however, I am running
>>>>>> numerous models and require a quicker approach.
>>>>>>
>>>>>> One more question -- is there a way to create a new variable where the
>>>>>> coefficients of V1 (for example) are saved, so I can calculate the
>>>>>> mean, standard deviation etc. of V1?
>>>>>>
>>>>>> If someone could answer at least one of these two questions, it would
>>>>>> be very much appreciated.
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
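
The thread shows how to collect the estimates in a results dataset but
stops short of the original question: the proportion of models whose
constant is significant at the 5% level, and the proportion of those
significant constants that are positive. A minimal sketch of that last
step follows. It is not code from the thread. It assumes that
"significant at the 5% level" is read as "the 95% bias-corrected
interval excludes zero", and the names memhold, b_cons, cons_ll,
cons_ul, sig_cons and pos_cons, as well as the choice of 200
replications, are illustrative only.

```stata
* Sketch only: extends the -postfile- approach from the thread.
* All variable names below are illustrative, not from the original posts.
sysuse auto, clear

tempfile results
tempname memhold
postfile `memhold' foreign b_cons cons_ll cons_ul using "`results'"

set seed 1
levelsof foreign, local(levels)
foreach level of local levels {
    quietly bootstrap, reps(200): regress price mpg if foreign == `level'
    matrix ci = e(ci_bc)              // row 1 = lower bounds, row 2 = upper bounds
    local c = colnumb(ci, "_cons")    // look up the column of the constant by name
    post `memhold' (`level') (_b[_cons]) (ci[1, `c']) (ci[2, `c'])
}
postclose `memhold'

use "`results'", clear

* 1 if the 95% bias-corrected interval excludes zero, 0 otherwise;
* left missing if either bound is missing (missing counts as +infinity in Stata)
generate byte sig_cons = (cons_ll > 0 | cons_ul < 0) if !missing(cons_ll, cons_ul)

* 1 if the observed constant is positive
generate byte pos_cons = (b_cons > 0)

* the mean of a 0/1 indicator is a proportion
summarize sig_cons, meanonly
display "Proportion of models with a significant constant: " r(mean)

summarize pos_cons if sig_cons == 1, meanonly
display "Proportion positive among the significant constants: " r(mean)
```

With more regressors, as in the 3-variable model mentioned at the top of
the thread, looking up the column by name with colnumb() avoids
hard-coding the position of _cons in e(ci_bc). The mean and standard
deviation of any stored coefficient can be obtained from the same
results dataset with -summarize-.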

