Re: st: Computing the proportion of significant variables after running numerous regressions

From   Phil Clayton <[email protected]>
To   [email protected]
Subject   Re: st: Computing the proportion of significant variables after running numerous regressions
Date   Mon, 14 May 2012 00:05:49 +1000


There are various ways to do this. One is to use -post- after each bootstrapped regression to store the results of that regression in a "results" dataset, similar to a Monte Carlo simulation. You can then access the results dataset and manipulate it however you like.

Here's a basic example that uses the auto dataset. It loops over the levels of "foreign" (i.e., 0 and 1), runs a bootstrapped regression of price on mpg for each level, and displays the resulting coefficients and standard errors.

--------- begin example ---------
* load dataset
sysuse auto, clear

* set up temporary file for results
tempfile results
tempname postfile
postfile `postfile' foreign _b_cons _se_cons _b_mpg _se_mpg using "`results'"

* run bootstrapped regression for each level of foreign
set seed 1 // so that you can repeat your analysis
levelsof foreign, local(levels)
foreach level of local levels {
	bootstrap, rep(10): regress price mpg if foreign==`level'
	post `postfile' (`level') (_b[_cons]) (_se[_cons]) (_b[mpg]) (_se[mpg])
}
postclose `postfile'

* display results
use "`results'", clear
list
--------- end example ---------
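
Once the results are in a dataset, the two proportions George asks about can be computed with ordinary variable commands. The following is only a sketch: it builds a small made-up results dataset with -input- (the "model" identifier and all numbers are hypothetical) so that it runs on its own, and it judges significance with a normal-approximation z test on the stored coefficient and bootstrap standard error. The tables in the original question report bias-corrected intervals, so a test based on those could give somewhat different answers.

```stata
* hypothetical results dataset: one row per model (numbers are made up)
clear
input model _b_cons _se_cons
1  6050 1200
2  9800 2100
3  -300  900
4 -4100 1500
end

* z statistic and two-sided 5% test (normal approximation, an assumption)
generate z_cons = _b_cons / _se_cons
generate byte sig_cons = abs(z_cons) > invnormal(0.975)
generate byte pos_cons = _b_cons > 0

* proportion of models whose constant is significant at the 5% level
summarize sig_cons, meanonly
scalar prop_sig = r(mean)
display "Proportion significant at 5%: " prop_sig

* of the significant constants, the proportion that are positive
summarize pos_cons if sig_cons, meanonly
scalar prop_pos = r(mean)
display "Proportion positive among significant: " prop_pos

* mean, standard deviation, etc. of the stored coefficients
summarize _b_cons
```

With these made-up numbers, three of the four constants are significant and two of those three are positive. In a real run you would skip the -input- block and start from `use "`results'", clear`; the same -summarize- call works for any stored coefficient, such as _b_mpg in the example above.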

Since you're running ~1000 models, you may wish to change "foreach" to "qui foreach" and monitor the iterations using the _dots command (see Harrison DA. Stata tip 41: Monitoring loop iterations. Stata Journal 2007;7(1):140).
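
A minimal sketch of that pattern follows. Note that _dots is an undocumented command, so the syntax shown here (a header call with 0, then one call per iteration with a return code of 0) is my understanding of the usage described in the tip rather than an official specification. Because the whole loop is run quietly, the _dots call is prefixed with -noisily- so that the dots still appear. The five iterations and ten replications are arbitrary values chosen to keep the example quick.

```stata
* load dataset and fix the seed so the run is repeatable
sysuse auto, clear
set seed 1

* print a header for the progress display (assumed _dots syntax)
_dots 0, title(Bootstrapped regressions) reps(5)

* quiet loop; -noisily- lets the progress dots through
quietly forvalues i = 1/5 {
	bootstrap, reps(10): regress price mpg
	noisily _dots `i' 0
}
```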


On 13/05/2012, at 10:06 PM, George Murray wrote:

> Dear Statalist,
> I am using the -foreach- command to run approximately 1000
> (bootstrapped) regression models, however I require an efficient way
> of calculating the proportion of the regression models which have a
> statistically significant constant at the 5% level; and of the
> constants which are statistically significant, the proportion which
> are positive.  Below each of the 1000 regressions I run, a table is
> displayed with the following format:
> ------------------------------------------------------------------------------
>              |    Observed                 Bootstrap
>           V0 |       Coef.        Bias     Std. Err.    [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>           V1 |   .00968169   -.0000537   .00057051      .008721   .0111218  (BC)
>           V2 |  -.00110469    .0000782     .000691    -.0023101    .000459  (BC)
>           V3 |   .00468313   -.0001562   .00084971     .0031954   .0064538  (BC)
>        _cons |  -.00076976    .0001811   .00176677    -.0044496   .0025584  (BC)
> ------------------------------------------------------------------------------
> I would be *very* grateful if someone knew the commands which would
> allow me to calculate this. In the past, I have used (a highly tedious
> and embarrassing approach on) Excel where I filtered every Nth row,
> and wrote a command to display 1 if the coefficient lies within the
> confidence interval, and 0 if not. This time, however, I am running
> numerous models and require a quicker approach.
> One more question -- is there a way to create a new variable where the
> coefficients of V1 (for example) are saved, so I can calculate the
> mean, standard deviation, etc. of V1?
> If someone could answer at least one of these two questions, it would
> be very much appreciated.
> George Murray.