
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Computing the proportion of significant variables after running numerous regressions


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Computing the proportion of significant variables after running numerous regressions
Date   Mon, 14 May 2012 10:15:52 +0100

No, you (and I) need to be more circumspect. After -bootstrap:
regress- the results in memory are a mix of results for -bootstrap-
and for the last replication of -regress-. So, you need to separate
that out in your code.
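
Once the interval limits are posted, the original question (what proportion of constants are significant at the 5% level, and what proportion of those are positive) comes down to two indicator variables. Here is a minimal sketch, not a definitive implementation. It assumes that e(ci_bc) is laid out with lower limits in row 1 and upper limits in row 2, as the indexing in the code quoted below implies, and it looks up the _cons column by name with colnumb() rather than by position. The names ph, b_cons, cons_ll, cons_ul, sig and pos, and the choice of 200 replications, are illustrative only.

```stata
* load dataset
sysuse auto, clear

* set up temporary file for results
tempfile results
tempname ph
postfile `ph' foreign b_cons cons_ll cons_ul using "`results'"

* run bootstrapped regression for each level of foreign
set seed 1 // so that the analysis can be repeated
levelsof foreign, local(levels)
foreach level of local levels {
    quietly bootstrap, reps(200): regress price mpg if foreign == `level'
    * assumed layout of e(ci_bc): row 1 = lower limits, row 2 = upper limits
    matrix ci = e(ci_bc)
    local j = colnumb(ci, "_cons")
    post `ph' (`level') (_b[_cons]) (ci[1, `j']) (ci[2, `j'])
}
postclose `ph'

use "`results'", clear

* significant at the 5% level if the 95% BC interval excludes zero
generate byte sig = (cons_ll > 0 | cons_ul < 0) if !missing(cons_ll, cons_ul)

* sign of the constant, defined only for the significant models
generate byte pos = (b_cons > 0) if sig == 1

summarize sig // mean = proportion of models with a significant constant
summarize pos // mean = proportion of significant constants that are positive
```

The means reported by -summarize- are the two proportions. With only two models here they are not informative, but the same two -generate- lines apply unchanged to a results dataset with ~1000 rows.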

On Mon, May 14, 2012 at 9:52 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> You seem to be guessing that after -bootstrap: regress- there is a
> quantity left in memory called -_ci_bc_cons-. Not so. Also, each
> confidence interval is a pair of numbers, so you need to create two
> variables to hold it, not one. The trick to these calculations is to
> see what is left in memory after a command. By the way, 10
> replications would not be enough for most serious work.
>
> * load dataset
>  sysuse auto, clear
>
>  * set up temporary file for results
>  tempfile results
>  tempname postfile
>  postfile `postfile' foreign _b_cons _se_cons _b_mpg _se_mpg _cons_ll ///
>        _cons_ul _b_ll _b_ul using "`results'"
>
>  * run bootstrapped regression for each level of foreign
>  set seed 1 // so that you can repeat your analysis
>  levelsof foreign, local(levels)
>  foreach level of local levels {
>        bootstrap, rep(10): regress price mpg if foreign==`level'
>        * e(ci_bc) as indexed here: rows = lower, upper limit; columns = mpg, _cons
>        mat ci = e(ci_bc)
>        post `postfile' (`level') (_b[_cons]) (_se[_cons]) (_b[mpg]) ///
>                (_se[mpg]) (ci[1,2]) (ci[2,2]) (ci[1,1]) (ci[2,1])
>  }
>  postclose `postfile'
>
>  * display results
>  use "`results'", clear
>  list
>
>
> On Mon, May 14, 2012 at 9:30 AM, George Murray
> <george.murray16@gmail.com> wrote:
>> Phil,
>>
>> Thank you so much for your help, this worked perfectly.
>>
>> I have one more query, however.
>>
>> I also need a vector of the bias-corrected confidence intervals (which
>> can be obtained with the -estat bootstrap- command). I replace two of
>> the commands you suggested with these two commands as follows:
>>
>> -postfile `postfile' foreign _b_cons _se_cons _ci_bc_cons _b_mpg
>> _se_mpg using "`results'"- .............(all I did was add
>> "_ci_bc_cons")
>>
>> -post `postfile' (`level') (_b[_cons]) (_se[_cons]) (_ci_bc[_cons])
>> (_b[mpg]) (_se[mpg])- .............(all I did was add
>> "(_ci_bc[_cons])")
>>
>> and I also wrote -estat bootstrap- after the -bootstrap, rep(10)...- command
>>
>> However, I get the following error:
>>
>> _ci_bc not found
>> post:  above message corresponds to expression 3, variable _ci_bc_cons
>> r(111);
>>
>> Does anyone know how to solve this problem?
>
>
>> On Mon, May 14, 2012 at 12:05 AM, Phil Clayton
>> <philclayton@internode.on.net> wrote:
>>> George,
>>>
>>> There are various ways to do this. One is to use -post- after each bootstrapped regression to store the results of that regression in a "results" dataset, similar to a Monte Carlo simulation. You can then access the results dataset and manipulate it however you like.
>>>
>>> Here's a basic example that uses the auto dataset and loops over the levels of "foreign" (i.e., 0 and 1), runs a bootstrapped regression of price on mpg for each level, and displays the resulting coefficients and standard errors.
>>>
>>> --------- begin example ---------
>>> * load dataset
>>> sysuse auto, clear
>>>
>>> * set up temporary file for results
>>> tempfile results
>>> tempname postfile
>>> postfile `postfile' foreign _b_cons _se_cons _b_mpg _se_mpg using "`results'"
>>>
>>> * run bootstrapped regression for each level of foreign
>>> set seed 1 // so that you can repeat your analysis
>>> levelsof foreign, local(levels)
>>> foreach level of local levels {
>>>        bootstrap, rep(10): regress price mpg if foreign==`level'
>>>        post `postfile' (`level') (_b[_cons]) (_se[_cons]) (_b[mpg]) (_se[mpg])
>>> }
>>> postclose `postfile'
>>>
>>> * display results
>>> use "`results'", clear
>>> list
>>> --------- end example ---------
>>>
>>> Since you're running ~1000 models you may wish to change "foreach" to "qui foreach", and monitor the iterations using the _dots command (see Harrison DA. Stata tip 41: Monitoring loop iterations. Stata Journal 2007;7(1):140, available at http://www.stata-journal.com/article.html?article=pr0030)
>>>
>>> Phil
>>>
>>>
>>> On 13/05/2012, at 10:06 PM, George Murray wrote:
>>>
>>>> Dear Statalist,
>>>>
>>>> I am using the -foreach- command to run approximately 1000
>>>> (bootstrapped) regression models. However, I require an efficient way
>>>> of calculating the proportion of the regression models which have a
>>>> statistically significant constant at the 5% level and, of the
>>>> constants which are statistically significant, the proportion which
>>>> are positive.  Below each of the 1000 regressions I run, a table is
>>>> displayed with the following format:
>>>>
>>>> ------------------------------------------------------------------------------
>>>>              |    Observed              Bootstrap
>>>>           V0 |       Coef.       Bias   Std. Err.   [95% Conf. Interval]
>>>> -------------+----------------------------------------------------------------
>>>>           V1 |   .00968169  -.0000537   .00057051     .008721   .0111218  (BC)
>>>>           V2 |  -.00110469   .0000782     .000691   -.0023101    .000459  (BC)
>>>>           V3 |   .00468313  -.0001562   .00084971    .0031954   .0064538  (BC)
>>>>        _cons |  -.00076976   .0001811   .00176677   -.0044496   .0025584  (BC)
>>>> ------------------------------------------------------------------------------
>>>>
>>>> I would be *very* grateful if someone knew the commands which would
>>>> allow me to calculate this. In the past, I have used (a highly tedious
>>>> and embarrassing approach on) Excel where I filtered every Nth row,
>>>> and wrote a command to display 1 if the coefficient lies within the
>>>> confidence interval, and 0 if not. This time, however, I am running
>>>> numerous models and require a quicker approach.
>>>>
>>>> One more question -- is there a way to create a new variable where the
>>>> coefficients of V1 (for example) are saved, so I can calculate the
>>>> mean, standard deviation, etc. of V1?
>>>>
>>>> If someone could answer at least one of these two questions, it would
>>>> be very much appreciated.
