Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: Computing the proportion of significant variables after running numerous regressions

From	George Murray <[email protected]>
To	[email protected]
Subject	Re: st: Computing the proportion of significant variables after running numerous regressions
Date	Tue, 15 May 2012 17:22:12 +1000

Nick -- I ran the code you suggested initially, and once again it
worked wonderfully (although the model I am using has 3 independent
variables, so I just had to edit it slightly).
And incidentally, the only reason my code had 10 repetitions was to
decrease the waiting time when running your code! I am using 1000
repetitions in my actual analysis.

Thank you both for your input -- *very* helpful.
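
A minimal sketch of the remaining step, which the code quoted below stops short of: turning the posted results into the two proportions asked about at the start of the thread. It assumes that "significant at the 5% level" means the 95% bias-corrected interval for the constant excludes zero. The variable names (b_cons, cons_ll, cons_ul, sig, pos) and the handle name ph are illustrative choices, and the two-group auto example only stands in for the real 1000 models.

```stata
* load dataset
sysuse auto, clear

* set up temporary file for results (variable names are illustrative)
tempfile results
tempname ph
postfile `ph' foreign b_cons cons_ll cons_ul using "`results'"

* one bootstrapped regression per level of foreign
set seed 1
levelsof foreign, local(levels)
foreach level of local levels {
    quietly bootstrap, reps(200): regress price mpg if foreign == `level'
    matrix ci = e(ci_bc)         // row 1 = lower limits, row 2 = upper limits
    local j = colsof(ci)         // after -regress- the constant is the last column
    post `ph' (`level') (_b[_cons]) (ci[1, `j']) (ci[2, `j'])
}
postclose `ph'

use "`results'", clear

* assumption: "significant" means the 95% BC interval excludes zero
generate byte sig = (cons_ll > 0 | cons_ul < 0) if !missing(cons_ll, cons_ul)

* sign of the constant, defined only for the significant models
generate byte pos = (b_cons > 0) if sig == 1

summarize sig, meanonly
display "proportion significant: " r(mean)
summarize pos, meanonly
display "proportion positive among significant: " r(mean)
```

Because each model is one row of the results dataset, -summarize b_cons- (or the same for any other posted coefficient) gives the mean and standard deviation across models. If no constant is significant, the second proportion is displayed as missing.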

On Mon, May 14, 2012 at 8:43 PM, Nick Cox <[email protected]> wrote:
> I think it depends what George wants by way of standard errors. If you run -bootstrap: regress- the effect is not just to add the confidence intervals.
>
> Nick
> [email protected]
>
> Phil Clayton
>
> I don't see the problem Nick - I think your code reports the correct values. -bootstrap- reports the same beta coefficients as -regress- since these are the best (least biased) point estimates, and otherwise the estimates that your code extracts seem to come from the bootstrapping as desired.
>
> I completely agree that 10 repetitions is not enough - my example was only designed to demonstrate the use of -post- - but thanks for pointing it out.
>
> Phil
>
> On 14/05/2012, at 7:15 PM, Nick Cox wrote:
>
>> No, you (and I) need to be more circumspect. After -bootstrap:
>> regress- the results in memory are a mix of results for -bootstrap-
>> and for the last replication of -regress-. So, you need to separate
>> that out in your code.
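
One way to see what is actually left in memory is to list the stored results after a single run. The sketch below is only an illustration on the auto data; it assumes the matrices e(b), e(se) and e(ci_bc) that -bootstrap- stores, and the matrix name ci is arbitrary.

```stata
sysuse auto, clear
set seed 1
quietly bootstrap, reps(200): regress price mpg

ereturn list              // everything left behind in e()

matrix list e(b)          // observed coefficients
matrix list e(se)         // bootstrap standard errors
matrix list e(ci_bc)      // bias-corrected limits: rows ll and ul, one column per coefficient

* copy the matrix before indexing into it
matrix ci = e(ci_bc)
display "BC interval for mpg: [" ci[1,1] ", " ci[2,1] "]"
```

Checking the row and column labels in the -matrix list- output before hard-coding subscripts such as ci[1,2] guards against picking up the wrong coefficient when the model has more regressors.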
>>
>> On Mon, May 14, 2012 at 9:52 AM, Nick Cox <[email protected]> wrote:
>>> You seem to be guessing that after -bootstrap: regress- there is a
>>> quantity left in memory called -_ci_bc_cons-. Not so. Also, each
>>> confidence interval is a pair of numbers, so you need to create two
>>> variables to hold it, not one. The trick to these calculations is to
>>> see what is left in memory after a command. By the way, 10
>>> replications would not be enough for most serious work.
>>>
>>> * load dataset
>>>  sysuse auto, clear
>>>
>>>  * set up temporary file for results
>>>  tempfile results
>>>  tempname postfile
>>>  postfile `postfile' foreign _b_cons _se_cons _b_mpg _se_mpg _cons_ll ///
>>>        _cons_ul _b_ll _b_ul using "`results'"
>>>
>>>  * run bootstrapped regression for each level of foreign
>>>  set seed 1 // so that you can repeat your analysis
>>>  levelsof foreign, local(levels)
>>>  foreach level of local levels {
>>>        bootstrap, rep(10): regress price mpg if foreign==`level'
>>>        mat ci = e(ci_bc)
>>>        post `postfile' (`level') (_b[_cons]) (_se[_cons]) (_b[mpg]) ///
>>>                (_se[mpg]) (ci[1,2]) (ci[2,2]) (ci[1,1]) (ci[2,1])
>>>  }
>>>  postclose `postfile'
>>>
>>>  * display results
>>>  use "`results'", clear
>>>  list
>>>
>>>
>>> On Mon, May 14, 2012 at 9:30 AM, George Murray
>>> <[email protected]> wrote:
>>>> Phil,
>>>>
>>>> Thank you so much for your help, this worked perfectly.
>>>>
>>>> I have one more query, however.
>>>>
>>>> I also need a vector of the bias-corrected confidence intervals (which
>>>> can be obtained with the -estat bootstrap- command). I replace two of
>>>> the commands you suggested with these two commands as follows:
>>>>
>>>> -postfile `postfile' foreign _b_cons _se_cons _ci_bc_cons _b_mpg
>>>> _se_mpg using "`results'"- .............(all I did was add
>>>> "_ci_bc_cons")
>>>>
>>>> -post `postfile' (`level') (_b[_cons]) (_se[_cons]) (_ci_bc[_cons])
>>>> (_b[mpg]) (_se[mpg])- .............(all I did was add
>>>> "(_ci_bc[_cons])")
>>>>
>>>> and I also wrote -estat bootstrap- after the bootstrap, rep(10)... command
>>>>
>>>> However, I get the following error:
>>>>
>>>> _ci_bc not found
>>>> post:  above message corresponds to expression 3, variable _ci_bc_cons
>>>> r(111);
>>>>
>>>> Does anyone know how to solve this problem?
>>>
>>>
>>>> On Mon, May 14, 2012 at 12:05 AM, Phil Clayton
>>>> <[email protected]> wrote:
>>>>> George,
>>>>>
>>>>> There are various ways to do this. One is to use -post- after each bootstrapped regression to store the results of that regression in a "results" dataset, similar to a Monte Carlo simulation. You can then access the results dataset and manipulate it however you like.
>>>>>
>>>>> Here's a basic example that uses the auto dataset and loops over the levels of "foreign" (i.e., 0 and 1), runs a bootstrapped regression of price on mpg for each level, and displays the resulting coefficients and standard errors.
>>>>>
>>>>> --------- begin example ---------
>>>>> * load dataset
>>>>> sysuse auto, clear
>>>>>
>>>>> * set up temporary file for results
>>>>> tempfile results
>>>>> tempname postfile
>>>>> postfile `postfile' foreign _b_cons _se_cons _b_mpg _se_mpg using "`results'"
>>>>>
>>>>> * run bootstrapped regression for each level of foreign
>>>>> set seed 1 // so that you can repeat your analysis
>>>>> levelsof foreign, local(levels)
>>>>> foreach level of local levels {
>>>>>        bootstrap, rep(10): regress price mpg if foreign==`level'
>>>>>        post `postfile' (`level') (_b[_cons]) (_se[_cons]) (_b[mpg]) (_se[mpg])
>>>>> }
>>>>> postclose `postfile'
>>>>>
>>>>> * display results
>>>>> use "`results'", clear
>>>>> list
>>>>> --------- end example ---------
>>>>>
>>>>> Since you're running ~1000 models you may wish to change "foreach" to "qui foreach", and monitor the iterations using the _dots command (see Harrison DA. Stata tip 41: Monitoring loop iterations. Stata Journal 2007;7(1):140, available at http://www.stata-journal.com/article.html?article=pr0030).
>>>>>
>>>>> Phil
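
A rough sketch of the -_dots- idea, assuming the calling pattern described in Stata tip 41 (a header call with 0, then one call per iteration with a return code). The local nmodels and the repeated regression are placeholders for the real models.

```stata
sysuse auto, clear
set seed 1
local nmodels 20                       // stand-in for ~1000 models

* print the header line for the progress display
_dots 0, title(Bootstrapped regressions) reps(`nmodels')

forvalues i = 1/`nmodels' {
    * placeholder model; output suppressed so only the dots show
    quietly bootstrap, reps(50): regress price mpg
    _dots `i' 0                        // return code 0 prints a dot
}
```

Suppressing each command with -quietly- rather than the whole loop keeps the dots visible while hiding the regression tables.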
>>>>>
>>>>>
>>>>> On 13/05/2012, at 10:06 PM, George Murray wrote:
>>>>>
>>>>>> Dear Statalist,
>>>>>>
>>>>>> I am using the -foreach- command to run approximately 1000
>>>>>> (bootstrapped) regression models; however, I require an efficient way
>>>>>> of calculating the proportion of the regression models which have a
>>>>>> statistically significant constant at the 5% level; and of the
>>>>>> constants which are statistically significant, the proportion which
>>>>>> are positive.  Below each of the 1000 regressions I run, a table is
>>>>>> displayed with the following format:
>>>>>>
>>>>>> ------------------------------------------------------------------------------
>>>>>>              |    Observed              Bootstrap
>>>>>>           V0 |       Coef.       Bias   Std. Err.   [95% Conf. Interval]
>>>>>> -------------+----------------------------------------------------------------
>>>>>>           V1 |   .00968169  -.0000537   .00057051     .008721   .0111218  (BC)
>>>>>>           V2 |  -.00110469   .0000782     .000691   -.0023101    .000459  (BC)
>>>>>>           V3 |   .00468313  -.0001562   .00084971    .0031954   .0064538  (BC)
>>>>>>        _cons |  -.00076976   .0001811   .00176677   -.0044496   .0025584  (BC)
>>>>>> ------------------------------------------------------------------------------
>>>>>>
>>>>>> I would be *very* grateful if someone knew the commands which would
>>>>>> allow me to calculate this. In the past, I have used (a highly tedious
>>>>>> and embarrassing approach on) Excel where I filtered every Nth row,
>>>>>> and wrote a command to display 1 if the coefficient lies within the
>>>>>> confidence interval, and 0 if not. This time, however, I am running
>>>>>> numerous models and require a quicker approach.
>>>>>>
>>>>>> One more question -- is there a way to create a new variable where the
>>>>>> coefficients of V1 (for example) are saved, so I can calculate the
>>>>>> mean, standard deviation, etc. of V1?
>>>>>>
>>>>>> If someone could answer at least one of these two questions, it would
>>>>>> be very much appreciated.
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
