



RE: st: how to index regressions inside a foreach loop in order to avoid writing over the estimates


From   "Jesper Lindhardsen" <JESLIN01@geh.regionh.dk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: how to index regressions inside a foreach loop in order to avoid writing over the estimates
Date   Fri, 14 Oct 2011 11:11:47 +0200

Hi There,

I must say I agree with Nick's reasons for not doing this, unless it is just a way to "explore" the data.
But for what it's worth, try (and expand on) the code below. It is a crude approach, but it gets you there.
It requires the user-written -estout- package.



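clear all

sysuse auto

est clear

* estout is user-written; if needed, install it with: ssc install estout

foreach var of varlist price-trunk {

	* regress weight on length plus one candidate variable at a time
	qui reg weight length `var'

	* estout leaves its table in r(coefs); keep only the candidate's p-value
	qui estout, c(p) keep(`var')

	matrix A = r(coefs)

	* report the candidate only if its p-value is below 0.1
	if A[1,1]<0.1 {

		di "`var'   p=" A[1,1]

	}

}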
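
The same screen can also be sketched without -estout-, using only the results that -regress- leaves behind. This is a minimal sketch, not a recommendation: it assumes the auto data and the same variable range as the loop above, and it computes the two-sided p-value from _b[], _se[] and e(df_r).

```stata
clear all
sysuse auto

foreach var of varlist price-trunk {
	quietly regress weight length `var'

	* two-sided p-value for the candidate's coefficient (t distribution)
	local t = _b[`var'] / _se[`var']
	local p = 2 * ttail(e(df_r), abs(`t'))

	* report the candidate only if its p-value is below 0.1
	if `p' < 0.1 {
		display "`var'   p = " %6.4f `p'
	}
}
```

Because nothing is stored under a fixed name, there is nothing to overwrite: each pass reads the p-value before the next regression replaces the active estimates.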


HTH, 

Jesper



-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: 14 October 2011 10:45
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: how to index regressions inside a foreach loop in order to avoid writing over the estimates

I'd advise strongly against this for several reasons. Here are some of them.

1. This is mixing crude and subtle in a strange way. You have
subject-matter (perhaps theory-guided) thinking telling you that some
confounders deserve to be in the model, but otherwise it appears that
you are going to let significance tests do all the work of deciding
what else should be in the model or what is worth thinking about. Many
people do that, but many disapprove too.

2. Multiple tests at the same critical level shift your real
critical level in a way that is difficult to handle. Every field
divides into people who don't care much and people who strongly
believe that not confronting this is a major technical error. The
problem goes under different names in different literatures.

3. Your critical level is 0.95 now, was 0.1 in your first posting.
Although I guess the mention of 0.95 is just confusing significance
level and confidence level, this illustrates a major difficulty with
this approach: the threshold is arbitrary. You then have to argue with
both those who want a different threshold and those who don't believe
you should use just significance tests for your decision-making here.

4. A reviewer of your work is likely to have some favourite
variable(s) that they think should be tried out. If your story is
going to be "Oh yes, I tried that but it wasn't significant, so it's
not in the Table" that is not going to impress. Most reviewers want
access to all the results in principle; how much time they spend
scanning them is their capricious decision.

Note that #4 can bite you even if you discount #1, #2, #3.
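
Points #2 and #4 can be illustrated together with a small sketch: keep every result, significant or not, in one dataset, and show a Bonferroni-adjusted p-value next to the raw one. Bonferroni is only one of several possible adjustments and is used here as an assumption for illustration; the auto data, the candidate list and the variable names (varname, b, se, p, p_bonf) are likewise hypothetical stand-ins.

```stata
clear all
sysuse auto

* hypothetical candidate list; in practice, every variable that was tried
local candidates price mpg rep78 headroom trunk
local m : word count `candidates'

* collect one row per regression so that no result is discarded
tempname results
tempfile screen
postfile `results' str32 varname double b double se double p using "`screen'"

foreach var of local candidates {
	quietly regress weight length `var'
	local p = 2 * ttail(e(df_r), abs(_b[`var'] / _se[`var']))
	post `results' ("`var'") (_b[`var']) (_se[`var']) (`p')
}
postclose `results'

use "`screen'", clear

* Bonferroni: multiply each p-value by the number of tests, capped at 1
generate double p_bonf = min(1, p * `m')
sort p
list, noobs
```

The full table can then go to a reviewer as it stands, and any threshold is applied when reading the table rather than when deciding what to keep.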

Nick

On Fri, Oct 14, 2011 at 4:46 AM, hind lazrak <hindstata@gmail.com> wrote:

> Thank you for taking the time to respond to the question I posted.
>
> I made the example simpler in my post for more clarity.
>
> In the first step I ran the pwcorr, sig to capture the list of
> variables that I ran in the loop.
> In fact the simple linear regression does include three other
> variables that may act as either modifier or confounder. So I need to
> control for them.
>
> So this brings me back to the original question. Is there any way to
> get a table of coeffs that are statistically significant at the 95%
> level?

> Richard Williams wrote:
>> At 04:54 PM 10/13/2011, hind lazrak wrote:

>>> I am using Stata Version 10 on Windows Vista.
>>> The analysis I am conducting is exploratory and involves a long list
>>> of independent variables I am testing using simple linear regression.
>>> In order to see which variables are "promising" I'd like to find a way
>>> to store each model estimate and ideally figure out how to tabulate
>>> only those that have a p-value<0.1.
>>>
>>> The code I used is as follows
>>>
>>> foreach var of varlist [list of 55 vars] {
>>> qui reg y1 `var'   // first set of regressions looking at Y1
>>> eststo model1`var'
>>>
>>> qui reg y2 `var'  // second set of regressions looking at Y2
>>> eststo model2`var'
>>> }
>>> estimates table model1`var' model`var', beta not
>>>
>>> This code is not working because it overwrites all the estimates in
>>> each regression and only keeps the last one. Also I did not figure out
>>> how to only show those with p-val<0.1
>>
>> The line
>>
>> estimates table model1`var' model`var', beta not
>>
>> should probably be
>>
>> estimates table model1`var' model2`var', beta not
>>
>> And, it should come before the }, not afterwards.
>>
>> This is just a bunch of bivariate regressions, right? Why not something like
>>
>> pwcorr y1 y2 x1-x55, star(.10)
>>
>> You could probably also fiddle around with the ereturned results and make
>> the estimates table command conditional on one p value or the other being
>> significant.
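
One way the last suggestion might look is sketched below. It is an assumption-laden illustration, not the original poster's code: it uses the auto data, with price and mpg standing in for y1 and y2 and a short hypothetical predictor list, and it uses the built-in -estimates store- rather than -eststo-. Each regression gets its own name, so nothing is overwritten, and only models whose p-value is below 0.1 are passed to -estimates table-.

```stata
clear all
sysuse auto

* stand-ins for illustration only: y1 = price, y2 = mpg
local keep1
local keep2

foreach var of varlist weight length turn displacement {
	quietly regress price `var'
	* a unique name per regression avoids overwriting earlier results
	estimates store m1_`var'
	if 2 * ttail(e(df_r), abs(_b[`var'] / _se[`var'])) < 0.1 {
		local keep1 `keep1' m1_`var'
	}

	quietly regress mpg `var'
	estimates store m2_`var'
	if 2 * ttail(e(df_r), abs(_b[`var'] / _se[`var'])) < 0.1 {
		local keep2 `keep2' m2_`var'
	}
}

* tabulate only the stored models that passed the screen
if "`keep1'" != "" estimates table `keep1', b(%9.3f) p(%6.4f)
if "`keep2'" != "" estimates table `keep2', b(%9.3f) p(%6.4f)
```

All models remain stored whether or not they pass the screen, so -estimates dir- still lists every regression that was run.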

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

