Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: how to index regressions inside a foreach loop in order to avoid writing over the estimates

From: Nick Cox
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: how to index regressions inside a foreach loop in order to avoid writing over the estimates
Date: Fri, 14 Oct 2011 10:25:51 +0100

Thanks for your thoughtful replies.

I will only say that your approach outlined under #1 makes good sense to me.
It is the _mixture_ of careful and crude that will need defence.

Nick

On Fri, Oct 14, 2011 at 10:19 AM, hind lazrak <hindstata@gmail.com> wrote:
> Dear Nick
>
> Thank you for your relevant comments. I should always be careful in
> how I word my approach (perhaps a mix of language barrier and theory
> confusion...).
> I'd like to address a few points that you raised.
> #1: The variables that I would like to control for are indeed theory
> guided for the first two (applying physical principles), and the last
> one is more of a "common sense" choice. I have not examined these
> variables in any statistical way. I will include them in the regression
> regardless of any significance test results. I like to think of my
> approach as one that sides with those who do not let p-values decide
> for them.
>
> #2: I agree and this is a struggle that I will have to face. Reading
> on correction of multiple testing is a step that I will take (I
> vaguely know of Bonferroni correction). Meanwhile I am curious  to see
> what this exploratory analysis is showing even without correction.
>
> #4: When I get to the writing phase, this is something that I need to
> keep in mind: striking the right balance between too many explanations
> and results and helping the reader follow the steps.
>
> As for the 3rd comment, I really don't know what to say. I agree, the
> frequentist approach can seem arbitrary. The way I see it, the
> question is what type of error I am most willing to accept. Here we
> are examining exposure metrics. I'd rather say that the metric is not
> measuring what the true exposure is (type I error) and have more
> conservative estimates when the time for an epidemiological study
> comes. But this is not the focus of this study, for which I was asking
> for help on the coding part.
>
> Best,
> Hind
>
> On Fri, Oct 14, 2011 at 1:44 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>> I'd advise strongly against this for several reasons. Here are some of them.
>>
>> 1. This is mixing crude and subtle in a strange way. You have
>> subject-matter (perhaps theory-guided) thinking telling you that some
>> confounders deserve to be in the model, but otherwise it appears that
>> you are going to let significance tests do all the work of deciding
>> what else should be in the model or what is worth thinking about. Many
>> people do that, but many disapprove too.
>>
>> 2. Multiple tests at the same critical level have shifted your real
>> critical level in a way that is difficult to handle. This divides up
>> any field from people who don't care much to those who have a strong
>> belief that not confronting this is a major technical error. The
>> problem goes under different names in different literatures.
>>
>> 3. Your critical level is 0.95 now, was 0.1 in your first posting.
>> Although I guess the mention of 0.95 is just confusing significance
>> level and confidence level, this illustrates a major difficulty with
>> this approach: the threshold is arbitrary. You then have to argue with
>> both those who want a different threshold and those who don't believe
>> you should use just significance tests for your decision-making here.
>>
>> 4. A reviewer of your work is likely to have some favourite
>> variable(s) that they think should be tried out. If your story is
>> going to be "Oh yes, I tried that but it wasn't significant, so it's
>> not in the Table" that is not going to impress. Most reviewers want
>> access to all the results in principle; how much time they spend
>> scanning them is their capricious decision.
>>
>> Note that #4 can bite you even if you discount #1, #2, #3.
>>
>> Nick
>>
>> On Fri, Oct 14, 2011 at 4:46 AM, hind lazrak <hindstata@gmail.com> wrote:
>>
>>> Thank you for taking the time to respond to the question I posted.
>>>
>>> I made the example simpler in my post for more clarity.
>>>
>>> In the first step I ran the pwcorr, sig to capture the list of
>>> variables that I ran in the loop.
>>> In fact the simple linear regression does include three other
>>> variables that may act as either modifier or confounder. So I need to
>>> control for them.
>>>
>>> So this brings me back to the original question. Is there any way to
>>> get a table of coeffs that are statistically significant at the 95%
>>> level?
>>
>> Richard Williams
>>>> At 04:54 PM 10/13/2011, hind lazrak wrote:
>>
>>>>> I am using Stata Version 10 on Windows Vista.
>>>>> The analysis I am conducting is exploratory and involves a long list
>>>>> of independent variables I am testing using simple linear regression.
>>>>> In order to see which variables are "promising" I'd like to find a way
>>>>> to store each model estimate and ideally figure out how to tabulate
>>>>> only those that have a p-value<0.1.
>>>>>
>>>>> The code I used is as follows
>>>>>
>>>>> foreach var of varlist [list of 55 vars] {
>>>>>     qui reg y1 `var'   // first set of regressions looking at Y1
>>>>>     eststo model1`var'
>>>>>
>>>>>     qui reg y2 `var'   // second set of regressions looking at Y2
>>>>>     eststo model2`var'
>>>>> }
>>>>> estimates table model1`var' model`var', beta not
>>>>>
>>>>> This code is not working because it overwrites all the estimates in
>>>>> each regression and only keeps the last one. Also I did not figure out
>>>>> how to only show those with p-val<0.1
>>>>
>>>> The line
>>>>
>>>> estimates table model1`var' model`var', beta not
>>>>
>>>> should probably be
>>>>
>>>> estimates table model1`var' model2`var', beta not
>>>>
>>>> And, it should come before the }, not afterwards.
>>>>
>>>> This is just a bunch of bivariate regressions, right? Why not something like
>>>>
>>>> pwcorr y1 y2 x1-x55, star(.10)
>>>>
>>>> You could probably also fiddle around with the ereturned results and make
>>>> the estimates table command conditional on one p value or the other being
>>>> significant.
>>
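
The suggestion above, to use the ereturned results and make the table conditional on the p-value, can be sketched as follows. This is only an illustration under stated assumptions, not code from any poster: `x1-x55` stands in for the 55 candidate variables, and `c1 c2 c3` are hypothetical names for the three control variables mentioned in the thread. It uses the official `estimates store` with a numeric index, so that each model gets its own short name, rather than `eststo`.

```stata
* Sketch only: y1, y2, x1-x55, c1, c2 and c3 are placeholder variable names.
foreach y of varlist y1 y2 {
    local i = 0          // running index, so no stored name is reused
    local sig            // names of the models that pass the threshold
    foreach var of varlist x1-x55 {
        local ++i
        quietly regress `y' `var' c1 c2 c3
        estimates store `y'_`i'
        * two-sided p-value for the coefficient on the candidate variable
        local p = 2*ttail(e(df_r), abs(_b[`var']/_se[`var']))
        if `p' < 0.10 {
            local sig `sig' `y'_`i'
        }
    }
    * tabulate only the models whose candidate variable had p < 0.10
    if "`sig'" != "" {
        estimates table `sig', drop(c1 c2 c3 _cons) b(%9.3f) p(%6.3f)
    }
}
```

Every model stays stored under its indexed name (`y1_1`, `y1_2`, ...), so the full set of results remains available with `estimates dir` or `estimates replay` even though only the flagged ones are tabulated. The 0.10 cutoff is the one from the original question and is subject to the objections raised in the thread.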

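On the multiple-testing point, a minimal sketch of how a Bonferroni adjustment might be applied to the `pwcorr` suggestion. The variable names are the placeholders used in the thread, and the count of 110 tests assumes 55 candidate variables for each of two outcomes. Note that `pwcorr` does not control for the three additional variables, and, as far as I understand it, its `bonferroni` option adjusts for every pair in the correlation matrix, which is stricter than adjusting for 110 tests.

```stata
* Sketch only: y1, y2 and x1-x55 are placeholder variable names.
* Star correlations significant at the 10% level after Bonferroni adjustment.
pwcorr y1 y2 x1-x55, sig star(.10) bonferroni

* Per-test threshold if only the 110 regressions (55 variables x 2 outcomes)
* are counted as the family of tests.
display 0.10/110
```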