Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Nick Cox <njcoxstata@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: how to index regressions inside a foreach loop in order to avoid writing over the estimates |
Date | Fri, 14 Oct 2011 10:25:51 +0100 |
Thanks for your thoughtful replies. I will only say that your approach
outlined under #1 makes good sense to me. It is the _mixture_ of careful
and crude that will need defence.

Nick

On Fri, Oct 14, 2011 at 10:19 AM, hind lazrak <hindstata@gmail.com> wrote:
> Dear Nick
>
> Thank you for your relevant comments. I should always be careful in
> how I word my approach ( perhaps a mix of language barrier and theory
> confusion...)
> I'd like to address a few points that you raised
> #1: The variables that I would like to control for are indeed theory
> guided for the first two (applying physical principles) and the last
> one
> is more a "common sense" one. I have not examined these variables in
> any statistical way. I will be offering them in the regression in
> spite of any significance test results. I like to think of my approach
> of one that would be on the side of those who do not let the p-values
> decide for their users.
>
> #2: I agree and this is a struggle that I will have to face. Reading
> on correction of multiple testing is a step that I will take (I
> vaguely know of Bonferroni correction). Meanwhile I am curious to see
> what this exploratory analysis is showing even without correction.
>
> #4 : When I get to the writing phase, this is something that I need to
> keep in mind: striking the right balance between too much explanations
> and results and helping the reader follow the steps.
>
> As for the 3rd comment, I really don't know what to say. I agree, the
> frequentist approach is one of what may seem arbitrary. The way I see
> it is what type of error am I most likely to accept. Here we are
> examining exposure metrics. I'd rather say that the metric is not
> measuring what the true exposure is (type I error) and having a more
> conservative estimates when the time for an epidemiological study
> comes - but this is not the focus of this study for which I was asking
> for help on the coding part.
>
> Best,
> Hind
>
> On Fri, Oct 14, 2011 at 1:44 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>> I'd advise strongly against this for several reasons. Here are some of them.
>>
>> 1. This is mixing crude and subtle in a strange way. You have
>> subject-matter (perhaps theory-guided) thinking telling you that some
>> confounders deserve to be in the model, but otherwise it appears that
>> you are going to let significance tests do all the work of deciding
>> what else should be in the model or what is worth thinking about. Many
>> people do that, but many disapprove too.
>>
>> 2. Multiple tests at the same critical level have shifted your real
>> critical level in a way that is difficult to handle. This divides up
>> any field from people who don't care much to those who have a strong
>> belief that not confronting this is a major technical error. The
>> problem goes under different names in different literatures.
>>
>> 3. Your critical level is 0.95 now, was 0.1 in your first posting.
>> Although I guess the mention of 0.95 is just confusing significance
>> level and confidence level, this illustrates a major difficulty with
>> this approach: the threshold is arbitrary. You then have to argue with
>> both those who want a different threshold and those who don't believe
>> you should use just significance tests for your decision-making here.
>>
>> 4. A reviewer of your work is likely to have some favourite
>> variable(s) that they think should be tried out. If your story is
>> going to be "Oh yes, I tried that but it wasn't significant, so it's
>> not in the Table" that is not going to impress. Most reviewers want
>> access to all the results in principle; how much time they spend
>> scanning them is their capricious decision.
>>
>> Note that #4 can bite you even if you discount #1, #2, #3.
>>
>> Nick
>>
>> On Fri, Oct 14, 2011 at 4:46 AM, hind lazrak <hindstata@gmail.com> wrote:
>>
>>> Thank you for taking the time to respond to the question I posted.
>>>
>>> I made the example simpler in my post for more clarity.
>>>
>>> In the first step I ran the pwcorr, sig to capture the list of
>>> variables that I ran in the loop.
>>> In fact the simple linear regression does include three other
>>> variables that may act as either modifier or confounder. So I need to
>>> control for them.
>>>
>>> So this brings me back to the original question. Is there any way to
>>> get a table of coeffs that are statistically significant at the 95%
>>> level?
>>
>> Richard Williams
>>>> At 04:54 PM 10/13/2011, hind lazrak wrote:
>>
>>>>> I am using Stata Version 10 on Windows Vista.
>>>>> The analysis I am conducting is exploratory and involves a long list
>>>>> of independent variables I am testing using simple linear regression.
>>>>> In order to see which variables are "promising" I'd like to find a way
>>>>> to store each model estimate and ideally figure out how to tabulate
>>>>> only those that have a p-value<0.1.
>>>>>
>>>>> The code I used is as follow
>>>>>
>>>>> foreach var of varlist [list of 55 vars] {
>>>>> qui reg y1 `var' */ first set of regressions looking at Y1
>>>>> eststo model1`var'
>>>>>
>>>>> qui reg y2 `var' */ second set of regressions looking at Y2
>>>>> eststo model2`var'
>>>>> }
>>>>> estimates table model1`var' model`var', beta not
>>>>>
>>>>> This code is not working because it overwrites all the estimates in
>>>>> each regression and only keeps the last one. Also I did not figure out
>>>>> how to only show those with p-val<0.1
>>>>
>>>> The line
>>>>
>>>> estimates table model1`var' model`var', beta not
>>>>
>>>> should probably be
>>>>
>>>> estimates table model1`var' model2`var', beta not
>>>>
>>>> And, it should come before the }, not afterwards.
>>>>
>>>> This is just a bunch of bivariate regressions, right? Why not something like
>>>>
>>>> pwcorr y1 y2 x1-x55, star(.10)
>>>>
>>>> You could probably also fiddle around with the ereturned results and make
>>>> the estimates table command conditional on one p value or the other being
>>>> significant.
>>
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
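
For readers following the coding question, here is one way the suggestion
about using the ereturned results might look. It is only a sketch under
stated assumptions, not code posted by anyone in the thread: y1, y2 and
x1-x55 are the placeholder names used above, the prefixes m1_ and m2_ and
the locals keep1 and keep2 are made up for illustration, and 0.10 is the
threshold from the first posting. It uses official -estimates store- rather
than -eststo-, which belongs to the user-written estout package.

```stata
* Sketch only: y1, y2 and x1-x55 are placeholders from the thread;
* m1_/m2_ and keep1/keep2 are hypothetical names.
local keep1
local keep2
foreach var of varlist x1-x55 {
    quietly regress y1 `var'
    * a distinct name per model, so nothing is overwritten
    estimates store m1_`var'
    * two-sided p-value for the coefficient on `var'
    local p = 2 * ttail(e(df_r), abs(_b[`var'] / _se[`var']))
    if `p' < 0.10 local keep1 `keep1' m1_`var'

    quietly regress y2 `var'
    estimates store m2_`var'
    local p = 2 * ttail(e(df_r), abs(_b[`var'] / _se[`var']))
    if `p' < 0.10 local keep2 `keep2' m2_`var'
}
* tabulate only the models that passed the threshold, if there are any
if "`keep1'" != "" estimates table `keep1', b se p
if "`keep2'" != "" estimates table `keep2', b se p
```

If the three control variables are to be included, they can be added after
`var' in each -regress- call; the p-value is still taken for `var' alone.
Stored-estimate names have a length limit, so long variable names may need
a shorter prefix or an index. None of this answers Nick's points about
multiple testing and arbitrary thresholds; it only does the bookkeeping.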