



Re: st: If Stata drops regressors due to Colinearity: Has it automatically picked the right regressors then?


From   Ronan Conroy <rconroy@rcsi.ie>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: If Stata drops regressors due to Colinearity: Has it automatically picked the right regressors then?
Date   Thu, 9 Dec 2010 11:22:39 +0000

On 8 Dec 2010, at 16:29, Jen Zhen wrote:

> Dear list members,
> 
> I'm running a Cox regression on a treatment variable along with 4
> splines in age (created with -mkspline-), one for each of 4 periods in
> which slightly different rules applied.
> 
> Like this, Stata drops most of the regressors due to collinearity, even
> though for many of the spline regressors there are observations for
> which this spline is nonzero.

I would start by building up the model rather than trying to prune it. Each added variable will inevitably decrease the precision of the estimates. And, of course, you are the person who knows why variables are collinear and which variables ought to be added to the model.

In general, finding variables dropped due to collinearity suggests that more thought is needed about the model specification. I am wondering why you have such a complex age model to begin with – not that I need an explanation, but maybe you need to think about the question again. 
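
One way this kind of collinearity can arise (an assumption for illustration only; the original post does not show the actual specification) is entering an overall age spline together with a separate copy of it for every period, so that the period-specific terms sum exactly to the overall terms. The sketch below is in Python rather than Stata, uses made-up ages, a hypothetical single knot at age 40, and a small helper, matrix_rank, to count how many columns carry independent information.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix via Gaussian elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from all rows below the pivot row.
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


KNOT = 40  # hypothetical knot; not taken from the original post


def linear_spline(age):
    """Two-piece linear spline basis with a single knot at KNOT."""
    return [min(age, KNOT), max(age - KNOT, 0)]


# Made-up data: three ages observed in each of four periods.
observations = [(age, period) for period in range(1, 5) for age in (30, 50, 60)]

period_only = []   # 8 columns: one spline copy per period
with_overall = []  # 10 columns: overall spline plus the 8 above
for age, period in observations:
    basis = linear_spline(age)
    by_period = []
    for p in range(1, 5):
        by_period += basis if p == period else [0, 0]
    period_only.append(by_period)
    with_overall.append(basis + by_period)

rank_period_only = matrix_rank(period_only)
rank_with_overall = matrix_rank(with_overall)
print(len(period_only[0]), "columns, rank", rank_period_only)
print(len(with_overall[0]), "columns, rank", rank_with_overall)
```

In this toy setup the eight period-specific columns are linearly independent, but adding the two overall columns does not raise the rank, so software would have to drop two terms. Which two it drops is arbitrary from the analyst's point of view, which is the reason to decide the specification yourself.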

But certainly I would avoid letting Stata specify your model. There is a saying that used to be known as the IBM Pollyanna Principle: "Machines should work, people should think".

> 
> Am I correct to infer that my sample is simply not large enough to use
> all these regressors and that I will therefore need to use for
> instance only one spline for all 4 periods together?
> If so, is there any way to make sure that my control function will
> still take out any variation in my outcome duration that is correlated
> with the treatment, but is not actually due to it?
> 

Ronán Conroy
rconroy@rcsi.ie
Associate Professor
Division of Population Health Sciences
Royal College of Surgeons in Ireland



