
Re: model building survey data. WAS st: adjusted R2 in survey regression

From	"Aca N.T." <[email protected]>
To	[email protected]
Subject	Re: model building survey data. WAS st: adjusted R2 in survey regression
Date	Sat, 18 Oct 2008 15:47:31 +0700

Point 2 (-test-) suggests that modelling should proceed forward. What if
we want to work backward? I start by fitting a regression model with all
regressors.
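One backward step with -test- might look like the sketch below. It borrows the artificial cluster design from Steve's example; the predictor set (including -turn-) and the 0.05 cut-off are illustrative assumptions, not recommendations:

```stata
sysuse auto, clear
gen psu = mod(_n, 10)              // artificial cluster, as in Steve's example
svyset psu [pweight=rep78]

* Start from the full model
svy: regress mpg weight trunk length turn

* Adjusted Wald test: can trunk and length be dropped together?
test trunk length
if r(p) >= 0.05 {
    * No evidence they are needed: refit the reduced model
    svy: regress mpg weight turn
}
```

Repeating this (test a term or block, drop it if not needed, refit) is the backward counterpart of adding terms and testing them.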

Additional questions:
1. Is there a way to calculate the VIF for each variable (or each group of
variables) in a survey regression? Using -display "tolerance = " 1-e(r2)
" VIF = " 1/(1-e(r2))- appears to give only a mean VIF.

2. Does the model with the largest adjusted R2 remain the best fit when its
F-test and VIF are missing? I forced the main regressor of interest to stay
in the model even though it is not significant.

3. Is it acceptable to ignore the weights when checking diagnostics for the final model?
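On question 1: a VIF depends only on the predictors, so one possible approach (an assumption, not something -svy- provides) is to regress each predictor on the others using the same sampling weights and compute 1/(1-R2) from each auxiliary fit. A sketch on the auto data, with an illustrative predictor list:

```stata
sysuse auto, clear
local rhs weight trunk length turn      // illustrative predictor list
foreach v of local rhs {
    * All predictors except the current one
    local others : list rhs - v
    * Weighted auxiliary regression of this predictor on the rest
    quietly regress `v' `others' [pweight=rep78]
    display "`v'" _col(12) "tolerance = " %5.3f 1-e(r2) ///
        _col(32) "VIF = " %6.2f 1/(1-e(r2))
}
```

This gives one VIF per predictor rather than a single value for the whole model; it does not address VIFs for groups of variables.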

Thanks in advance,

Aca

On Thu, Oct 16, 2008 at 9:30 PM, Steven Samuels
<[email protected]> wrote:
> Aca:
>
> 1. The -linktest- command is an excellent test of fit if some predictors are
> continuous, and it can assist in model building. See:
> http://www.ats.ucla.edu/stat/Stata/webbooks/logistic/chapter3/statalog3.htm
> and http://www.michiganscienceonline.org/article.aspx?ID=8669. If the link
> test is significant, something must be changed: add new predictors,
> including polynomial terms and interactions; transform predictors; transform
> the outcome. You will have to figure out the solution yourself. Note that a
> model that passes the link test is not necessarily a "good" model or one
> that predicts well. Conversely, a model that predicts well may also display
> a lack of fit. You may encounter a situation where adding a statistically
> significant variable turns a non-significant link test (no evidence of lack
> of fit) into a significant one (model does not fit). I have also seen (once)
> a situation in which no model we could think up passed the link test.
>
> Unfortunately, -linktest- is not survey-aware, and will give an incorrect
> p-value if run after -svy: reg-.  Here is a way of doing it yourself (be
> sure to zap text gremlins first).
>
> **************************CODE BEGINS**************************
> sysuse auto, clear
> gen psu = mod(_n, 10)     // artificial cluster
> svyset psu [pweight=rep78]
>
> svy: reg mpg weight       // the survey model whose fit is being checked
> predict yhat, xb          // linear prediction from that model
> gen yhat2 = yhat*yhat
> svy: reg mpg yhat yhat2   // significance of yhat2 is the link test
> ***************************CODE ENDS***************************
>
>
> 2. However, the link test does not compare models with different sets of
> covariates. For that you will need -test- (-help test-).
>
> **************************CODE BEGINS**************************
> svy: reg mpg weight trunk length
> test trunk length  //tests for significance of adding trunk and length
> ***************************CODE ENDS***************************
>
>
> 3. Aids to model building: several commands will suggest
> transformations of predictors: Stata's command -fracpoly- and the commands
> -mfracpol- and -boxtid- by Patrick Royston (-search mfracpol, all-;
> -search boxtid, all-). They are not -svy- aware, but they do accept pweights
> and clustering options. Do Google searches on "fractional polynomials" and
> "multivariate fractional polynomials" to learn more about them.
>
>
> **************************CODE BEGINS**************************
> fracpoly reg mpg weight [pweight=rep78], vce(cluster psu)  // same weights and clusters as the svyset above
> ***************************CODE ENDS***************************
>
> 4. If you try to build models by finding significant covariates, the "final"
> model is unlikely to hold up in new data. You can avoid this by using
> theory-based models, as Maarten suggested. Otherwise, regard your models as
> exploratory.  At a minimum, set aside part of your data (say some of the
> strata), build the model on the rest, and test the model on the set-aside
> part.
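A minimal sketch of that split, again using the artificial design above. The auto data have no strata, so the sketch sets aside three of the ten artificial PSUs instead; the variable names train, holdout and sqerr, and the use of squared prediction error as the yardstick, are illustrative assumptions:

```stata
sysuse auto, clear
gen psu = mod(_n, 10)                 // artificial cluster
svyset psu [pweight=rep78]

* Set aside PSUs 7-9; build the model on the rest
gen byte train = psu < 7
gen byte holdout = !train

* Fit on the training part; subpop() keeps the full design for the SEs
svy, subpop(train): regress mpg weight trunk length

* Predict for every observation, including the set-aside part
predict double yhat_v, xb
gen double sqerr = (mpg - yhat_v)^2

* Weighted mean squared prediction error in each part
svy, subpop(train): mean sqerr
svy, subpop(holdout): mean sqerr
```

A set-aside error much larger than the training error is one sign that the model will not hold up in new data.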
>
>
> On Oct 15, 2008, at 8:51 PM, Aca N.T. wrote:
>
>> Steve had shown how -dlist- can sort my problem out anyway. In this
>> case, however, I was wondering whether -linktest- can be used as a
>> substitute for adjusted R2 (or whether it is better used as a complementary test).
>> I mean, does -linktest- act like -lrtest-, which compares the log likelihoods
>> of two nested models after a simple logistic regression, so we
>> can see whether a model is improved?
>>
>> Aca.
>>
>>
>> On Thu, Oct 16, 2008 at 5:01 AM, Stas Kolenikov <[email protected]>
>> wrote:
>>>
>>> On Wed, Oct 15, 2008 at 2:23 AM, Aca N.T. <[email protected]> wrote:
>>>>
>>>> I'm puzzled by model building with -svy: reg-, since no
>>>> adjusted R squared is produced.
>>>> Is there an alternative test for this?
>>>
>>> Uhm... alternative test for what?
>>>
>>> If Stata does not produce something really obvious, like R2 or
>>> adjusted R2, then it means they looked into this and decided it had
>>> dubious statistical properties. R2 is an iid data concept: each
>>> residual is a random variable that has a certain variance, and that
>>> variance is the same for all observations. The complex survey setting
>>> does not really have that concept: the explanatory and response
>>> variables are in fact fixed, and the randomness comes from sampling
>>> procedure only. The regression formulas may look the same (in the end,
>>> there are only so many ways to minimize a sum of squares...) but the
>>> interpretation of a few things is different. So one can probably talk
>>> about population variance of residuals, as a relatively meaningful
>>> quantity, but there is no analogue of the concept of the variance of
>>> each individual residual -- that's a fixed quantity. If there is no
>>> population analogue of R2, it should not be reported to the user, and
>>> that makes perfect sense.
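For reference, the textbook iid definitions behind this point are below, with $n$ observations, $p$ regressors besides the constant, and fitted values $\hat y_i$ (standard formulas, not anything computed by -svy-). The second form of $\bar R^2$ shows that it compares two variance estimates; under the iid model, the residual sum of squares divided by $n-p-1$ is an unbiased estimate of a single residual variance $\sigma^2$ shared by all observations:

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat y_i)^2}{\sum_{i=1}^{n} (y_i - \bar y)^2},
\qquad
\bar R^2 = 1 - (1 - R^2)\,\frac{n-1}{n-p-1}
         = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat y_i)^2 / (n-p-1)}{\sum_{i=1}^{n} (y_i - \bar y)^2 / (n-1)}
```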
>>>
>>> --
>>> Stas Kolenikov, also found at http://stas.kolenikov.name
>>> Small print: I use this email account for mailing lists only.
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/statalist/faq
>>> *   http://www.ats.ucla.edu/stat/stata/
>>>
>
>
