
From: Steven Samuels <sjhsamuels@earthlink.net>
To: statalist@hsphsun2.harvard.edu
Subject: Re: model building survey data. WAS st: adjusted R2 in survey regression
Date: Sat, 18 Oct 2008 11:25:48 -0400

On Oct 18, 2008, at 4:47 AM, Aca N.T. wrote:

Point 2 (-test-) suggests that model building should proceed forward. What if we want to work backward instead? I start by fitting a regression model with all regressors.
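The Wald tests from -test- in Steven's point 2 (quoted further down) can also be used in the backward direction. The block below is a minimal sketch of manual backward elimination after -svy: regress-. It assumes the artificial -auto- survey design used elsewhere in this thread. The candidate predictors, the forced-in variable (weight) and the 0.10 cutoff are illustrative assumptions, not recommendations. Steven's point 4 applies here as well: a model chosen this way should be treated as exploratory.

```stata
* Hypothetical sketch: manual backward elimination with svy: regress.
* Candidate list, forced-in variable, and 0.10 cutoff are illustrative assumptions.
sysuse auto, clear
gen psu = mod(_n, 10)                // artificial cluster, as in this thread
svyset psu [pweight=rep78]
local xvars weight trunk length turn displacement
local keep weight                    // regressor forced to stay in the model
local done 0
while !`done' {
    quietly svy: regress mpg `xvars'
    local worst
    local maxp 0
    local candidates : list xvars - keep
    foreach x of local candidates {
        quietly test `x'             // adjusted Wald test for this single term
        if r(p) > `maxp' {
            local maxp = r(p)
            local worst `x'
        }
    }
    * drop the least significant candidate if it exceeds the cutoff
    if "`worst'" != "" & `maxp' > 0.10 {
        display "dropping `worst' (p = " %5.3f `maxp' ")"
        local xvars : list xvars - worst
    }
    else local done 1
}
svy: regress mpg `xvars'             // final model after elimination
```

Removing one term at a time keeps each step a simple single-coefficient test. For a group of dummies, you would test the whole group jointly with one -test- call instead.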

Additional questions:

1. Is there a way to calculate the VIF for each variable, or for each group of variables, in a survey regression? Using -display "tolerance = " 1-e(r2) " VIF = " 1/(1-e(r2))- appears to show only a mean VIF.

See: http://www.ats.ucla.edu/stat/stata/faq/svycollin.htm
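A VIF is defined per predictor. You regress that predictor on all the other predictors and take 1/(1 - R2) from this auxiliary regression. The e(r2) left behind by -svy: reg- is the R-squared of the outcome model, so 1/(1-e(r2)) computed from it is not a VIF. The block below is a minimal sketch, not the method of the FAQ linked above. It runs weighted auxiliary regressions with -regress- and the pweight from this thread's artificial example (rep78); carrying the design weights over this way is an assumption. Clustering does not change the point estimate of R-squared, so only the weights enter.

```stata
* Hypothetical sketch: per-predictor tolerance and VIF from weighted
* auxiliary regressions (each predictor regressed on the others).
* The pweight rep78 and the predictor list are assumptions from this thread's example.
sysuse auto, clear
local xvars weight trunk length
foreach x of local xvars {
    local others : list xvars - x
    quietly regress `x' `others' [pweight=rep78]
    display "`x': tolerance = " %6.4f 1-e(r2) "   VIF = " %8.3f 1/(1-e(r2))
}
```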

2. Does the model with the largest adjusted R2 remain the best fit even though its F-test and VIF are missing? I forced the main, but non-significant, regressor of interest to stay in the model.

3. Is it acceptable to ignore the weights when checking diagnostics for the final model?

Thanks in advance,
Aca

On Thu, Oct 16, 2008 at 9:30 PM, Steven Samuels <sjhsamuels@earthlink.net> wrote:

Aca:

1. The -linktest- command is an excellent test of fit if some predictors are continuous, and it can assist in model building. See: http://www.ats.ucla.edu/stat/Stata/webbooks/logistic/chapter3/statalog3.htm and http://www.michiganscienceonline.org/article.aspx?ID=8669.

If the link test is significant, something must be changed: add new predictors, including polynomial terms and interactions; transform predictors; or transform the outcome. You will have to figure out the solution yourself. Note that a model that passes the link test is not necessarily a "good" model or one that predicts well. Conversely, a model that predicts well may also display a lack of fit. You may encounter a situation where adding a statistically significant variable turns a non-significant link test (no evidence of lack of fit) into a significant one (the model does not fit). I have also seen (once) a situation in which no model we could think up passed the link test.

Unfortunately, -linktest- is not survey-aware, and will give an incorrect p-value if run after -svy: reg-. Here is a way of doing it yourself (be sure to zap text gremlins first):

**************************CODE BEGINS**************************
sysuse auto, clear
gen psu = mod(_n, 10)      // artificial cluster
svyset psu [pweight=rep78]
svy: reg mpg weight        // fit the survey model being checked
predict yhat               // linear prediction (xb) from that model
gen yhat2 = yhat*yhat
svy: reg mpg yhat yhat2    // significance of yhat2 is the link test
***************************CODE ENDS***************************

2. However, the link test does not compare models with different sets of covariates. For that you will need -test- (-help test-):

**************************CODE BEGINS**************************
svy: reg mpg weight trunk length
test trunk length          // tests for significance of adding trunk and length
***************************CODE ENDS***************************

3. Aids to model building: several commands will suggest transformations of predictors: Stata's -fracpoly- command and the -mfracpol- and -boxtid- commands by Patrick Royston (-search mfracpol, all- and -search boxtid, all-). They are not -svy- aware, but they do accept pweights and clustering options. Do Google searches on "fractional polynomials" and "multivariate fractional polynomials" to learn more about them.

**************************CODE BEGINS**************************
fracpoly reg mpg weight [pweight=rep78], vce(cluster psu)
***************************CODE ENDS***************************

4. If you try to build models by finding significant covariates, the "final" model is unlikely to hold up in new data. You can avoid this by using theory-based models, as Maarten suggested. Otherwise, regard your models as exploratory. At a minimum, set aside part of your data (say, some of the strata), build the model on the rest, and test the model on the set-aside part.

On Oct 15, 2008, at 8:51 PM, Aca N.T. wrote:

Steve had shown how -dlist- can sort my problem out anyway. In this case, however, I was wondering whether -linktest- can be used as a substitute for adjusted R2 (or should it rather be used as a complementary test?). I mean, does -linktest- act like -lrtest-, which compares the likelihood of one model with that of another when running a simple logistic regression, so that we can see whether a model is improved?

Aca

On Thu, Oct 16, 2008 at 5:01 AM, Stas Kolenikov <skolenik@gmail.com> wrote:

On Wed, Oct 15, 2008 at 2:23 AM, Aca N.T. <acant29@gmail.com> wrote:

I'm puzzled about model building with -svy: reg-, because no adjusted R-squared is produced. Is there an alternative test for this?

Uhm... alternative test for what? If Stata does not produce something really obvious, like R2 or adjusted R2, then it means they looked into this and decided it had dubious statistical properties. R2 is an iid data concept: each residual is a random variable that has a certain variance, and that variance is the same for all observations.
The complex survey setting does not really have that concept: the explanatory and response variables are in fact fixed, and the randomness comes from the sampling procedure only. The regression formulas may look the same (in the end, there are only so many ways to minimize a sum of squares...), but the interpretation of a few things is different. So one can probably talk about the population variance of the residuals as a relatively meaningful quantity, but there is no analogue of the concept of the variance of each individual residual -- that's a fixed quantity. If there is no population analogue of R2, it should not be reported to the user, and that makes perfect sense.

-- 
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
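The survey-aware link test from item 1 of Steven's reply can be wrapped in a small program, so it is easy to rerun after each change to the model. The block below is a minimal sketch. svylinktest is a hypothetical name; it is not an official Stata command or a known user-written one. The program assumes the data have already been -svyset-. It fits -svy: regress- on the given dependent variable and predictors, builds the linear prediction and its square, and refits the model on those two terms. It then tests the squared term, which is the same logic as Steven's recipe.

```stata
* Hypothetical helper (not an official Stata command): survey-aware link test.
* Assumes the data are already svyset; first variable is the outcome.
capture program drop svylinktest
program define svylinktest
    version 10
    syntax varlist(min=2 numeric)
    gettoken depvar indepvars : varlist
    tempvar hat hat2
    quietly svy: regress `depvar' `indepvars'     // model being checked
    quietly predict double `hat' if e(sample), xb // its linear prediction
    quietly generate double `hat2' = `hat'^2
    svy: regress `depvar' `hat' `hat2'            // refit on prediction and its square
    test `hat2'                                   // link test: is the square needed?
end

* Usage with the artificial design from this thread
sysuse auto, clear
gen psu = mod(_n, 10)
svyset psu [pweight=rep78]
svylinktest mpg weight
```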

References:

- st: adjusted R2 in survey regression, From: "Aca N.T." <acant29@gmail.com>
- Re: st: adjusted R2 in survey regression, From: "Stas Kolenikov" <skolenik@gmail.com>
- Re: st: adjusted R2 in survey regression, From: "Aca N.T." <acant29@gmail.com>
- Re: model building survey data. WAS st: adjusted R2 in survey regression, From: Steven Samuels <sjhsamuels@earthlink.net>
- Re: model building survey data. WAS st: adjusted R2 in survey regression, From: "Aca N.T." <acant29@gmail.com>


© Copyright 1996–2014 StataCorp LP