
Is there a way in Stata to do stepwise regression with svy: logit or any of the svy commands?

Title:  Stepwise regression with the svy commands
Author: William Sribney, StataCorp

The stepwise prefix command in Stata does not work with svy: logit or any of the other svy commands. Most search-lots-of-possibilities stepwise procedures are not statistically sound, and most statisticians would not recommend them.

For a list of problems with stepwise procedures, see the FAQ: What are some of the problems with stepwise regression?

To these reasons, let me add that using stepwise methods with cluster-sampled data is even more problematic because the effective number of degrees of freedom is bounded by the number of clusters. Thus we have no plans to allow the stepwise prefix to work with the svy commands.

If you do not have a priori hypotheses to test, then model building is really an art. I recommend that you do what I call “planned backward block stepwise regression”. Other people call this “hierarchical stepwise regression”.

That is,

  1. Arrange your covariates into logical groupings. I will call the groupings {a1, a2, ...}, {b1, b2, ...}, {c1, c2, ...}, .... Order the groupings so that those you think a priori are least important come last.
  2. Run your full model. E.g.,
     . svy: logit y a1 a2 ... b1 b2 ... c1 c2 ... h1 h2 ...  
  3. Test the last group (the least important):
     . test h1 h2 ...  
    If it is not significant, discard the whole group. If it is significant, keep the whole group.
  4. Then test the second-to-last group, etc.
  5. When you have tested all the groups and kept only the significant ones, repeat the same procedure on the individual covariates within the remaining groups. This last step should be considered optional for two reasons. First, it may not make sense to split up the covariates in a group (e.g., they may be dummies for a categorical variable). Second, performing yet more tests is not a good thing. But people usually cannot stand leaving nonsignificant terms in their “final” model. However, overfitting is better than overtesting! (A Stata sketch of steps 2–4 appears just after this list.)
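
As a rough illustration of steps 2–4, here is a minimal sketch in Stata. The outcome y, the blocks {a1, a2}, {b1, b2}, {h1, h2}, and the design variables psuid, stratid, and finalwt are hypothetical placeholders; substitute your own svyset specification and covariates.

     . * declare the survey design (placeholder variable names)
     . svyset psuid [pweight=finalwt], strata(stratid)

     . * step 2: fit the full model with all blocks
     . svy: logit y a1 a2 b1 b2 h1 h2

     . * step 3: test the least important block as a group
     . test h1 h2

     . * if the block is not significant, drop it and refit
     . svy: logit y a1 a2 b1 b2

     . * step 4: test the next block, and so on
     . test b1 b2

If a block consists of the indicators for a categorical variable entered with factor-variable notation, testparm (e.g., testparm i.region) is a convenient way to test the whole block at once.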

Steps 1–4 alone are problematic because of multiple comparisons. One should really do a Bonferroni correction for testing the groups.

That is, if you have K groups of covariates to test, you should use a significance level of 0.05/K. This is a stringent procedure but is the only statistically sound thing to do, in my opinion. Remember that ideally one should be testing M a priori hypotheses each at a level of 0.05/M, so you should not be rewarded for not having a priori hypotheses! But, for the sake of having something to publish, a Bonferroni correction is usually not done.
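
For instance, with K = 5 groups of covariates, each group test in step 3 would be judged against 0.05/5 = 0.01 rather than 0.05. The cutoff is easy to compute on the fly:

     . display 0.05/5
     .01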

If you did not have survey data, I would recommend doing the above procedure coupled with a split sample approach: you divide your sample into two parts, develop a model on one part, and then try to confirm it on the other. (When it does not get confirmed, you will be stuck, so you will make sure that you have a priori hypotheses for the next study you are involved with.)
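
For data that are not from a complex survey, the mechanics of such a split might look like the sketch below; the seed, the 50/50 proportion, and the variable names are arbitrary choices for illustration.

     . * randomly assign each observation to a development or a confirmation half
     . set seed 12345
     . gen byte develop = runiform() < .5

     . * build the model on the development half
     . logit y a1 a2 b1 b2 if develop

     . * try to confirm the chosen model (say only the a block survived) on the other half
     . logit y a1 a2 if !develop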

Splitting up survey data, however, is a dicey proposition if you have only a moderate number of clusters (PSUs) because you should keep clusters whole. Thus you should only split survey data if you have many clusters in each stratum.
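
If you do have many clusters per stratum and decide to split anyway, do the split at the PSU level so that clusters stay whole. Here is a minimal sketch, again with placeholder design variables stratid and psuid; fitting with subpop() keeps the full design information in both analyses.

     . * draw one random number per PSU and spread it to all of the PSU's observations
     . set seed 6789
     . bysort stratid psuid: gen double u = runiform() if _n == 1
     . bysort stratid psuid (u): replace u = u[1]

     . * assign whole PSUs to the development or the confirmation half
     . gen byte develop = u < .5
     . gen byte confirm = !develop

     . * develop the model on one half, then try to confirm it on the other
     . svy, subpop(develop): logit y a1 a2 b1 b2
     . svy, subpop(confirm): logit y a1 a2 b1 b2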