Re: st: Appropriate modelling - testing which set of exposures are more important

From   Maarten Buis <>
Subject   Re: st: Appropriate modelling - testing which set of exposures are more important
Date   Fri, 28 Sep 2012 10:19:24 +0200

On Thu, Sep 27, 2012 at 7:46 PM, Amal Khanolkar wrote:
> I have two main exposures; maternal ethnicity and maternal socioeconomic position (SEP).
> I want to test which of the above two exposures are more important in determining maternal pregnancy outcomes.
> 1. I plan to use linear regression, as my outcome of interest is continuous.

A continuous dependent/outcome/left-hand-side/explained variable is
neither a necessary nor a sufficient reason for choosing a linear
regression model.

> 2. Initially, the first model will test the effect of ethnicity on the outcome, controlling for potential confounders as follows:
> xi: regress outcome i.ethnicity confounder1 confounder2 i.confounder3
> 3. In the next step, I introduce the second main exposure, maternal SEP:
> xi: regress outcome i.ethnicity confounder1 confounder2 i.confounder3 i.SEP
> 4. I test for an interaction as follows:
> xi: regress outcome i.ethnicity*i.SEP confounder1 confounder2 i.confounder3
> Questions: If the effects of ethnicity on my outcome of interest change from step2 to step3, controlling for the same confounders in both models, is this enough evidence of one exposure being more important than the other?

No. Think of it this way: if you started with maternal SEP instead and
then added ethnicity, that would also lead to a change in the
effect of maternal SEP. So if your argument were correct, you could
choose which variable is more important simply by choosing the
variable with which you started your analysis...
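A minimal sketch of this order dependence, using the nlsw88 data that ship with Stata. The variables race and collgrad (college graduate) are stand-ins for the two exposures, chosen only for illustration; they are not the variables from the question. Compare the race coefficients in race_only and both, and the collgrad coefficients in coll_only and both: each set will generally shift when the other variable is added, so the size of such a shift cannot tell you which variable is more important.

```stata
*----------------------- begin example ----------------------
// illustrative only: race and collgrad stand in for the two exposures
sysuse nlsw88, clear

// order 1: start with race, then add collgrad
regress wage i.race
estimates store race_only
regress wage i.race i.collgrad
estimates store both

// order 2: start with collgrad alone
regress wage i.collgrad
estimates store coll_only

// the race coefficients move between race_only and both,
// and the collgrad coefficient moves between coll_only and both
estimates table race_only coll_only both, b(%9.3f)
*------------------------ end example -----------------------
```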

As a solution, you would need to be more precise about what "more
important" means. To me it refers to some form of comparison of effect
sizes. Then the trick becomes to create effect sizes for categorical
variables that are comparable. One approach is the sheaf coefficient:
see -ssc desc sheafcoef-, <>
and the example below. In the example below I would say that
occupation and education are about equally important for wage, and both
are more important than marital status. If you like testing, then you
can test these hypotheses, as is shown below.
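A rough sketch of the idea behind the sheaf coefficient, again with nlsw88 and an illustrative model (race, collgrad, union, and ttl_exp are my choice, not taken from the question). The dummies of a categorical variable are combined into one composite, sum_j b_j*D_j. Under the usual definition, the sheaf coefficient is the effect of a one standard deviation change in that composite, which equals the composite's standard deviation, so it is on the same scale for every variable. This manual version gives point estimates only; -sheafcoef- also computes standard errors and handles the -eform- case, so use that for real work.

```stata
*----------------------- begin example ----------------------
// a rough manual sheaf coefficient (point estimates only)
sysuse nlsw88, clear
regress wage i.race i.collgrad union ttl_exp

// composite for race: its dummies weighted by their coefficients
// (white, the base category, contributes 0)
gen double race_idx = _b[2.race]*(race==2) + _b[3.race]*(race==3) if e(sample)
summarize race_idx
local sheaf_race = r(sd)

// composite for collgrad: a single dummy, so this is |b|*sd(collgrad)
gen double coll_idx = _b[1.collgrad]*collgrad if e(sample)
summarize coll_idx
local sheaf_coll = r(sd)

// both are effects of a one-SD change in the composite,
// so they can be compared directly
display "race: " `sheaf_race' "   collgrad: " `sheaf_coll'
*------------------------ end example -----------------------
```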

*----------------------- begin example ----------------------
// prepare data
sysuse nlsw88, clear
recode occupation (11/12=4)

gen byte marst = never_married + 2*married
label define marst 0 "divorced/widowed" ///
                   1 "never married"    ///
                   2 "married"
label value marst marst

gen byte ed = cond(grade  < 12, 1, ///
              cond(grade == 12, 2, ///
              cond(grade  <  ., 3, .)))
label define ed 1 "less than high school" ///
                2 "high school" ///
                 3 "more than high school"
label values ed ed

// estimate the model
// notice wage is (approximately) continuous,
// but linear regression is not the best choice,
// so a GLM with a log link is used instead
xi: glm wage i.occupation i.marst i.ed union ttl_exp, ///
    link(log) vce(robust)

// make the effects of occupation, marital status
// and education comparable
sheafcoef, latent(occ:_Iocc* ; marst:_Imarst* ; ed:_Ied*) ///
   eform post

// test whether the comparable effects are different
test occ_e = ed_e
test occ_e = marst_e
test ed_e  = marst_e
*------------------------ end example -----------------------
(For more on examples I sent to the Statalist see: )

-- Maarten

Maarten L. Buis
Reichpietschufer 50
10785 Berlin