Re: st: compare effect size between dummys and metrics variables in logistic regression


From   Maarten buis <maartenbuis@yahoo.co.uk>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: compare effect size between dummys and metrics variables in logistic regression
Date   Mon, 27 Sep 2010 09:20:06 +0000 (GMT)

--- On Sun, 26/9/10, Joerg Eulenberger wrote:
> I want to estimate a binary logistic regression. I have
> z-transformed all metric variables (mean = 0, sd = 1) to
> compare the effect sizes of the independent variables.
> But I also have dummies in my regression model. What can I
> do to compare the effect sizes of the dummies with the
> effect sizes of the metric independent variables? Or is
> that completely impossible?

That is more of a conceptual problem, and it exists even for
metric variables. There are many possible answers; none of them
works in all situations, and in many (some would say all)
situations there is simply no answer.

1) My default is not to compare the effects of variables
unless I have a specific interest in such a comparison.

2) My default is not to standardize variables that have
a natural unit. It is much more informative to say that
average income increases by x euros for every extra year of
education than to say that average income increases by
y standard deviations for every standard deviation increase
in education. The only exception is the one allowed for in
point 1): a specific interest in comparing effects.
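A short derivation shows why the two statements in point 2) carry the same information, assuming a logit model with a single predictor x whose sample mean is \bar{x} and sample standard deviation is s_x:

```latex
\operatorname{logit}\Pr(y = 1) = \beta_0 + \beta_1 x ,
\qquad z = \frac{x - \bar{x}}{s_x}
\quad\Longrightarrow\quad
\operatorname{logit}\Pr(y = 1) = (\beta_0 + \beta_1 \bar{x}) + (\beta_1 s_x)\, z ,
\qquad
\exp(\beta_1 s_x) = \left(e^{\beta_1}\right)^{s_x}
```

So standardizing only multiplies the coefficient by s_x (and raises the odds ratio to the power s_x); what it takes away is the natural unit that makes the number easy to communicate.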

3) If I have a situation where a comparison of coefficients
is of substantive interest, it is almost always limited to
only a few variables. In that case I would tailor my 
standardization to those variables alone, and leave the 
remaining variables untouched.

The aim is to make the units of the variables comparable.
This can be achieved in many ways, and none of them will work
in all situations. So you need to look at every pair and
think conceptually about what makes substantive sense. Some
examples of such standardizations are:

3a) Compute z-scores. You basically assume that a change of
one standard deviation in one variable is comparable to a
change of one standard deviation in another variable.

3b) Standardize on the range. You basically assume that a move
from the minimum to the maximum of one variable is comparable
to the same move in another variable. This often does not
work well when one variable has a small number of categories
while the other has a much larger number of categories. A dummy
variable is an example of a variable with an extremely limited
number of categories.

3c) Standardize on percentile rank scores. You are then looking
at the proportion of respondents who have a value lower than
your own value. This sometimes also makes substantive sense:
you can have a theoretical reason to believe that people do not
react to the absolute value of a certain variable but to how
well they do compared to the rest of the population.
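The three standardizations in 3a) to 3c) can be written as follows, assuming n is the number of observations used, \bar{x}, s_x, \min x and \max x are computed on those same observations, and \operatorname{rank}(x_i) gives tied observations their average rank:

```latex
\text{3a)}\quad z_i = \frac{x_i - \bar{x}}{s_x}
\qquad
\text{3b)}\quad r_i = \frac{x_i - \min x}{\max x - \min x}
\qquad
\text{3c)}\quad h_i = \frac{\operatorname{rank}(x_i) - 0.5}{n}
```

The form used for 3c), (rank - 0.5)/n, is Hazen's percentile rank; it keeps every score strictly between 0 and 1, since the lowest score is 0.5/n and the highest is (n - 0.5)/n.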

All of these standardizations can in principle be computed for
dummy variables, but I would be least uncomfortable with
percentile rank scores.
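As a worked case, assume a 0/1 dummy d with n_0 zeros and n_1 ones (n = n_0 + n_1, p = n_1/n), sample standard deviation s_d, and tied values given their average rank. The three standardized versions then differ between d = 1 and d = 0 by:

```latex
\begin{gathered}
\text{range:}\quad r(1) - r(0) = 1
\qquad
\text{z-score:}\quad z(1) - z(0) = \frac{1}{s_d},\quad
s_d = \sqrt{\frac{n\,p\,(1-p)}{n-1}}
\\[4pt]
\text{percentile rank:}\quad
h(0) = \frac{\tfrac{n_0+1}{2} - 0.5}{n} = \frac{n_0}{2n},
\qquad
h(1) = \frac{n_0 + \tfrac{n_1+1}{2} - 0.5}{n} = \frac{n_0 + n_1/2}{n},
\qquad
h(1) - h(0) = \frac{1}{2}
\end{gathered}
```

So if \beta is the coefficient of the raw dummy, the range-standardized dummy keeps \beta, the z-score version gets \beta s_d, which depends on how common the category happens to be in the sample, and the percentile-rank version always gets 2\beta, whatever p is.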

*-------------------- begin example -------------------------
sysuse nlsw88, clear
gen black = race == 2 if race <= 2
gen byte baseline = 1

local rhs "black grade"
gen byte touse = !missing(union,black,grade)
tempvar n i
gen long `n' = .
gen double `i' = .
foreach var of varlist `rhs' {
    // standardize by standard deviation (estimation sample only)
    sum `var' if touse
    gen z_`var' = (`var' - r(mean))/r(sd)
    local z_rhs "`z_rhs' z_`var'"

    // standardize by range, reusing r(min) and r(max) from -sum-
    gen r_`var' = (`var' - r(min))/(r(max)-r(min))
    local r_rhs "`r_rhs' r_`var'"

    // standardize by percentile rank score (Hazen), on the same
    // estimation sample; tied values get their average rank,
    // which can be non-integer, so the rank is stored as double
    drop `n' `i'
    egen long `n' = count(`var') if touse
    egen double `i' = rank(`var') if touse
    gen h_`var' = (`i' - 0.5) / `n'
    local h_rhs "`h_rhs' h_`var'"
}

list `rhs' `z_rhs' `r_rhs' `h_rhs' in 1/10

qui logit union `rhs' baseline, nocons
est store non

qui logit union `z_rhs' baseline, nocons
est store z

qui logit union `r_rhs' baseline, nocons
est store range

qui logit union `h_rhs' baseline, nocons
est store hazen

est tab  non z range hazen, eform b(%9.3f)
*---------------- end example -----------------------
(For more on the examples I send to Statalist, see:
http://www.maartenbuis.nl/example_faq )

4) A special case occurs when a set of your dummy variables
represents a categorical variable, e.g. race or religion. In
that case you might look at -sheafcoef-; see
-ssc desc sheafcoef- and
<http://www.maartenbuis.nl/software/sheafcoef.html>

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------




      
