
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


Re: st: mean centering

From   "JVerkuilen (Gmail)" <>
Subject   Re: st: mean centering
Date   Sun, 20 Jan 2013 10:22:34 -0500

On Sun, Jan 20, 2013 at 8:07 AM, James Bernard <> wrote:
> Hi,
> I know that mean centering is not really a remedy for the resultant
> multidisciplinary among interaction effect variables.

I think autocorrect busted you, because I'm sure you meant
multicollinearity. ;)

> But, it has become a common practice.

Yes, and it's not bad practice most of the time. I tend to recommend
to students in class that it's not a bad idea to center all continuous
predictors, or at least put the 0 at a meaningful place so the
intercept is interpretable. An example of a non-sample standardization
might be to center observed SAT scores at 500 even if the
sample mean isn't 500, because 500 is the population average.

> I have two questions:
> 1- In the case of panel data: should we deduct the mean of all the
> observations of that particular variables or should we do this by
> groups of observation?

I don't really think there's a clear "must do". In the linear model,
statistics like the R^2 and t- and F-statistics are invariant to
affine transformations of the X variables. Depending on how you set up
the model, though, you'll get different estimates, and you can reach
quite different conclusions if you estimate different things. For
instance, the coefficients from an effects coded binary variable
interaction (which is centered) and a dummy coded binary variable
interaction (which is not) are not the same thing, even if the
relevant tests end up being the same.
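
To make the two choices in question 1 concrete, here is a minimal sketch of grand-mean versus group-mean centering. It uses the auto data, with foreign standing in for the panel identifier; that stand-in and the new variable names are assumptions for illustration only, and with real panel data you would put your own id variable after bysort.

```stata
sysuse auto, clear

* Grand-mean centering: subtract the mean over all observations
summarize mpg, meanonly
generate double mpg_grand = mpg - r(mean)

* Group-mean centering: subtract each group's own mean
* (foreign is only a stand-in for the panel identifier)
bysort foreign: egen double mpg_gmean = mean(mpg)
generate double mpg_within = mpg - mpg_gmean

* Compare the group means of the two centered variables
tabstat mpg_grand mpg_within, by(foreign) statistics(mean)
```

The group-centered variable averages to zero within every group, so it carries only within-group variation; the grand-centered variable keeps the between-group differences. Which one you want depends on what you are trying to estimate.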

(A reasonably easy check of whether two linear models are the same is
whether H = X (X' X)^-1 X' is the same for both models.)
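
In practice you rarely need to build H by hand. A rough sketch of the same check, assuming an interaction model on the auto data (the model and variable names are illustrative, not from the original question): fit the model with the raw and the centered predictor, then compare the fitted values and the diagonal of H, which predict returns with the hat option after regress.

```stata
sysuse auto, clear

* Center weight at its sample mean (double avoids float rounding)
summarize weight, meanonly
generate double cweight = weight - r(mean)

* Model with the raw predictor
regress price c.weight##i.foreign
predict double yhat_raw, xb
predict double h_raw, hat

* Same model with the centered predictor
regress price c.cweight##i.foreign
predict double yhat_c, xb
predict double h_c, hat

* Fitted values and leverages should agree up to rounding error
assert reldif(yhat_raw, yhat_c) < 1e-8
assert reldif(h_raw, h_c) < 1e-8
```

Equal fitted values and equal leverages are what you expect when the two design matrices span the same column space. The coefficient on foreign still differs between the two fits, because it is evaluated at weight = 0 in one model and at the mean weight in the other.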

> 2- is there any direct command in Stata for mean centering?

egen has the function std(), which generates z-scores (and, with
optional arguments, variables with a specified mean and standard
deviation).

Otherwise you need to compute the mean first using summarize and then
generate the relevant variable:

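sysuse auto, clear
* leave the sample mean of price in r(mean)
summarize price, meanonly
* mean-centered price
generate cprice = price - r(mean)
* z-score of price (mean 0, standard deviation 1)
egen zprice = std(price)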