Centering Variables to Reduce Multicollinearity

Centering is one of those topics in statistics that everyone seems to have heard of, but most people don't know much about. It has developed a mystique that is entirely unnecessary.

Centering just means subtracting a single value from all of your data points. It shifts the scale of a variable and is usually applied to predictors. It's called centering because people often use the mean as the value they subtract (so the new mean is now at 0), but it doesn't have to be the mean. In fact, there are many situations when a value other than the mean is most meaningful.
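
To make that concrete, here is a minimal sketch in Python. The ages are made-up numbers and `center` is a hypothetical helper written just for this illustration, not a function from any library.

```python
from statistics import mean

def center(values, at=None):
    """Subtract a single value from every data point (the mean by default)."""
    pivot = mean(values) if at is None else at
    return [v - pivot for v in values]

ages = [18, 22, 25, 30, 35]  # made-up example data; the mean is 26

# Centered at the mean: the new mean is 0.
mean_centered = center(ages)

# Centered at a value other than the mean: 0 now stands for "age 18".
age18_centered = center(ages, at=18)

print(mean_centered)
print(age18_centered)
```

Either way, every data point moves by the same amount, so the spread of the variable and its relationship to other variables are unchanged. Only the location of 0 moves.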

While centering can be done in a simple linear regression, its real benefits emerge when there are multiplicative terms in the model—interaction terms or quadratic terms (X-squared).

There are two reasons to center. The first is to reduce multicollinearity when an interaction term is made by multiplying two predictor variables that are both on a positive scale. When you multiply them to create the interaction, the numbers near 0 stay near 0 and the high numbers get really high. The interaction term is then highly correlated with the original variables.


But this is easy to check. Simply create the multiplicative term in your data set, then run a correlation between that interaction term and each original predictor. While correlations are not the best way to test for multicollinearity, they will give you a quick check.

Then try it again, but first center one of your independent variables (IVs).

Centering one of your variables at the mean (or some other meaningful value close to the middle of the distribution) will make roughly half of its values negative (since the mean now equals 0). When those values are multiplied by the other, still positive, variable, the products no longer all go up together, so the interaction term is much less correlated with that other variable.
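
The quick check described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the two predictors `x` and `z` are made-up positive numbers, and `pearson` is a small hand-written helper rather than a library call.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    ss_x = sum((a - mx) ** 2 for a in xs)
    ss_y = sum((b - my) ** 2 for b in ys)
    return cov / sqrt(ss_x * ss_y)

# Made-up predictors: both on a positive scale, nearly uncorrelated with each other.
x = [1, 2, 3, 4, 5, 6, 7, 8]
z = [3, 7, 2, 8, 5, 1, 6, 4]

# Step 1: build the raw interaction term and correlate it with a predictor.
xz_raw = [a * b for a, b in zip(x, z)]
r_raw = pearson(z, xz_raw)            # roughly 0.66

# Step 2: center x at its mean, rebuild the interaction, and check again.
x_c = [a - mean(x) for a in x]
xz_centered = [a * b for a, b in zip(x_c, z)]
r_centered = pearson(z, xz_centered)  # roughly -0.12

# The product of centered x and positive z still follows centered x itself.
r_self = pearson(x_c, xz_centered)    # roughly 0.93

print(f"z vs raw product:      {r_raw:.2f}")
print(f"z vs centered product: {r_centered:.2f}")
print(f"x_c vs centered product: {r_self:.2f}")
```

In this toy data, centering `x` breaks the link between the interaction term and `z`, but the term still tracks the centered `x` because `z` is all positive. Centering the second predictor as well is a common way to deal with that.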

The other reason is to help interpretation of parameter estimates (regression coefficients, or betas).
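
A short worked equation shows why. This is only a sketch under assumed notation: a hypothetical model with two predictors X and Z, coefficients b0 through b3, and Z centered at its mean.

```latex
% Hypothetical model with an interaction term (notation assumed for illustration)
\begin{align*}
Y &= b_0 + b_1 X + b_2 Z + b_3 XZ + e \\
% The slope of X changes with Z, so b_1 is the slope of X only where Z = 0:
\frac{\partial Y}{\partial X} &= b_1 + b_3 Z \\
% Substituting Z = Z_c + \bar{Z}, where Z_c = Z - \bar{Z} is the centered variable:
Y &= (b_0 + b_2 \bar{Z}) + (b_1 + b_3 \bar{Z})\, X + b_2 Z_c + b_3 X Z_c + e
\end{align*}
```

Before centering, the coefficient on X describes its effect when Z equals 0, which may be a value nobody in the data actually has. After centering, the coefficient on X describes its effect at the mean of Z. The interaction coefficient itself does not change.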
