When modeling a phenomenon by including explanatory variables that are highly related to the variable of interest, one question arises: which of the auxiliary variables has the greatest influence on the response? I am not writing about significance testing or anything like that. I am simply thinking like a researcher who wants to rank the variables that influence the response and quantify their relative weights.
There are various methods that try to answer this question. The one motivating this post is very simple: remove the measurement units from the variables. Assume a linear model with the following structure (for the sake of simplicity, assume only two explanatory variables):
$$y = \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$
If you assume this model is true, then the influence of variable $x_i$ on the response $y$ can be assessed after isolating the measurement units from the variables (working with $|\beta_i|$ when a coefficient is negative). One option is to fit the model on the standardized variables (explanatory and response) and directly compare the regression coefficients. Another is by means of the following expression, where $se(\hat\beta_i)$ denotes the estimated standard error of the coefficient:
$$I(i) = \frac{|\hat\beta_i|}{se(\hat\beta_i)} \propto |\hat\beta_i|\frac{sd(x_i)}{sd(y)}$$
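These two expressions are not identical, but they yield the same ranking once normalized. A short sketch of why, assuming homoskedastic errors and approximately uncorrelated predictors:

$$se(\hat\beta_i) \approx \frac{\sigma}{\sqrt{n}\, sd(x_i)} \quad\Longrightarrow\quad \frac{|\hat\beta_i|}{se(\hat\beta_i)} \approx \frac{\sqrt{n}}{\sigma}\,|\hat\beta_i|\, sd(x_i)$$

The factor $\sqrt{n}/\sigma$ is common to every coefficient, as is $1/sd(y)$, so both constants cancel when the importances are normalized to sum to one.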
For example, let's consider the model $y = -500 x_1 + 50 x_2 + \varepsilon$, where both explanatory variables are uniform on $(0, 1)$ and therefore share the same standard deviation. The relative importance of the first and second variables is then around $500/(500+50) \approx 0.9$ and $50/(500+50) \approx 0.1$, respectively. The following code shows how to perform this simple analysis in R.
set.seed(123)  # for reproducibility
n <- 10000
x1 <- runif(n)
x2 <- runif(n)
y <- -500 * x1 + 50 * x2 + rnorm(n)
model <- lm(y ~ 0 + x1 + x2)  # no intercept, as in the model above

# 1a. Importance as coefficients divided by their standard errors
sd.betas <- summary(model)$coefficients[, 2]
betas <- model$coefficients
imp <- abs(betas) / sd.betas
imp <- imp / sum(imp)  # normalize so the importances sum to one
imp
# 1b. Importance as standardized betas: |beta| * sd(x) / sd(y)
imp1 <- abs(model$coefficients[1] * sd(x1)/sd(y))
imp2 <- abs(model$coefficients[2] * sd(x2)/sd(y))
imp1 / (imp1 + imp2)
imp2 / (imp1 + imp2)
# 2. Standardized variables
model2 <- lm(I(scale(y)) ~ 0 + I(scale(x1)) + I(scale(x2)))
summary(model2)
abs(model2$coefficients)/sum(abs(model2$coefficients))
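All three approaches should return approximately the same importances, around 0.9 for x1 and 0.1 for x2, matching the back-of-the-envelope calculation above (recall that x1 and x2 share the same standard deviation here).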
The `relaimpo` package does this standardized-coefficients method and more. For example, in the LMG method, the in-sample R² is allocated to each variable.

set.seed(123)
n <- 10000
x1 <- runif(n)
x2 <- runif(n)
y <- -500 * x1 + 50 * x2 + rnorm(n)
model <- lm(y ~ x1 + x2) # include intercept

# Use relaimpo
library(relaimpo)
calc.relimp(model, type = c("lmg", "betasq"))
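For context, LMG (after Lindeman, Merenda and Gold) averages each regressor's sequential contribution to R² over all possible orderings of the regressors, which keeps the decomposition meaningful when predictors are correlated; the betasq metric reports squared standardized coefficients, essentially the method described in this post.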
Daniel, the function I() lets you apply transformations inside your model formula, even when that transformation is not defined in your data. For example, I can do this:

lm(I(scale(y)) ~ 0 + I(scale(x1)) + I(scale(x2)))

even though the standardized variables are never created in the R environment.
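To illustrate further why I() matters: operators such as + and ^ have a special meaning inside R formulas, and wrapping them in I() forces them to be evaluated as ordinary arithmetic. A minimal sketch, reusing the x1, x2 and y simulated above:

# Without I(), the formula operator + means "add another regressor"
lm(y ~ x1 + x2)     # two predictors: x1 and x2
# With I(), the + is evaluated as arithmetic before fitting
lm(y ~ I(x1 + x2))  # one predictor: the sum x1 + x2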
Hi, this is a really stupid question, but what does the I() do in the lm() call?
Why not use ANOVA type III in this case?
@Alex Lev: Is this where you simply compare the F values or the residuals?
Hi, Andrez! Nice post. I wonder why you don't apply anova in this case? Or maybe car::Anova? Anyway, your method is very interesting. Thanks!
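One way to act on these suggestions is a sketch like the following, reusing the model fitted earlier; note that the F statistics give a test-based view of each variable's contribution rather than a variance decomposition:

# Sequential (type I) ANOVA: contributions depend on the order of terms
anova(model)

# Type II tests via the car package, as suggested above
library(car)
Anova(model)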