r-statistics.co by Selva Prabhakaran


Robust Regression

Robust regression can be used in any situation where OLS regression can be applied. It generally gives better accuracies over OLS because it uses a weighting mechanism to weigh down the influential observations. It is particularly resourceful when there are no compelling reasons to exclude outliers in your data.

Robust regression can be implemented using the rlm() function in MASS package. The outliers can be weighted down differently based on psi.huber, psi.hampel and psi.bisquare methods specified by the psi argument.

How To Specify A Robust Regression Model

library(MASS)
rlm_mod <- rlm(stack.loss ~ ., stackloss, psi = psi.bisquare)  # robust reg model
summary(rlm_mod)
#> Call: rlm(formula = stack.loss ~ ., data = stackloss)
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -8.91753 -1.73127  0.06187  1.54306  6.50163 
#> 
#> Coefficients:
#>             Value    Std. Error t value 
#> (Intercept) -41.0265   9.8073    -4.1832
#> Air.Flow      0.8294   0.1112     7.4597
#> Water.Temp    0.9261   0.3034     3.0524
#> Acid.Conc.   -0.1278   0.1289    -0.9922
#> 
#> Residual standard error: 2.441 on 17 degrees of freedom

Compare Performance of rlm() with lm()

Lets build the equivalent lm() model so we can compare the errors against the respective fitted values.

lm_mod <- lm(stack.loss ~ ., stackloss)  # lm reg model

Calculate the Errors

# Errors from lm() model
DMwR::regr.eval(stackloss$stack.loss, lm_mod$fitted.values)
#>       mae       mse      rmse      mape 
#> 2.3666202 8.5157125 2.9181694 0.1458878

# Errors from rlm() model
DMwR::regr.eval(stackloss$stack.loss, rlm_mod$fitted.values)
#>       mae       mse      rmse      mape 
#> 2.1952232 9.0735283 3.0122298 0.1317191

As expected, the errors from the robust regression model is lesser than the linear regression model.