Loess regression is a common method for smoothing a volatile time series. It is a non-parametric method in which least squares regression is performed on localized subsets of the data, which makes it a suitable candidate for smoothing any numerical vector.

## Introduction

*Loess*, short for local regression, is a non-parametric approach that fits multiple regressions in local neighborhoods. This can be particularly useful if you know that your *X* variables are bound within a range.

Loess regression can be applied using the `loess()` function on a numerical vector to smooth it and to predict *Y* locally (i.e., within the range of *X* values used for training). The size of the neighborhood is controlled by the `span` argument, which typically ranges between 0 and 1 and determines the degree of smoothing: the greater the value of `span`, the smoother the fitted curve.

In the absence of explanatory variables, the predictor can simply be the indices from 1 to the number of observations. If other explanatory variables are available, they can be used as well (up to a maximum of 4).
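As a minimal sketch of the points above, the following fits a loess curve to a synthetic series using only the observation index as the predictor, then predicts at one point inside and one point outside the range of the training data. The data frame `toy` and its columns `idx` and `y` are invented for illustration and are not part of the original example. With the default settings, `predict()` does not extrapolate, so the out-of-range prediction comes back as `NA`.

```r
# Synthetic noisy series; the data and variable names are illustrative only
set.seed(1)
n <- 100
toy <- data.frame(idx = 1:n)
toy$y <- sin(toy$idx / 8) + rnorm(n, sd = 0.3)

# Fit a loess curve using the observation index as the only predictor
fit <- loess(y ~ idx, data = toy, span = 0.3)

# Predict inside (50.5) and outside (150) the range of idx seen during fitting
preds <- predict(fit, newdata = data.frame(idx = c(50.5, 150)))
print(preds)  # the second value is NA: no extrapolation with default settings
```

If extrapolation is really needed, the fit can be made with `control = loess.control(surface = "direct")`, although predictions far from the training range should be treated with caution.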

## Example

For this example, we will locally regress and smooth the *median duration of unemployment* from the `economics` dataset in the `ggplot2` package. We consider only the first 80 rows for this analysis, so that the degree of smoothing is easier to observe in the plot.

```
data(economics, package="ggplot2") # load data
economics$index <- 1:nrow(economics) # create index variable
economics <- economics[1:80, ] # retain 80 rows for better graphical understanding
loessMod10 <- loess(uempmed ~ index, data=economics, span=0.10) # 10% smoothing span
loessMod25 <- loess(uempmed ~ index, data=economics, span=0.25) # 25% smoothing span
loessMod50 <- loess(uempmed ~ index, data=economics, span=0.50) # 50% smoothing span
```

## Predict Loess

```
# get smoothed output
smoothed10 <- predict(loessMod10)
smoothed25 <- predict(loessMod25)
smoothed50 <- predict(loessMod50)
```

Plotting the smoothed series against the original shows that as the span increases, the smoothing of the curve also increases.
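The relationship between `span` and smoothness can also be checked numerically. The sketch below is an illustration on synthetic data, not part of the original analysis: it fits the same three spans and measures the roughness of each fitted curve as the sum of squared second differences. The roughness measure and the names `toy`, `idx`, `y` and `roughness` are assumptions made for this example.

```r
# Synthetic noisy series; the data and variable names are illustrative only
set.seed(2)
n <- 80
toy <- data.frame(idx = 1:n)
toy$y <- sin(toy$idx / 8) + rnorm(n, sd = 0.4)

# For each span, fit loess and measure how "wiggly" the fitted curve is:
# the sum of squared second differences is larger for rougher curves
spans <- c(0.10, 0.25, 0.50)
roughness <- sapply(spans, function(s) {
  fitted_vals <- predict(loess(y ~ idx, data = toy, span = s))
  sum(diff(fitted_vals, differences = 2)^2)
})
names(roughness) <- spans
print(roughness)
```

A smaller roughness value at a larger span is the numerical counterpart of a visibly smoother curve.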

##### Code for Plot

```
# Plot it
plot(x=economics$date, y=economics$uempmed, type="l", main="Loess Smoothing and Prediction", xlab="Date", ylab="Unemployment (Median)")
lines(x=economics$date, y=smoothed10, col="red")   # span = 0.10
lines(x=economics$date, y=smoothed25, col="green") # span = 0.25
lines(x=economics$date, y=smoothed50, col="blue")  # span = 0.50
```

## Finding the optimal smoothing span

As the smoothing `span` changes, the accuracy of the fitted curve also changes. If your intent is to minimize the error, `optim()` can be used to find the value of `span` that minimizes the *Sum of Squared Errors (SSE)*. For this case, it is graphically intuitive that a lower SSE will likely be achieved at lower values of `span`, but for more challenging cases, optimizing `span` could help.

To use `optim()`, we define a function that computes the *SSE*. An error handling mechanism is needed to handle very low values of `span` and cases where non-numeric values are produced. The simulated annealing method (`SANN`) is used here to find the `span` that gives the minimal *SSE*. The `par` argument specifies the initial value of `span` from which `optim()` begins the search.

```
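# define function that returns the SSE for a given span
calcSSE <- function(x){
  sse <- 99999 # penalty returned whenever the fit is unusable
  loessMod <- try(loess(uempmed ~ index, data=economics, span=x), silent=TRUE)
  res <- try(loessMod$residuals, silent=TRUE)
  if(!inherits(res, "try-error")){
    # original guard: accept the fit only if the residuals sum to a positive number
    if(isTRUE(sum(res, na.rm=TRUE) > 0)){
      sse <- sum(res^2)
    }
  }
  return(sse)
}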
# Run optim to find span that gives min SSE, starting at 0.5
optim(par=c(0.5), calcSSE, method="SANN")
#> $par
#> [1] 0.05433545
#>
#> $value
#> [1] 3.85753e-28
#>
#> $counts
#> function gradient
#> 10000 NA
#>
#> $convergence
#> [1] 0
#>
#> $message
#> NULL
```

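For this case, the best value of `span` turns out to be `0.05433` and the minimum SSE achieved is `3.85e-28`. An SSE this close to zero indicates that the fitted curve passes through practically every observation, so this `span` minimizes the in-sample error rather than yielding a smooth curve.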
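Because the SSE above is computed on the same points that were used for fitting, it tends to favor very small spans. One possible alternative, sketched here on synthetic data, is to score each `span` by its k-fold cross-validated SSE and to search a bounded interval with base R's one-dimensional optimizer `optimize()` in place of `optim()` with `SANN`. This is a swapped-in technique, not the method used above. The names `toy`, `folds` and `cvSSE`, the choice of 5 folds, the search interval and the penalty value are all assumptions made for illustration. `surface = "direct"` is set so that held-out points at the edges of the range can still be predicted.

```r
# Synthetic noisy series; the data and variable names are illustrative only
set.seed(42)
n <- 80
toy <- data.frame(index = 1:n)
toy$y <- sin(toy$index / 10) + rnorm(n, sd = 0.3)

# Randomly assign each observation to one of 5 folds
folds <- sample(rep(1:5, length.out = n))

# Cross-validated SSE for a given span: fit on 4 folds, score on the 5th
cvSSE <- function(span) {
  total <- 0
  for (k in 1:5) {
    train <- toy[folds != k, ]
    test  <- toy[folds == k, ]
    fit <- tryCatch(
      suppressWarnings(
        loess(y ~ index, data = train, span = span,
              control = loess.control(surface = "direct"))
      ),
      error = function(e) NULL
    )
    if (is.null(fit)) return(1e6)             # penalty for a failed fit
    pred <- suppressWarnings(predict(fit, newdata = test))
    if (any(!is.finite(pred))) return(1e6)    # penalty for unusable predictions
    total <- total + sum((test$y - pred)^2)
  }
  total
}

# Search spans between 0.1 and 1 for the lowest cross-validated SSE
best <- optimize(cvSSE, interval = c(0.1, 1))
print(best$minimum)    # span with the lowest cross-validated SSE found
print(best$objective)  # the corresponding cross-validated SSE
```

Since the held-out points are not used for fitting, a span that merely interpolates the training data is no longer rewarded, so the selected value is usually larger than the one that minimizes the in-sample SSE.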