ETS

ETS (Error, Trend, Seasonal) is an exponential smoothing technique for time series forecasting. Its starting point is the naive assumption that the best prediction for time $T+h$ is the last observed value at time $T$, i.e., $\hat{y}_{T+h|T} = y_T$. ETS refines this by modeling the time series as a combination of trend, seasonal, and error components. See exponential smoothing and ETS in statsmodels for more details.

Level Component

First, ETS starts from a weighted average of the latest observation and the previous forecast, $\hat{y}_{t+1|t} = \alpha y_t + (1-\alpha) \hat{y}_{t|t-1}$, where $0 \le \alpha \le 1$ is the smoothing parameter. In component form, we write the forecast as the level, $\hat{y}_{t+h|t} = l_t$, and update the level as $l_t = \alpha y_t + (1-\alpha) l_{t-1}$.
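The level recursion can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation used for the results on this page; the function name `simple_exp_smoothing` and the choice of the first observation as the initial level are assumptions made for the example.

```python
def simple_exp_smoothing(y, alpha, initial_level):
    """Return the level l_t after each observation in y.

    Update rule: l_t = alpha * y_t + (1 - alpha) * l_{t-1}.
    """
    levels = []
    level = initial_level
    for obs in y:
        level = alpha * obs + (1 - alpha) * level
        levels.append(level)
    return levels


series = [10.0, 12.0, 11.0, 13.0]
# Assumption: initialise the level with the first observation.
levels = simple_exp_smoothing(series, alpha=0.5, initial_level=series[0])
forecast = levels[-1]  # the same flat forecast is used for every horizon h
print(levels, forecast)
```

With $\alpha$ close to 1 the level follows the latest observation closely; with $\alpha$ close to 0 it changes slowly and older observations keep more weight.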

Trend

Holt’s linear trend method is used for the trend.

In Holt’s linear trend method, $\hat{y}_{t+h|t} = l_t + h b_t$, where $b_t$ is the trend component. This means that the prediction for time $t+h$ is the level at time $t$ plus the trend component multiplied by the time gap $h$.

The trend is updated as $b_t = \beta (l_t - l_{t-1}) + (1-\beta) b_{t-1}$, meaning that the trend at time $t$ is a weighted average of the difference between the level at time $t$ and the level at time $t-1$, and the previous trend at time $t-1$.

The level is updated as $l_t = \alpha y_t + (1-\alpha)(l_{t-1} + b_{t-1})$. The term $\alpha y_t$ is the same as before, while $l_{t-1} + b_{t-1}$, the sum of the previous level and the previous trend, replaces $l_{t-1}$ (the previous one-step forecast) in the earlier level update equation.
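The two update equations and the forecast equation can be sketched as follows. The function names and the initialisation (first observation as the level, first difference as the trend) are assumptions chosen for illustration; libraries usually estimate the initial states together with $\alpha$ and $\beta$.

```python
def holt_linear(y, alpha, beta, initial_level, initial_trend):
    """Run Holt's linear trend updates over y and return the final (level, trend)."""
    level, trend = initial_level, initial_trend
    for obs in y:
        prev_level = level
        # l_t = alpha * y_t + (1 - alpha) * (l_{t-1} + b_{t-1})
        level = alpha * obs + (1 - alpha) * (prev_level + trend)
        # b_t = beta * (l_t - l_{t-1}) + (1 - beta) * b_{t-1}
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level, trend


def holt_forecast(level, trend, h):
    """Forecast h steps ahead: l_t + h * b_t."""
    return level + h * trend


series = [10.0, 12.0, 14.0, 16.0]
# Assumption: start from the first observation and the first difference.
level, trend = holt_linear(
    series[1:],
    alpha=0.5,
    beta=0.5,
    initial_level=series[0],
    initial_trend=series[1] - series[0],
)
print(level, trend, holt_forecast(level, trend, h=3))
```

Because the example series is exactly linear, the level and trend track it without error and the forecast simply continues the line.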

Damped Trend

Damped trend is a variation of the trend component. It is used when the trend is expected to slow down over time, or to keep the forecast from overshooting. The form is similar to $y = l + b$, but a damping parameter $\phi$ (typically $0 < \phi < 1$) is now applied to the previous trend in the trend update equation, i.e., $b_t = \beta (l_t - l_{t-1}) + (1-\beta) \phi b_{t-1}$.
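A minimal sketch of the damped variant is shown below. It uses the trend update given above and, as an assumption taken from the standard textbook formulation rather than from this page, also applies $\phi$ to the previous trend in the level update and uses the forecast $\hat{y}_{t+h|t} = l_t + (\phi + \phi^2 + \dots + \phi^h) b_t$. The function names are hypothetical.

```python
def damped_holt(y, alpha, beta, phi, initial_level, initial_trend):
    """Run damped-trend updates over y and return the final (level, trend)."""
    level, trend = initial_level, initial_trend
    for obs in y:
        prev_level = level
        # Assumed standard form: the previous trend is damped in the level update too.
        level = alpha * obs + (1 - alpha) * (prev_level + phi * trend)
        # b_t = beta * (l_t - l_{t-1}) + (1 - beta) * phi * b_{t-1}
        trend = beta * (level - prev_level) + (1 - beta) * phi * trend
    return level, trend


def damped_forecast(level, trend, phi, h):
    """Forecast h steps ahead: l_t + (phi + phi^2 + ... + phi^h) * b_t."""
    damping = sum(phi ** k for k in range(1, h + 1))
    return level + damping * trend


# Illustrative final state: level 16, trend 2.
for h in (1, 2, 50):
    print(h, damped_forecast(16.0, 2.0, phi=0.5, h=h))
```

With $\phi = 1$ this reduces to Holt's linear trend method. With $\phi < 1$ the forecast flattens out and approaches $l_t + \frac{\phi}{1-\phi} b_t$ for long horizons, which is what limits overshooting.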

Seasonal

We do not cover the seasonal component on this page, but it works much like the trend component. The main change is that a seasonal term enters the level update equation.

Additive vs Multiplicative

ETS has two types of models: additive and multiplicative. So far, every component has been added, e.g., $y$ is the sum of the level and the trend. A multiplicative model, on the other hand, has the form $y = l \times b$.
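The difference is easiest to see in the forecast equations. The sketch below compares an additive trend, where a fixed amount is added per step, with a multiplicative trend, where the level is multiplied by a growth factor per step (so the $h$-step forecast is $l_t \times b_t^h$). The function names and numbers are illustrative assumptions.

```python
def forecast_additive(level, trend, h):
    """Additive trend: the forecast grows by a fixed amount per step."""
    return level + h * trend


def forecast_multiplicative(level, growth, h):
    """Multiplicative trend: the forecast grows by a fixed factor per step."""
    return level * growth ** h


level = 100.0
for h in (1, 2, 3):
    additive = forecast_additive(level, trend=10.0, h=h)
    multiplicative = forecast_multiplicative(level, growth=1.1, h=h)
    print(h, additive, round(multiplicative, 2))
```

Both start with a 10% increase at $h = 1$, but the multiplicative forecast compounds, so the gap between the two widens as $h$ grows.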

ETS for Air Passengers

To demonstrate model performance, we show the model's prediction results for the air passengers dataset. The cross validation process identified the best transformation to make the time series stationary and the optimal hyperparameters. The Root Mean Squared Error on the next day's closing price was used to determine the best model.
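As a rough illustration of model selection by Root Mean Squared Error, the sketch below scores two candidate prediction series against the same actual values and keeps the one with the lower error. The candidate names and all numbers are made up for the example; they are not results from this page.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error between two equally long sequences."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical validation data and candidate predictions.
actual = [10.0, 12.0, 14.0]
candidates = {
    "model_a": [11.0, 12.0, 13.0],
    "model_b": [10.0, 12.0, 15.0],
}
scores = {name: rmse(actual, preds) for name, preds in candidates.items()}
best = min(scores, key=scores.get)  # candidate with the lowest RMSE
print(scores, best)
```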

In the chart, we display the model's predictions for the last split of cross-validation and for the test data.

train: Training data of the last split.
validation: Validation data of the last split.
prediction (train, validation): Prediction for the train and validation data periods. For each row (or sliding window) of data, predictions are made n days into the future (where n is set to 1, 2, and 7). The predictions are then combined into a single series of dots. Since prediction accuracy decreases for larger n, we see some hiccups in the predictions. The predictions from the tail of the train data spill into the validation period, because that period is the future from the viewpoint of the train data. These settings are somewhat peculiar, but they work well for checking whether the model's predictions are good enough.
test(input): Test input data.
test(actual): Test actual data.
prediction(test): The model's prediction given the test input. There is only one set of predictions, made from the last row (or the last sliding window) of the test input, which corresponds to 1, 2, and 7 days after 'test(input)'.
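The sliding-window setup described in the legend can be sketched roughly as follows. This is only an illustration of the idea: the helper `rolling_multi_horizon`, the `min_history` parameter, and the naive stand-in forecaster are all assumptions, and the exact way the chart combines the points is not specified on this page.

```python
def rolling_multi_horizon(series, min_history, horizons, forecast_fn):
    """For each window ending at a row, forecast every horizon.

    Returns (target_index, horizon, prediction) tuples, where target_index
    is the position in the timeline that the prediction refers to.
    """
    points = []
    for end in range(min_history, len(series) + 1):
        history = series[:end]
        for h in horizons:
            target_index = (end - 1) + h  # h steps after the last observed row
            points.append((target_index, h, forecast_fn(history, h)))
    return points


def naive_forecast(history, h):
    """Stand-in forecaster (assumption): repeat the last observed value."""
    return history[-1]


train = [1.0, 2.0, 3.0, 4.0, 5.0]
points = rolling_multi_horizon(
    train, min_history=3, horizons=(1, 2, 7), forecast_fn=naive_forecast
)
# Predictions whose target lies beyond the train data spill into the next period.
spill = [p for p in points if p[0] >= len(train)]
print(len(points), len(spill))
```

Windows near the end of the train data produce targets past its last row, which is why dots from the train period appear inside the validation period.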