ARIMA

ARIMA (Autoregressive Integrated Moving Average) is a widely used statistical technique for predicting future values in a time series. In essence, ARIMA assumes that the next value is a weighted sum of past values and past errors.

ARIMA combines three components: AR (Autoregressive), I (Integrated), and MA (Moving Average). It is widely used because it can handle many kinds of time series, including those with trends and, through its seasonal extension (SARIMA), seasonality.

AR(p)

The AR (Autoregressive) model, specifically $AR(p)$, uses the previous (or lagged) $p$ values to predict the current value.

Let's say the current bitcoin price at time $t$ is $Y_t$. The $AR(p)$ model states that $Y_t = \mu + a_1 Y_{t-1} + a_2 Y_{t-2} + \cdots + a_p Y_{t-p} + \epsilon_t$, where $\mu$ is the mean of the process, and $\epsilon_t$ is the error term. In other words, the current value is a linear combination of the previous $p$ values.

For example, an $AR(1)$ process with $Y_t = 0.2 Y_{t-1} + \epsilon_t$ is shown below:
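As a minimal sketch (using numpy, with an arbitrary seed and series length), such an $AR(1)$ process can be simulated like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate AR(1): Y_t = 0.2 * Y_{t-1} + eps_t
n = 200
eps = rng.standard_normal(n)
y = np.zeros(n)
y[0] = eps[0]  # assume Y_{-1} = 0
for t in range(1, n):
    y[t] = 0.2 * y[t - 1] + eps[t]
```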

MA(q)

The MA (Moving Average) model, specifically $MA(q)$, uses the previous $q$ error terms to predict the current value.

Let's say the current bitcoin price is $Y_t$. The MA(q) model states that $Y_t = \mu + \epsilon_t + b_1 \epsilon_{t-1} + b_2 \epsilon_{t-2} + \cdots + b_q \epsilon_{t-q}$, where $\mu$ is the mean of the process, and $\epsilon_t$ is the error term at time $t$. In other words, the current value is a linear combination of the previous $q$ errors. If we consider errors as shocks to the system, the MA(q) model describes how the shocks propagate through the system over time.

For example, an $MA(1)$ process with $Y_t = \epsilon_t + 0.2 \epsilon_{t-1}$ is shown below:
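Similarly, a minimal numpy sketch (same assumptions as above) of this $MA(1)$ process:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate MA(1): Y_t = eps_t + 0.2 * eps_{t-1}
n = 200
eps = rng.standard_normal(n)
y = eps.copy()
y[1:] += 0.2 * eps[:-1]  # each value adds 0.2 times the previous shock
```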

ARMA(p, q)

The ARMA (Autoregressive Moving Average) model, specifically $ARMA(p, q)$, uses the previous $p$ values and the previous $q$ errors to predict the current value.

To make the explanation easier, let's define the back-shift operator $B$ by $B Y_t = Y_{t-1}$. Using this operator, and ignoring the mean $\mu$, we can define $AR(1)$ and $MA(1)$ as follows:

$AR(1)$: $(1 - \phi_1 B) Y_t = \epsilon_t$
$MA(1)$: $Y_t = (1 + \theta_1 B) \epsilon_t$

$ARMA(1, 1)$ is then $(1 - \phi_1 B) Y_t = (1 + \theta_1 B)\epsilon_t$.

As an example, $ARMA(1, 1)$ of $(1-0.2B)Y_t = (1+0.3B)\epsilon_t$ means $Y_t = 0.2 Y_{t-1} + 0.3 \epsilon_{t-1} + \epsilon_{t}$.
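To make the back-shift form concrete, here is a minimal numpy sketch (arbitrary seed and length) simulating this $ARMA(1, 1)$ process:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate ARMA(1, 1): Y_t = 0.2 * Y_{t-1} + 0.3 * eps_{t-1} + eps_t
n = 200
eps = rng.standard_normal(n)
y = np.zeros(n)
y[0] = eps[0]  # assume Y_{-1} = 0 and eps_{-1} = 0
for t in range(1, n):
    y[t] = 0.2 * y[t - 1] + 0.3 * eps[t - 1] + eps[t]
```

Equivalently, statsmodels' `ArmaProcess([1, -0.2], [1, 0.3])` takes the same lag-polynomial coefficients (note the sign convention on the AR side) and can generate samples via `generate_sample`.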

ARIMA(p, d, q)

The ARIMA (Autoregressive Integrated Moving Average) model, specifically $ARIMA(p, d, q)$, differences the time series $d$ times and then applies the $ARMA(p, q)$ model to the differenced series.

If $d=1$, the first difference of the time series is $(1-B)Y_t = Y_t - Y_{t-1}$, and $ARIMA(1, 1, 1)$ is then $(1-\phi_1 B)(1-B)Y_t = (1+\theta_1 B)\epsilon_t$.

Consider $(1-0.2B)(1-B)Y_t = (1+0.3B)\epsilon_t$. It can be expanded as:
$(Y_t - Y_{t-1}) - 0.2(Y_{t-1} - Y_{t-2}) = \epsilon_t + 0.3\epsilon_{t-1}$
$Y_t - Y_{t-1} = 0.2(Y_{t-1} - Y_{t-2}) + 0.3\epsilon_{t-1} + \epsilon_{t}$
$Y_t = Y_{t-1} + 0.2(Y_{t-1} - Y_{t-2}) + 0.3\epsilon_{t-1} + \epsilon_{t}$

An example chart of this process is shown below:
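Below is a minimal numpy sketch (arbitrary seed and length) of the expanded $ARIMA(1, 1, 1)$ recurrence; unlike the stationary examples above, the simulated series wanders because of the $Y_{t-1}$ unit root:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate ARIMA(1, 1, 1):
# Y_t = Y_{t-1} + 0.2 * (Y_{t-1} - Y_{t-2}) + 0.3 * eps_{t-1} + eps_t
n = 200
eps = rng.standard_normal(n)
y = np.zeros(n)
y[1] = 0.3 * eps[0] + eps[1]  # flat start: assume Y_0 = Y_{-1} = 0
for t in range(2, n):
    y[t] = y[t - 1] + 0.2 * (y[t - 1] - y[t - 2]) + 0.3 * eps[t - 1] + eps[t]
```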

Differencing is useful when the time series is not stationary. See stationary time series for more details.

Seasonal ARIMA (SARIMA)

The SARIMA (Seasonal Autoregressive Integrated Moving Average) model, specifically $ARIMA(p, d, q)(P, D, Q)_s$, models a time series as a combination of a non-seasonal part, $ARIMA(p, d, q)$, and a seasonal part, $ARIMA(P, D, Q)_s$ with period $s$.

Let's consider the simple case of $ARIMA(1, 1, 1)(1, 1, 1)_4$. The $ARIMA(1, 1, 1)$ part deals with the non-seasonal difference $Y_t - Y_{t-1}$, while the $(1, 1, 1)_4$ part handles the seasonal difference $Y_t - Y_{t-4}$.

As a result, it is defined as $(1-\phi_1 B)(1 - \Phi_1 B^4)(1-B)(1-B^4)Y_t = (1+\theta_1 B)(1+\Theta_1 B^4)\epsilon_t$.
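As a hedged sketch of how such a model might be fit in practice, statsmodels' `SARIMAX` accepts exactly these orders (the random-walk data here is only a placeholder for a real quarterly series):

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(0)
y = rng.standard_normal(120).cumsum()  # placeholder series; use real quarterly data

# order=(p, d, q), seasonal_order=(P, D, Q, s) -> ARIMA(1, 1, 1)(1, 1, 1)_4
model = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 4))
result = model.fit(disp=False)

print(result.params)             # estimated phi_1, theta_1, Phi_1, Theta_1, sigma^2
print(result.forecast(steps=8))  # forecast the next 8 quarters
```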

ARIMA for Air Passengers

To demonstrate model performance, we show the model's prediction results for the air passengers dataset. Cross validation identified the best transformation to make the time series stationary and the optimal hyperparameters. The root mean squared error (RMSE) of the next period's prediction was used to select the best model.

In the chart, we display the model's predictions for the last split of cross validation and the test data.

  1. train: Training data of the last split.
  2. validation: Validation data of the last split.
  3. prediction (train, validation): Prediction for the train and validation data periods. For each row (or sliding window) of data, predictions are made for $n$ days into the future (where $n$ is set to 1, 2, and 7). The predictions are then combined into a single series of dots. Since prediction accuracy decreases for larger $n$, we see some hiccups in the predictions. Predictions from the tail of the train period spill into the validation period, since that is the future from the train period's viewpoint. These settings are somewhat peculiar, but they work well for testing whether the model's predictions are good enough.
  4. test(input): Test input data.
  5. test(actual): Test actual data.
  6. prediction(test): The model's prediction given the test input. There is only one prediction, made from the last row (or last sliding window) of the test input; it corresponds to 1, 2, and 7 days after the end of 'test(input)'.

ARIMA should perform very well on this kind of data, as it is designed to capture the air passenger series' cyclic, autoregressive, and moving-average structure.
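For reference, here is a minimal sketch of fitting a seasonal ARIMA to the air passengers series with statsmodels. The log transform and the $(1, 1, 1)(1, 1, 1)_{12}$ hyperparameters are illustrative assumptions, not the cross-validated choices described above, and loading the data via the Rdatasets helper requires network access:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Classic monthly AirPassengers series (column name per the Rdatasets CSV).
data = sm.datasets.get_rdataset("AirPassengers").data
y = np.log(data["value"])  # log transform stabilizes the growing seasonal amplitude

# Hold out the last 12 months as a test period.
train, test = y[:-12], y[-12:]
model = SARIMAX(train, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
result = model.fit(disp=False)

# Forecast the test period and compute RMSE on the original scale.
pred = np.exp(result.forecast(steps=12))
rmse = np.sqrt(np.mean((pred.values - np.exp(test.values)) ** 2))
print(f"test RMSE (passengers): {rmse:.1f}")
```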