
Linear Tree

A linear tree is similar to a decision tree, but differs in that its leaves are no longer constant values but linear models.

Why is a decision tree insufficient?

Decision trees split the given data into subsets based on feature values and make a prediction for each subset. For example, imagine a sine wave. A decision tree would split the sine wave into regions and predict the mean value of each region.

On the other hand, a linear tree that splits the sine wave into the same regions would fit a line in each, making it a better fit for time series forecasting, both within the training period and into the future.

See "Linear trees in LightGBM: how to use?" for a visual demonstration.
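The difference can also be sketched numerically. Below is a minimal NumPy illustration (not the LightGBM implementation; the four fixed regions stand in for learned tree splits): each region of a sine wave is predicted either by its mean, like a decision tree leaf, or by a fitted line, like a linear tree leaf.

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 200)
y = np.sin(x)

# Split the curve into 4 equal regions, mimicking tree splits on x.
regions = np.array_split(np.arange(len(x)), 4)

mean_pred = np.empty_like(y)
linear_pred = np.empty_like(y)
for idx in regions:
    # Decision-tree-style leaf: a constant (the region mean).
    mean_pred[idx] = y[idx].mean()
    # Linear-tree-style leaf: a line fitted within the region.
    slope, intercept = np.polyfit(x[idx], y[idx], 1)
    linear_pred[idx] = slope * x[idx] + intercept

def rmse(pred):
    return np.sqrt(np.mean((y - pred) ** 2))

print(f"constant leaves RMSE: {rmse(mean_pred):.3f}")
print(f"linear leaves RMSE:   {rmse(linear_pred):.3f}")
```

With the same splits, the linear leaves track the curve far more closely, which is the whole argument for linear trees on trending data.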

Linear tree for Air Passengers

To demonstrate model performance, we show the model's prediction results for the air passengers dataset. The cross validation process identified the best transformation to make the time series stationary and the optimal hyperparameters. The root mean squared error (RMSE) of the next period's prediction was used to select the best model.
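The selection step can be sketched as follows. This is an illustrative expanding-window procedure, not the exact pipeline used here: the candidate transformations and the naive last-value forecaster are stand-ins for the real ones.

```python
import numpy as np

def one_step_rmse(series, transform):
    """Score a naive one-step-ahead forecast after a transformation,
    using expanding windows (illustrative, not the exact pipeline)."""
    if transform == "diff":            # difference to remove the trend
        series = np.diff(series)
    errors = []
    for t in range(20, len(series)):   # fit on series[:t], predict index t
        pred = series[t - 1]           # naive forecaster: repeat last value
        errors.append((series[t] - pred) ** 2)
    return np.sqrt(np.mean(errors))

# Toy trending series: differencing brings it closer to stationary.
rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(1.0, 0.5, 120))

scores = {t: one_step_rmse(y, t) for t in ("none", "diff")}
best = min(scores, key=scores.get)
print(scores, "->", best)
```

On a trending series the "diff" transformation scores a lower RMSE, so it would be the one selected, mirroring how the real cross validation picks the stationarizing transformation and hyperparameters.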

In the chart, we display the model's predictions for the last split of cross validation and for the test data.

  1. train: Training data of the last split.
  2. validation: Validation data of the last split.
  3. prediction (train, validation): Predictions for the train and validation periods. For each row (or sliding window) of data, predictions are made n days into the future (where n is 1, 2, or 7), and then combined into a single series of dots. Since prediction accuracy decreases for larger n, some hiccups appear in the predictions. Predictions from the tail of the train period spill into the validation period, since that is the future from the train period's viewpoint. These settings are somewhat peculiar, but they work well for testing whether the model's predictions are good enough.
  4. test(input): Test input data.
  5. test(actual): Test actual data.
  6. prediction(test): The model's prediction given the test input. There is only one prediction, made from the last row (or last sliding window) of the test input, corresponding to 1, 2, and 7 days after 'test(input)'.
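The way the multi-horizon predictions above are assembled into a single series of dots can be sketched like this. The window length and the `forecast` stand-in are assumptions for illustration, not the actual model.

```python
import numpy as np

HORIZONS = (1, 2, 7)   # n days ahead, as in the chart
WINDOW = 30            # sliding-window length (an assumption)

def forecast(window, n):
    """Stand-in for the model: naively repeat the last value n days ahead."""
    return window[-1]

y = np.sin(np.arange(200) / 10.0)  # toy series in place of the real data

# For each sliding window ending at t, predict n days past its last
# observation (index t - 1), then merge all horizons into one dot series.
dots = {}
for t in range(WINDOW, len(y)):
    window = y[t - WINDOW:t]
    for n in HORIZONS:
        dots[t - 1 + n] = forecast(window, n)

# Dots from the last windows extend past the observed data: this is the
# 'tail of the train period spills into validation' effect described above.
print(max(dots) - (len(y) - 1))
```

Note that the final windows produce dots up to 7 days beyond the last observation, which is exactly why the train-period predictions spill into the validation period in the chart.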

A linear tree exhibits an interesting phenomenon: lagged predictions. This is because the linear leaves drag the previous time steps' pattern (a line) into the current one.