Linear Tree

A linear tree is similar to a decision tree, but differs in that its leaves are no longer constant values but linear models.

Why is a decision tree insufficient?

Decision trees split the given data into subsets based on feature values and make a prediction for each subset. For example, imagine a sine wave. A decision tree would split the sine wave into regions and predict the mean value of each region.
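As a minimal sketch of this step-function behavior (using scikit-learn's DecisionTreeRegressor as a stand-in; the implementation and hyperparameters here are illustrative assumptions, not this article's exact model):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# A sine wave as a one-feature regression problem.
X = np.linspace(0, 4 * np.pi, 200).reshape(-1, 1)
y = np.sin(X).ravel()

# A shallow tree splits the x-axis into a few regions...
tree = DecisionTreeRegressor(max_depth=3)
tree.fit(X, y)

# ...and predicts the mean of y within each region, so the
# fitted curve is a piecewise-constant step function.
pred = tree.predict(X)
print(np.unique(pred).size)  # at most 2**3 = 8 distinct constant levels
```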

On the other hand, a linear tree that splits the sine wave into the same regions would make a linear prediction for each, making it a better fit for time series forecasting, both within the training periods and into the future.
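The difference can be sketched with LightGBM's linear_tree option (the hyperparameter values below are illustrative assumptions, not the ones used in this article):

```python
import numpy as np
import lightgbm as lgb

X = np.linspace(0, 4 * np.pi, 200).reshape(-1, 1)
y = np.sin(X).ravel()

# linear_tree=True fits a linear model in each leaf instead of a
# constant, so each region becomes a line segment rather than a flat step.
params = {
    "objective": "regression",
    "linear_tree": True,
    "max_depth": 3,
    "min_data_in_leaf": 10,
    "verbose": -1,
}
booster = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=10)
pred = booster.predict(X)  # piecewise-linear approximation of the sine wave
```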

See "Linear trees in LightGBM: how to use?" for a visual demonstration.

Linear tree for Air Passengers

To demonstrate model performance, we show the model's predictions for the air passengers dataset. The cross-validation process identified the best transformation for making the time series stationary and the optimal hyperparameters. The Root Mean Squared Error on the next period's value was used to select the best model.
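The exact search is not reproduced here, but the idea can be sketched as an expanding-window cross-validation scored by next-step RMSE. Everything below is an assumption for illustration: the synthetic series stands in for the air passengers data, and the 12 lag features and linear_tree parameters are placeholder choices:

```python
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import TimeSeriesSplit
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for the monthly air passengers series:
# an upward trend plus yearly seasonality.
rng = np.random.default_rng(0)
t = np.arange(144)
y = 100 + 2.0 * t + 20 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 5, t.size)

# Lag features: the previous 12 observations predict the next value.
n_lags = 12
X = np.column_stack([y[i : i + len(y) - n_lags] for i in range(n_lags)])
target = y[n_lags:]

# Expanding-window CV: each fold trains on the past and scores the
# one-step-ahead prediction, mimicking next-period RMSE selection.
errors = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    ds = lgb.Dataset(X[train_idx], label=target[train_idx])
    params = {"objective": "regression", "linear_tree": True, "verbose": -1}
    booster = lgb.train(params, ds, num_boost_round=50)
    pred = booster.predict(X[test_idx[:1]])  # next step only
    errors.append(np.sqrt(mean_squared_error(target[test_idx[:1]], pred)))

print("mean next-step RMSE:", np.mean(errors))
```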

The chart below illustrates:

  1. train: the training data
  2. prediction(train): the model's predictions for the training data periods
  3. test(input): the test input data
  4. test(actual): the actual test data
  5. prediction(test): the model's predictions for selected days (the 1st, 2nd, 3rd, 5th, and 7th) of the "test(actual)" periods, given "test(input)"

A linear tree is expected to output a straight line for unobserved feature values. However, that is not the case here. The discrepancy arises because we scale the feature and target data, as detailed in capturing trends.
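For contrast, the expected baseline behavior can be sketched in isolation (again assuming LightGBM's linear_tree; this is not the pipeline used above): beyond the training range, every point falls into a boundary leaf of each tree, and the sum of the leaves' linear models extends as a straight line.

```python
import numpy as np
import lightgbm as lgb

# Train on x in [0, 10]; each leaf fits a linear model in x.
X_train = np.linspace(0, 10, 200).reshape(-1, 1)
y_train = 3.0 * X_train.ravel() + 1.0

params = {"objective": "regression", "linear_tree": True, "verbose": -1}
booster = lgb.train(params, lgb.Dataset(X_train, label=y_train),
                    num_boost_round=20)

# x > 10 is unobserved: every such point lands in the rightmost leaf
# of each tree, so the prediction continues along a fitted line.
X_future = np.linspace(10, 15, 50).reshape(-1, 1)
pred = booster.predict(X_future)
diffs = np.diff(pred)
print(np.allclose(diffs, diffs[0]))  # True: the extrapolation is a straight line
```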