GBM

GBM (Gradient Boosting Machine) is an ensemble of weak learners, typically decision trees. It achieves high performance by adding trees one at a time, each new tree correcting the errors of the ensemble built so far.

Success of GBM

GBM is an exceptionally effective model for both tabular data prediction and time series forecasting. Despite the advent of numerous deep learning models, GBM remains a top performer. The Kaggle 2023 AI report highlights this, noting the "continuing dominance of gradient boosted trees" and that "tabular data ... remains largely unaffected by the deep learning revolution".

Features to generate

Since GBM works with tabular data, it is essential to derive features such as lagged values and value ratios for each row. This way each row carries both past and present information, which makes training GBM on a time series possible.
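A minimal sketch of such feature generation with pandas; the column names and the choice of lags are illustrative, not the ones used in the actual experiments:

```python
import pandas as pd

# Hypothetical monthly series; values are illustrative.
df = pd.DataFrame({"passengers": [112, 118, 132, 129, 121, 135, 148, 148]})

# Lagged values: the series shifted back by k steps.
for k in (1, 2, 3):
    df[f"lag_{k}"] = df["passengers"].shift(k)

# Value ratio: current value relative to the previous one.
df["ratio_1"] = df["passengers"] / df["passengers"].shift(1)

# Drop the initial rows whose lags are undefined.
df = df.dropna().reset_index(drop=True)
```

Each remaining row now holds the current value plus its recent history, which is exactly the tabular shape GBM expects.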

Predefined hyperparameter values

We use LightGBM, LGBMRegressor to be more specific, with the 'gbdt' (Gradient Boosting Decision Tree) boosting type, early stopping, the regression objective, and rmse as the metric. Other parameters such as max_depth, min_data_in_leaf, n_estimators, colsample_bytree, etc. were tuned by grid search.

GBM for Air Passengers

To demonstrate model performance, we show the model's prediction results for the air passengers dataset. The cross validation process identified the best transformation to make the time series stationary and the optimal hyperparameters. The root mean squared error (RMSE) of the one-step-ahead forecast was used to determine the best model.
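The evaluation loop can be sketched with scikit-learn's TimeSeriesSplit; a naive last-value forecast stands in here for the tuned GBM, and the stationarity transformation search is not reproduced:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Illustrative monthly values, not the real air passengers series.
y = np.array([112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118], float)

def rmse(actual, predicted):
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

scores = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=3).split(y):
    # Placeholder forecaster: repeat the last training value.
    forecast = np.full(len(test_idx), y[train_idx][-1])
    scores.append(rmse(y[test_idx], forecast))

mean_rmse = float(np.mean(scores))
```

The model (or transformation/hyperparameter combination) with the lowest mean RMSE across folds is the one selected.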

The chart below illustrates:

  1. train: training data
  2. prediction(train): the model's prediction for the training data periods
  3. test(input): test input data
  4. test(actual): test actual data
  5. prediction(test): the model's predictions on selected days (the 1st, 2nd, and 7th days) of the "test (actual)" period, given "test (input)".