
Improve Time Series Forecasting performance with the Facebook Prophet model | by Satyam Kumar | Sep, 2022



Essential guide to time series feature engineering and forecasting

Image by Colin Behrens from Pixabay

Time series forecasting involves building models on historical time-stamped values and external factors to make predictions that drive future strategic decision-making. Training a robust time-series forecasting model that produces accurate and reliable predictions is one of the most challenging tasks, given its direct impact on the decisions that depend on it. The robustness of a time-series forecasting model depends largely on the feature engineering and data analysis performed before modeling.

In one of my previous articles, I discussed tsfresh, an open-source package that can generate hundreds of relevant features for your time-series use case.

Even after including tsfresh and external features, the time-series model's forecasts sometimes still fall short of business expectations. In this article, we will discuss and implement how to improve the performance of a supervised time-series model using features from the Facebook Prophet model.

Getting Started:

We will use a custom-generated sample time-based dataset with 8 independent features and a continuous dependent feature ‘target’. We will train a LightGBM model under three different feature engineering strategies:

  • LightGBM with external features
  • LightGBM with external features + lags
  • LightGBM with external features + lags + Facebook Prophet features

We will implement and compare the performance of each of these feature engineering strategies and conclude whether Facebook Prophet features are effective for training a robust model.

Data:

The raw data is time-stamped and has 8 independent features and a dependent feature ‘target’. I have created hour, day, and month features to capture the time factor in the data. Find below a sample of the data:
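Calendar features like these can be derived directly from a datetime index with pandas. A minimal sketch, with illustrative values standing in for the actual dataset (the column names here are assumptions):

```python
import pandas as pd

# Hypothetical frame with an hourly datetime index; values are illustrative.
df = pd.DataFrame(
    {"target": [10.0, 12.5, 11.2, 13.1]},
    index=pd.date_range("2022-09-01", periods=4, freq="h"),
)

# Derive calendar features so the model can learn time-of-day and seasonal effects.
df["hour"] = df.index.hour
df["day"] = df.index.day
df["month"] = df.index.month
```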

(Image by Author), Snapshot of sample data

We will train a LightGBM model on the above-mentioned raw sample data and compute its top-performing features and MAE (Mean Absolute Error) for benchmarking.

(Image by Author), Left: Plot visualizing real and predicted value for the inference data, Right: Top performing features for the Light GBM model

We are getting an MAE of 53.79 for the inference data, and the top-performing features are hour, day, humidity, etc.

The previous model was trained only on external factors and did not involve lags of the dependent feature ‘target’. To decide which lags to include, we can examine the autocorrelation of the series.

(Image by Author), Autocorrelation plot for dependent feature ‘target’
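Autocorrelation at candidate lags can be checked numerically with pandas. The synthetic sine series below (an assumption, chosen for its exact 24-hour cycle) stands in for the article's ‘target’:

```python
import numpy as np
import pandas as pd

# Synthetic hourly series with a 24-hour cycle, used only to illustrate the idea.
idx = pd.date_range("2022-09-01", periods=24 * 14, freq="h")
s = pd.Series(np.sin(2 * np.pi * idx.hour / 24), index=idx)

# Autocorrelation at candidate lags; values near 1 suggest useful lag features.
for lag in (1, 24, 48, 168):
    print(lag, round(s.autocorr(lag=lag), 3))
```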

From the above autocorrelation plot, we can observe that the ‘target’ feature has high autocorrelation at lags of 1 hour, 24 hours, 48 hours, and so on. So we can create lag features by:

  • 1-hour lag variable by shifting the target value by 1 hour
  • 1-day lag variable by shifting the target value by 1 day
  • 2-day lag variable by shifting the target value by 2 days
  • 1-week lag variable by shifting the target value by 1 week

We will be including the previous static features along with the lag features.
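The four lag features above can be created with pandas `shift`, assuming the frame has a regular hourly frequency (the column names are illustrative):

```python
import numpy as np
import pandas as pd

# Hourly frame; 'target' stands in for the dependent series from the article.
idx = pd.date_range("2022-09-01", periods=24 * 8, freq="h")
df = pd.DataFrame({"target": np.arange(len(idx), dtype=float)}, index=idx)

# With hourly rows, shifting by N rows shifts the value by N hours.
df["lag_1h"] = df["target"].shift(1)
df["lag_1d"] = df["target"].shift(24)
df["lag_2d"] = df["target"].shift(48)
df["lag_1w"] = df["target"].shift(168)

df = df.dropna()  # the earliest rows have no lag history available
```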

(Image by Author), Left: Plot visualizing real and predicted value for the inference data, Right: Top performing features for the Light GBM model

We are getting an MAE of 21.37 for the inference data, and the top-performing features are hour, previous hour lag, previous day lag, etc.

We will now include features from the Facebook Prophet package. The idea is to train the Prophet model on the training data and use the output of the prophet.predict() API, for both the training and inference periods, to generate the 22 statistical features.

(Image by Author), Left: Plot visualizing real and predicted value for the inference data, Right: Top performing features for the Light GBM model

We are getting an MAE of 20.81 for the inference data, and the top-performing features are previous hour lag, current hour, weekly, previous day lag, additive features, etc.

In this article, we have discussed a few feature engineering strategies for time-series use cases. In our experiment on a sample of data, the raw dataset gave an MAE of 53.79; after including the lags, the MAE improved to 21.37. On introducing statistical features from the Facebook Prophet API, the MAE further improved to 20.81.

Although the improvement in MAE after including the Facebook Prophet features was modest, they may still perform well for other use cases or larger samples of real-world datasets.

[1] Facebook Prophet documentation: https://facebook.github.io/prophet/

Thank You for Reading


