
How Not to be Fooled by Time Series Models | by Michael Keith | Jun, 2022

Know when you are being presented with accurate-looking forecasts vs. when the forecast is actually highly accurate

Photo by Nicholas Cappello on Unsplash

It is easy to be tricked by time-series models. I have seen models that are able to (seemingly) predict the most random trends accurately, such as stock and crypto prices, using advanced techniques that most don’t fully understand. Is time series really like magic in this regard? Perform the right data manipulations, apply a complex-enough model, and presto, amazingly accurate predictions are produced for any date-indexed line into the future?

If you have seen the same things I’m describing and are skeptical, you are right to feel that way. What is usually happening is that an analyst is passing off a string of one-step forecasts that appear accurate on a validation slice of data. The analyst concludes that the model they created is powerful enough to predict many periods into the future accurately. This can happen knowingly or unknowingly — if I had to guess, I would say most analysts, who usually see examples of models being built on cross-sections of data, never think about what adding the time element to a dataset might do to their models. Time series is different from other machine learning problems, and extra precautions have to be taken, or the results will be misleading.

A series’ present is usually highly correlated with its past. This is referred to as a series’ auto-regressive (AR) property, or auto-correlation. Specifically, any given value is probably pretty close to the value that came right before it.
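As a quick illustration, pandas can measure this lag-1 autocorrelation directly. A minimal, self-contained sketch with made-up numbers:

import pandas as pd

# a toy upward-drifting series; consecutive values sit close together
s = pd.Series([10, 11, 13, 12, 14, 15, 17, 16, 18, 20])
print(s.autocorr(lag=1))  # correlation between the series and its one-period lag, close to 1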

The implications of this fact are deeper than they appear at first. Let’s consider a linear model that has an autoregressive component, a trend component, and a seasonal component, such that:

x_t = alpha * x_{t-1} + s_t + t_t + e_t

This function says that the previous value (x_{t-1}) multiplied by a factor of alpha, the series’ seasonality (s_t), the series’ trend (t_t), and an error term (e_t) determine the next value in the series. Let’s say you want to apply this model to real data, specifically to predict 10 months into the future. You are given 100 monthly observations to do so:

Image by author
Image by author
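The article notes at the end that this data was made up, so here is a minimal sketch of how a comparable 100-observation monthly series might be simulated. The column names match the snippets below, but the parameter values are assumptions, not the author’s actual data:

import numpy as np
import pandas as pd

# 100 monthly observations with an AR(1) component, a trend, and period-12 seasonality
rng = np.random.default_rng(0)
dates = pd.date_range('2013-01-01', periods=100, freq='MS')
target = np.zeros(100)
for t in range(1, 100):
    seasonal = 10 * np.sin(2 * np.pi * dates[t].month / 12)
    trend = 0.3 * t
    target[t] = 0.6 * target[t - 1] + seasonal + trend + rng.normal(scale=2)

data = pd.DataFrame(
    {'Target': target, 'Month': dates.month, 'Year': dates.year},
    index=dates,
)
data.index.name = 'Date'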

We have the data, so let’s prepare the model. To obtain the first term from the listed equation, the auto-regressive term, we can use the shift() method from pandas:

data['AR1'] = data['Target'].shift()  # lag-1 value of the target, i.e. x_{t-1}

For seasonality, instead of using the month value as is, we can use Fourier terms, which model seasonality with sine and cosine waves:

data['MonthSin'] = data['Month'].apply(
    lambda x: np.sin(np.pi * x/(12/2))
)
data['MonthCos'] = data['Month'].apply(
    lambda x: np.cos(np.pi * x/(12/2))
)

Finally, we can use the year variable already in the dataset to model the trend.
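If the year variable were not already a column, it could be derived from the date index. A one-line sketch, assuming the index is a pandas DatetimeIndex:

data['Year'] = data.index.year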

Now, we have all three components we need: the series’ previous value (x_{t-1}), its trend (t_t), and its seasonality (s_t). We assume the error term (e_t) is random and cannot be modeled. So, let’s write a normal machine-learning pipeline and see what happens.

data_processed = data[
    [
        'Target',
        'AR1',
        'MonthSin',
        'MonthCos',
        'Year'
    ]
].dropna()
data_processed.head()
Image by author

We use scikit-learn to split our data, such that the last 10 observations are in the test set.

from sklearn.model_selection import train_test_split

y = data_processed['Target']
X = data_processed.drop('Target', axis=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=10, shuffle=False
)

The rest is a piece of cake. Train the linear model:

from sklearn.linear_model import LinearRegression

lr = LinearRegression()
lr.fit(X_train, y_train)

Make predictions on the unseen test data:

pred = lr.predict(X_test)

Finally, evaluate the error:

from sklearn.metrics import mean_absolute_error
mean_absolute_error(y_test, pred)

This returns a value of 3.8. We can even plot the results.

import seaborn as sns
import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = (8,6)  # set the figure size before plotting
sns.lineplot(
    y='Target',
    x='Date',
    data=data.reset_index(),
    label='actuals',
)
sns.lineplot(
    y=pred,
    x=data.index[-10:],
    label='predictions',
)
plt.show()
Image by author

Looks pretty good! Let’s say you show this model to your supervisor and they agree that the model is accurate and should be implemented. But wait, there is a problem. The model relies heavily on the series’ previous value to make predictions about the next value. How do we then extend this model into the future if we don’t know the previous value for all ten steps?

One way to do this is to extend our equation so that the linear model can be dynamically evaluated using a multi-step process. Consider that in the original equation, instead of using the actual past value as an input, we use a prediction of the past value, such that the second forecast step can be written as:

x̂_{t+2} = alpha * x̂_{t+1} + s_{t+2} + t_{t+2}, where x̂ denotes a predicted value

More generally, we can say that:

x̂_{t+j} = alpha * x̂_{t+j-1} + s_{t+j} + t_{t+j}

where j is any period over the forecast horizon. That can be a solid way to extend your forecast into future periods. Problem solved. Except we run into two problems:

  1. It is a lot of extra work to code a forecast-evaluation procedure that does this.
  2. Our model wasn’t tested this way, so the test-set metric might be misleading.

We might not think point 2 is a big deal, but as we will see, it is a huge deal. Because we knew the value of AR1 for all 10 predictions we made in the test set, the accuracy we reported is massively inflated (or, put another way, the error is deflated, since smaller is better). Our graph is also very misleading. The model is not actually that accurate, at least not over 10 steps.
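To make point 1 concrete, a hand-rolled dynamic evaluation of the test set might look roughly like the sketch below, reusing the objects from the earlier snippets. This is only an illustration of the idea, not scalecast’s implementation:

# predict the test set recursively: each step feeds the previous prediction into AR1
preds = []
prev_value = y_train.iloc[-1]  # step 1 can use the last actual value, since it is known
for i in range(len(X_test)):
    row = X_test.iloc[[i]].copy()
    row['AR1'] = prev_value
    prediction = lr.predict(row)[0]
    preds.append(prediction)
    prev_value = prediction  # later steps rely on predictions, not actuals

mean_absolute_error(y_test, preds)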

We can demonstrate this effectively using the scalecast package. We will use the same data and add the same terms to the Forecaster object:

from scalecast.Forecaster import Forecaster

f = Forecaster(y=data['Target'], current_dates=data.index)
f.set_test_length(10)
f.generate_future_dates(10)
f.add_seasonal_regressors('month',sincos=True,raw=False)
f.add_seasonal_regressors('year')
f.add_ar_terms(1)

We then call the model, but in two different ways. The first is the default method from scalecast, where the test-set predictions are made with the dynamic-forecasting procedure we just outlined. The value of AR1 is no longer known in the test set; instead, it is predicted for 9 of the 10 steps (in step 1, we use the actual value since we know it). The second model is exactly the same as the one we evaluated without scalecast, non-dynamically evaluated, where we always knew the value of AR1 in the test set:

f.set_estimator('mlr')
f.manual_forecast(call_me='mlr_dynamic')
f.manual_forecast(call_me='mlr_non-dynamic',dynamic_testing=False)

We then plot the results:

plt.rcParams["figure.figsize"] = (8,6)  # set the figure size before plotting
f.plot_test_set()
plt.show()
Image by author

The red line in the graph above is the model we evaluated before: same trend, same predicted values. The orange line shows the results we would have obtained with the same inputs but with dynamic testing. Quite a bit of difference! So much so, in fact, that the test MAE rises to 9.1, almost 2.5 times worse!

f.export('model_summaries')[['ModelNickname','TestSetMAE']]
Image by author

Whoopsies! Now we need to explain to our supervisor what happened.

What are the implications of this? Well, I can run this same model with the same inputs and seemingly predict anything, and I mean anything, if I don’t test it dynamically. In the plots below, the red line is a non-dynamically evaluated forecast, and the orange line is a fairer representation of the model on the full test set: a true out-of-sample forecast. All of the applied models are multiple linear regressions.

S&P 500

Here are my predictions for the S&P 500, trained only on data through shortly after the start of 2021:

Image by author

I could have predicted its precipitous rise, as well as almost the exact moment it started to fall.

Bitcoin

In December 2021, I could have predicted the recent bitcoin crash:

Image by author

COVID-19

By using airline passengers from a major airport as a proxy, I could have predicted the fall-off of air travel from the COVID-19 pandemic, as well as the subsequent recovery, back in 2015!!

Image by author

Remember, all of those models were trained on one dataset and validated on a slice of data they had never seen before. Therefore, I can only conclude that I have stumbled upon one of the world’s most powerful models. Nice!

So, obviously, I couldn’t have predicted any of these things. But how often do we see results just like this, where analysts pass off highly accurate-looking models when nothing more than a string of one-step predictions is being presented? A lot of those models go unchallenged because they are built with advanced machine-learning algorithms, or by analysts who don’t know better. No matter how fancy they seem, no matter how eloquently they are explained, advanced models, including recurrent neural networks and LSTMs, are not magic. They cannot predict the unpredictable. That doesn’t mean there aren’t highly intelligent people doing incredible work with them; it just means that when they are applied incorrectly, they become highly misleading.

So, now we know what not to do. In part 2, I will write about validation techniques that can help us build highly effective time-series models that will not deceive. They will not look as accurate on the test set but they will be better at predicting the future, which is what we actually care about when forecasting. So subscribe and sign up for email notifications if that interests you. You can take a look at the notebook I will be using in that article. All data used in this analysis was made up, obtained through public APIs, or available with an MIT license. See here for the notebook I created.

