
Forecasting with Granger Causality: Checking for Time Series Spurious Correlations
by Marco Cerliani, April 2023



Photo by Phoenix Han on Unsplash

In time series forecasting, it is often helpful to inspect the available data graphically. This helps us understand the dynamics of the phenomena we are analyzing and make decisions accordingly. Yet, however fascinating a colorful plot of our time series may be, it can lead to incorrect conclusions. Time series are tricky because unrelated events may still appear visually related.

An example of spurious correlation [SOURCE]

As rational individuals, we can easily rule out any relationship between the number of people who died by becoming tangled in their bedsheets and per capita cheese consumption. We can state that this is a false (spurious) correlation because nothing plausibly connects the two events, even without being experts in either field.

Those who work with data know that such patterns appear often, including in contexts we find hard to interpret, where distinguishing true from spurious correlations is difficult. For this reason, methodologies that help discriminate between these situations are crucial.

One of the most famous techniques used to detect spurious correlations is the Granger causality test.

Granger-causality is built on the intuition that if a signal Y1 “Granger-causes” another signal Y2, then lags of Y1 (i.e. past observations) should contain information that helps predict Y2 together with the information contained in past observations of Y2.

Example of possible Granger-causality between time series [image by the author]

Testing for Granger causality doesn’t mean Y1 must be a cause of Y2. It simply means that past values of Y1 improve the forecast of Y2’s future values. From this implication, we may derive a naive definition of causality.

The Granger causality test implies strict assumptions about the underlying data (i.e. stationarity and linear dependency), which may be difficult to fulfill in real-world applications. For this reason, in this post we propose a generalization of the Granger causality test using a simple machine learning approach built on forecasting algorithms.

For the purposes of this post, we simulate two different time series generated by autoregressive processes.

Simulated AR processes [image by the author]
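As an illustration, such a pair of series could be generated as follows. This is a minimal sketch: the AR coefficients, noise scale, sample size, and train/test split are assumptions, not necessarily the article's exact setup.

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 1000
y1, y2 = np.zeros(n), np.zeros(n)
for t in range(2, n):
    # Y1 is a pure AR(2) process
    y1[t] = 0.6 * y1[t - 1] - 0.3 * y1[t - 2] + rng.normal()
    # Y2 depends on its own past AND on past values of Y1
    y2[t] = 0.5 * y2[t - 1] + 0.4 * y1[t - 1] + rng.normal()

df = pd.DataFrame({'y1': y1, 'y2': y2})
df_train, df_test = df.iloc[:800], df.iloc[800:]  # assumed split
print(df['y1'].corr(df['y2']))  # overall Pearson correlation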

Both series are correlated with some of their past timesteps (autocorrelation).

Autocorrelation of AR processes [image by the author]

The time series exhibit an overall Pearson correlation of 0.637, with a moderate positive relationship preserved over time.

Pearson correlation of AR processes over time [image by the author]

At first sight, it seems we are in the presence of two events with a positive connection. We express the correlation between two variables using the Pearson correlation coefficient. It is the most commonly used statistic to measure linear relationships between variables. It is so common that people often misinterpret it, trying to give it a causal meaning. That is a mistake! Pearson correlation is a function only of the covariance and standard deviations of the two variables: on its own, we cannot conclude anything about their dependency.

Pearson correlation formula [image by the author]
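For reference, the standard definition of the coefficient for two series Y1 and Y2 is

r(Y1, Y2) = cov(Y1, Y2) / (σ_Y1 · σ_Y2)

that is, the covariance of the two variables normalized by the product of their standard deviations.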

In our simulated scenario, the positive relationship is merely a mathematical artifact, since we know the two series are related in only one direction. More precisely, past values of Y1 are linearly related to current values of Y2 (the vice-versa does not hold). Our aim is to demonstrate this statement in practice.

Carrying out a Granger causality test in the classical manner means verifying whether past values of one time series (Y1) have a statistically significant effect on the current values of another time series (Y2). This is done by fitting a linear model on the lagged values of the series.

The null hypothesis of the test states that the coefficients corresponding to past values of Y1 are all zero. We reject the null hypothesis if the p-value is below a chosen threshold; in that case, Y1 Granger-causes Y2.
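With statsmodels, the classical test can be run directly on our simulated series. A minimal sketch (the maximum lag order of 5 is an assumption):

from statsmodels.tsa.stattools import grangercausalitytests

# Tests whether the second column (y1) Granger-causes the first (y2);
# the null hypothesis is that the coefficients of y1's lags are all zero
results = grangercausalitytests(df[['y2', 'y1']], maxlag=5)
for lag, (tests, _) in results.items():
    print(f"lag {lag}: ssr F-test p-value = {tests['ssr_ftest'][1]:.4f}")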

What if we performed the same check on the residuals of a predictive model?

In other words, we verify whether adding past values of Y1 improves the performance of a model that uses only lagged observations of the target (Y2).

As a first step, we fit two autoregressive models, one for Y1 and one for Y2, without additional exogenous variables, and store the predictions obtained on the test data.

import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.ensemble import RandomForestRegressor
# ForecastingCascade is assumed to come from the author's tspiral library
from tspiral.forecasting import ForecastingCascade

# `lags` is the set of autoregressive lags, defined earlier in the article
forecaster = ForecastingCascade(
    RandomForestRegressor(30, random_state=42, n_jobs=-1),
    lags=lags,
    use_exog=False,
)

# Fit one purely autoregressive model per series (no exogenous features)
model_y1 = clone(forecaster).fit(None, df_train['y1'])
model_y2 = clone(forecaster).fit(None, df_train['y2'])

# One-step-ahead forecasts over the test period, feeding each model
# the true history observed up to each step
y1_pred = np.concatenate([
    model_y1.predict(
        [[0.]],
        last_y=df['y1'].iloc[:i]
    ) for i in range(len(df_train), len(df_train) + len(df_test))
])
y2_pred = np.concatenate([
    model_y2.predict(
        [[0.]],
        last_y=df['y2'].iloc[:i]
    ) for i in range(len(df_train), len(df_train) + len(df_test))
])

Secondly, we repeat the same forecasting procedure, but add lagged exogenous variables (i.e. when forecasting Y1 we use past values of Y2 in addition to past values of Y1).

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

forecaster = ForecastingCascade(
    make_pipeline(
        FunctionTransformer(
            lambda x: x[:, 1:]  # drop the current (unlagged) value of the exog series
        ),
        RandomForestRegressor(30, random_state=42, n_jobs=-1)
    ),
    lags=lags,
    use_exog=True,
    exog_lags=lags,
)

# Fit each model on the target's own lags plus the other series' lags
model_y1y2 = clone(forecaster).fit(df_train[['y2']], df_train['y1'])
model_y2y1 = clone(forecaster).fit(df_train[['y1']], df_train['y2'])

y1y2_pred = np.concatenate([
    model_y1y2.predict(
        pd.DataFrame({'y2': [0.]}),
        last_y=df['y1'].iloc[:i],
        last_X=df[['y2']].iloc[:i]
    ) for i in range(len(df_train), len(df_train) + len(df_test))
])
y2y1_pred = np.concatenate([
    model_y2y1.predict(
        pd.DataFrame({'y1': [0.]}),
        last_y=df['y2'].iloc[:i],
        last_X=df[['y1']].iloc[:i]
    ) for i in range(len(df_train), len(df_train) + len(df_test))
])

At the end of the forecasting phase, we have stored the predictions of 4 different models (two forecasting Y1 and two forecasting Y2). It’s time to compare the results.

Squared residuals are computed at the sample level for all prediction types. The distributions of the squared residuals are then compared for the same prediction target. We use the standard two-sample Kolmogorov-Smirnov test to check for divergences between the distributions.
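A minimal sketch of this comparison with scipy, reusing the predictions stored above (the 0.05 threshold mentioned in the comment is a conventional choice):

from scipy.stats import ks_2samp

# Squared residuals per sample, with and without the exogenous lags
sqres_y1 = (df_test['y1'].values - y1_pred) ** 2
sqres_y1y2 = (df_test['y1'].values - y1y2_pred) ** 2
sqres_y2 = (df_test['y2'].values - y2_pred) ** 2
sqres_y2y1 = (df_test['y2'].values - y2y1_pred) ** 2

# Two-sample Kolmogorov-Smirnov test on the residual distributions:
# a p-value below 0.05 indicates the added lags changed forecast quality
print(ks_2samp(sqres_y1, sqres_y1y2))  # Y1: expect no significant difference
print(ks_2samp(sqres_y2, sqres_y2y1))  # Y2: expect a significant difference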

Comparison of squared residual distributions [image by the author]

The forecasts for Y1 appear to be the same with and without the addition of Y2’s features.

Comparison of squared residual distributions [image by the author]

On the contrary, the forecasts of Y2 are significantly different with and without the addition of Y1’s features. That means Y1 has a positive impact in predicting Y2, i.e. Y1 Granger-causes Y2 (the vice-versa is not true).

In this post, we proposed an alternative to the standard Granger causality test to verify causation dynamics in the time series domain. We did not stop at the Pearson correlation coefficient to draw conclusions about the data. Instead, we empirically analyzed the possible reciprocal influences between the events at our disposal, spotting spurious relationships. The ease of use of the proposed methodology and its adaptability, with few assumptions, make it suitable for any time series analytics journey.

