
Understanding Time Series Trend
by Vitor Cerqueira | Mar 2023



Deterministic trends vs stochastic trends, and how to deal with them

Photo by Ali Abdul Rahman on Unsplash

Detecting and dealing with the trend is a key step in the modeling of time series.

In this article, we’ll:

  • Describe what the trend of a time series is, and its different characteristics;
  • Explore how to detect it;
  • Discuss ways of dealing with it.

Trend as a building block of time series

At any given time, a time series can be decomposed into three parts: trend, seasonality, and the remainder.

Additive decomposition of a time series: Yt = Tt + St + Rt, where Tt is the trend, St the seasonality, and Rt the remainder at time t.
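As an illustration, here is a minimal sketch of an additive decomposition on a synthetic monthly series. The series, its parameters, and the use of a centered moving average for the trend are all assumptions of this example:

```python
import numpy as np
import pandas as pd

# synthetic monthly series: linear trend + yearly seasonality + noise
rng = np.random.default_rng(1)
t = np.arange(120)
y = pd.Series(0.5 * t + 3 * np.sin(2 * np.pi * t / 12)
              + rng.normal(scale=0.5, size=t.size))

# trend: centered 12-period moving average
trend_hat = y.rolling(window=12, center=True).mean()

# seasonality: average the detrended values by position within the yearly cycle
detrended = y - trend_hat
seasonal_hat = detrended.groupby(t % 12).transform('mean')

# remainder: what is left after removing trend and seasonality
remainder = y - trend_hat - seasonal_hat
```

In practice, libraries such as statsmodels offer ready-made decomposition routines (e.g. seasonal_decompose), which are preferable to rolling this by hand.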

The trend represents the long-term change in the level of a time series. This change can be either upward (increase in level) or downward (decrease in level). If the change is systematic in one direction, then the trend is monotonic.

USA GDP time series with an upward and monotonic trend. Data source in reference [1]. Image by author.

Trend as a cause of non-stationarity

A time series is stationary if its statistical properties do not change. This includes the level of the time series, which is constant under stationary conditions.

So, when a time series exhibits a trend, the stationarity assumption is not met. Modeling non-stationary time series is challenging. If untreated, statistical tests and forecasts can be misleading. This is why it’s important to detect and deal with the trend before modeling time series.

A proper characterization of the trend affects modeling decisions. This, further down the line, impacts forecasting performance.

Deterministic Trends

A trend can be either deterministic or stochastic.

Deterministic trends can be modeled with a well-defined mathematical function. This means that the long-term behavior of the time series is predictable. Any deviation from the trend line is only temporary.

In most cases, deterministic trends are linear and can be written as follows:

The equation for a linear trend is Yt = a + b·t. The coefficient b is the expected change in the trend between consecutive periods, and a is the intercept.
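The coefficients a and b can be estimated by least squares. Here is a sketch using numpy's polyfit on synthetic data (the true values a=2.0 and b=0.3 are assumptions of this example):

```python
import numpy as np

# synthetic series with a linear deterministic trend
rng = np.random.default_rng(2)
t = np.arange(100)
y = 2.0 + 0.3 * t + rng.normal(scale=1.0, size=t.size)

# least-squares fit of y_t = a + b*t; polyfit returns the highest degree first
b_hat, a_hat = np.polyfit(t, y, deg=1)
trend_line = a_hat + b_hat * t
```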

But, trends can also follow an exponential or polynomial form.

The exponential trend equation is Yt = a·e^(b·t). This trend can be made linear by taking the log on both sides: log(Yt) = log(a) + b·t.
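A quick sketch of this log-linearization on synthetic data (the values a=1.5 and b=0.02 are assumptions of this example):

```python
import numpy as np

# synthetic exponential trend: y_t = a * exp(b*t), with multiplicative noise
rng = np.random.default_rng(3)
t = np.arange(100)
y = 1.5 * np.exp(0.02 * t) * rng.lognormal(sigma=0.05, size=t.size)

# after taking logs, the trend is linear: log(y_t) = log(a) + b*t
b_hat, log_a_hat = np.polyfit(t, np.log(y), deg=1)
```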

In economics, there are several examples of time series that grow exponentially, such as GDP:

USA GDP time series. The original trend is exponential, but it becomes linear after the logarithm transformation. Data source in reference [1]. Image by author.

A time series with a deterministic trend is called trend-stationary. This means the series becomes stationary after removing the trend component.
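A sketch of this idea on simulated data: fit a linear trend on a trend-stationary series and subtract it; the detrended series has a stable level (the series and its parameters are assumed for illustration):

```python
import numpy as np

# trend-stationary series: linear trend plus stationary noise
rng = np.random.default_rng(4)
t = np.arange(300)
y = 5.0 + 0.1 * t + rng.normal(size=t.size)

# estimate and remove the trend
b_hat, a_hat = np.polyfit(t, y, deg=1)
detrended = y - (a_hat + b_hat * t)

# the level of the detrended series no longer drifts
first_half_mean = detrended[:150].mean()
second_half_mean = detrended[150:].mean()
```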

Linear trends can also be modeled by including time as an explanatory variable. Here’s an example of how you could do this:

import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# https://github.com/vcerqueira/blog/blob/main/data/gdp-countries.csv
series = pd.read_csv('data/gdp-countries.csv')['United States']
series.index = pd.date_range(start='12/31/1959', periods=len(series), freq='Y')

# log transform to linearize the exponential trend
log_gdp = np.log(series)

# time index (1, 2, ..., n) used as an explanatory variable
linear_trend = np.arange(1, len(log_gdp) + 1)

# AR(1) model with the linear trend as an exogenous regressor
model = ARIMA(endog=log_gdp, order=(1, 0, 0), exog=linear_trend)
result = model.fit()

Stochastic Trends

A stochastic trend can change randomly, which makes its behavior difficult to predict.

A random walk is an example of a time series with a stochastic trend:

rw = np.cumsum(np.random.choice([-1, 1], size=1000))
A random walk time series whose trend changes suddenly and unpredictably. Image by author.

Stochastic trends are related to unit roots, integration, and differencing.

Time series with stochastic trends are referred to as difference-stationary. This means that the time series can be made stationary by differencing operations. Differencing means taking the difference between consecutive values.
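For a simulated random walk, the first differences recover the underlying i.i.d. steps, which are stationary. A small numpy sketch:

```python
import numpy as np

# random walk: cumulative sum of i.i.d. steps
rng = np.random.default_rng(6)
steps = rng.choice([-1, 1], size=1000)
rw = np.cumsum(steps)

# differencing undoes the cumulative sum
diff_rw = np.diff(rw)  # equals steps[1:], a stationary sequence
```

With a pandas Series, the equivalent operation is series.diff().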

Difference-stationary time series are also called integrated. For example, ARIMA (Auto-Regressive Integrated Moving Average) models contain a specific term (I) for integrated time series. This term involves applying differencing steps until the series becomes stationary.

Finally, difference-stationary or integrated time series are characterized by unit roots. Without going into mathematical details, a unit root is a characteristic of non-stationary time series.

Forecasting Implications

Deterministic and stochastic trends have different implications for forecasting.

Deterministic trends have a constant variance throughout time. In the case of a linear trend, this implies that the slope will not change. But, real-world time series show complex dynamics with the trend changing over long periods. So, long-term forecasting with deterministic trend models can lead to poor performance. The assumption of constant variance leads to narrow forecasting intervals that underestimate uncertainty.

Many realizations of a random walk. Image by author.

Stochastic trends are assumed to change over time. As a result, the variance of the time series increases with time. This makes stochastic-trend models better suited to long-term forecasting because they provide more reasonable uncertainty estimates.
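The growing variance can be checked by simulation; the number of walks and steps below are arbitrary choices for this sketch:

```python
import numpy as np

# many independent random walks with unit steps
rng = np.random.default_rng(7)
walks = np.cumsum(rng.choice([-1, 1], size=(500, 200)), axis=1)

# cross-sectional variance of the walks at each time step
var_over_time = walks.var(axis=0)
```

For unit steps, the theoretical variance after t steps is exactly t, so this curve grows roughly linearly with time.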

Stochastic trends can be detected using unit root tests, such as the augmented Dickey-Fuller (ADF) test or the KPSS test.

Augmented Dickey-Fuller (ADF) test

The ADF test checks whether an auto-regressive model contains a unit root. The hypotheses of the test are:

  • Null hypothesis: There is a unit root (the time series is not stationary);
  • Alternative hypothesis: There’s no unit root.

This test is available in statsmodels:

from statsmodels.tsa.stattools import adfuller

pvalue_adf = adfuller(x=log_gdp, regression='ct')[1]

print(pvalue_adf)
# 1.0

The parameter regression='ct' is used to include a constant term and a deterministic linear trend in the model. As you can check in the documentation, there are four possible values for this parameter:

  • c: including a constant term (default value);
  • ct: a constant term plus linear trend;
  • ctt: constant term plus a linear and quadratic trend;
  • n: no constant or trend.

Choosing which terms to include is important. Wrongly including or excluding a term can substantially reduce the power of the test. In our case, we used the ct option because the log GDP series shows a linear deterministic trend.

KPSS test

The KPSS test can also be used to detect stochastic trends. Its hypotheses are the reverse of the ADF test's:

  • Null hypothesis: The time series is trend-stationary;
  • Alternative hypothesis: There is a unit root.

from statsmodels.tsa.stattools import kpss

pvalue_kpss = kpss(x=log_gdp, regression='ct')[1]

print(pvalue_kpss)
# 0.01

The KPSS test rejects the null hypothesis, while the ADF test does not. So, both tests signal the presence of a unit root. Note that a time series can have a trend with both deterministic and stochastic components.
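One way to combine the two tests is a small decision helper. The function below is hypothetical (not part of statsmodels); it only encodes the rejection logic described above:

```python
def trend_diagnosis(pvalue_adf, pvalue_kpss, alpha=0.05):
    """Combine ADF and KPSS p-values into a rough diagnosis."""
    unit_root_adf = pvalue_adf >= alpha    # ADF fails to reject -> unit root
    unit_root_kpss = pvalue_kpss < alpha   # KPSS rejects -> unit root
    if unit_root_adf and unit_root_kpss:
        return 'unit root (difference the series)'
    if not unit_root_adf and not unit_root_kpss:
        return 'trend-stationary (detrend the series)'
    return 'inconclusive (tests disagree)'

# with the p-values obtained above (1.0 for ADF, 0.01 for KPSS):
diagnosis = trend_diagnosis(1.0, 0.01)
# -> 'unit root (difference the series)'
```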

So, how can you deal with unit roots?

We’ve explored how to use time as an explanatory variable to account for a linear trend.

Another way to deal with trends is by differencing. Instead of working with the absolute values, you model how the time series changes in consecutive periods.

A single differencing operation is usually enough to achieve stationarity, but sometimes you need to apply it more than once. You can use the ADF or KPSS tests to estimate the required number of differencing steps. The pmdarima library wraps this process in the ndiffs function:

from pmdarima.arima import ndiffs

# how many differencing steps are needed for stationarity?
ndiffs(log_gdp, test='adf')
# 2

In this case, the log GDP series needs 2 differencing steps for stationarity:

diff_log_gdp = log_gdp.diff().diff().dropna()  # drop the NaNs created by differencing
Second differences of the log GDP time series. Image by author.
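To see why a series can need two differencing steps, consider a synthetic series with a quadratic trend (the coefficients are assumed for illustration): the first difference still trends, while the second does not:

```python
import numpy as np

rng = np.random.default_rng(5)
t = np.arange(200)
y = 0.01 * t ** 2 + rng.normal(size=t.size)  # quadratic trend plus noise

d1 = np.diff(y)        # first difference: still has a linear trend
d2 = np.diff(y, n=2)   # second difference: trend removed
```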

