Time-Series Forecasting Based on Trend and Seasonal Components
by Javier Fernandez | Jun, 2022

Analyzing the trend and seasonality of a time-series in order to decompose it and implement forecasting models

Time-series forecasting is the task of making predictions based on time-stamped historical data. It involves building models on historical observations in order to drive future decision-making in applications such as weather, engineering, economics, finance, or business forecasting, among others.

This article is intended as an introduction to time-series forecasting. The structure unfolds as follows. Firstly, a description of the two main time-series patterns (trend and seasonality). Secondly, a decomposition of the time-series based on those patterns. Lastly, an implementation of a forecasting model called the Holt-Winters’ Seasonal Method, which is suitable for time-series data with trend and/or seasonal components.

To cover all this content, I have created a dataset that simulates the temperatures of a northern hemisphere city such as Sevilla between 2010 and 2020. Both the synthetic dataset and the method used to create it are freely available to anyone interested. The code can be found in the following GitHub repository.

1. Importing the libraries and the data

Firstly, import the following libraries needed to run the code. Apart from the most typical libraries, the code is based on the functions provided by the statsmodels library, which provides classes and functions for estimating many different statistical models, such as statistical tests and forecasting models.
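As a minimal sketch, the imports used throughout this article could look as follows (the exact list is in the repository):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller, kpss
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.tsa.holtwinters import ExponentialSmoothing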

Here is the code to create the dataset. The data consist of two columns, one for the dates and the other for the temperature between 2010 and 2020.
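The exact generator is available in the GitHub repository; a minimal sketch that produces data with the same structure (annual seasonality, a slight upward trend, and noise; all parameter values below are illustrative assumptions) could be:

# Daily dates covering 2010 to 2020
dates = pd.date_range(start='2010-01-01', end='2019-12-31', freq='D')
n = len(dates)
rng = np.random.default_rng(42)

# Annual sinusoidal seasonality peaking in summer, a slight linear
# trend, and Gaussian noise around a mean of 18 degrees
day_of_year = dates.dayofyear.to_numpy()
seasonality = 10 * np.sin(2 * np.pi * (day_of_year - 105) / 365.25)
trend = 0.0005 * np.arange(n)
noise = rng.normal(loc=0.0, scale=2.0, size=n)

df = pd.DataFrame({'date': dates, 'temperature': 18 + seasonality + trend + noise})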

2. Visualizing the dataset

Before we begin to analyze the patterns of the time-series, let’s visualize the data where each vertical dashed line corresponds to the start of the year.
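A minimal plotting sketch, assuming the df created above:

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(df['date'], df['temperature'], linewidth=0.5)
# Dashed vertical line at the start of each year
for year in range(2010, 2021):
    ax.axvline(pd.Timestamp(year=year, month=1, day=1),
               color='grey', linestyle='--', linewidth=0.8)
ax.set_xlabel('Date')
ax.set_ylabel('Temperature')
plt.show()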

Fig. 1. Temperature time-series. Ref: Image by author

Before moving on to the next section, let’s take a moment to look at the data. Interestingly, the data show a clear seasonal variation: the temperature increases in summer and decreases in winter, as expected for a northern hemisphere city. Also, the temperature does not seem to increase significantly over time, since the mean temperature is almost the same regardless of the year.

3. Time-series patterns

Time-series forecasting models use mathematical equation(s) to find patterns in a series of historical data. These equations are then used to project into the future the historical time patterns in the data [1].

There are four types of time-series patterns:

  • Trend: Long-term increase or decrease in the data. The trend can be any function, such as linear or exponential, and can change direction over time.
  • Seasonality: Repeating cycle in the series with a fixed frequency (hour of the day, week, month, year, etc.). A seasonal pattern always has a fixed, known period.
  • Cyclicity: Occurs when the data rise and fall without a fixed frequency or duration, caused, for example, by economic conditions.
  • Noise: The random variation in the series.

Most time-series data will contain one or more of these patterns, but probably not all of them. Here are some examples where we can identify some of them:

  1. Annual Wikipedia audience (Left figure): In this figure, we can identify an increasing trend, as the audience increases linearly each year.
  2. Seasonality plot of US electricity usage (Middle figure): Each line corresponds to one year, so we can observe an annual seasonality as the consumption is repeated annually.
  3. Daily closing of the IBEX 35 (Right figure): This time-series has an increasing trend over time, as well as a cyclical pattern, since there are periods in which the IBEX 35 decreased for economic reasons.
Fig. 2. From left to right, Wikipedia’s annual audience, seasonality plot of US electricity usage, IBEX 35 daily closings. Ref: From left to right, [3], [4], [5]

If we assume an additive decomposition for these patterns, we can write:

Y[t] = T[t] + S[t] + e[t]

where Y[t] is the data, T[t] is the trend-cycle component, S[t] is the seasonal component, and e[t] is the noise, all at period t.

On the other hand, a multiplicative decomposition would be written as:

Y[t] = T[t] * S[t] * e[t]

The additive decomposition is the most suitable one when the seasonal fluctuations do not vary with the level of the time-series. On the contrary, when the variation in the seasonal component appears to be proportional to the level of the time-series, then a multiplicative decomposition is more appropriate [2].

4. Decomposing the data

A stationary time-series is defined as one whose properties do not depend on the time at which the series is observed. Thus, time-series with trends or with seasonality are not stationary, whereas white noise series are stationary [6]. In a more mathematical sense, a time-series is said to be stationary if it has a constant mean and variance, and its covariance is independent of time. Reference [6] gives several illustrative examples comparing stationary and non-stationary time-series. In general, a stationary time-series will not have long-term predictable patterns.

But why is stationarity important?

Well, stationarity has become a common assumption for many practices and tools in time-series analysis, including trend estimation, forecasting, and causal inference, among others. Therefore, in many cases, you will need to determine whether the data was generated by a stationary process and, if not, transform it so that it has the properties of a sample generated by one [7].

But how do we check the stationarity of a time-series?

We can check stationarity in two ways. On the one hand, we can check it manually, by inspecting how the mean and variance of the time-series evolve over time. On the other hand, we can assess it with a statistical test function [8].

Some cases might be confusing. For example, a time-series without trend and seasonality but with cyclic behavior is stationary since the cycles are not of a fixed length.

4.1. Checking the trend

To analyze the trend and seasonality of the time-series, we first analyze the mean over time using the rolling mean method with 30-day and 365-day windows.
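A sketch of these rolling statistics, assuming the temperature series created earlier is re-indexed by date:

series = df.set_index('date')['temperature']
for window in (30, 365):
    plt.plot(series.rolling(window).mean(), label=f'{window}-day rolling mean')
    plt.plot(series.rolling(window).std(), label=f'{window}-day rolling std')
plt.legend()
plt.show()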

Fig. 3. Rolling mean and std. Ref: Image by author.

In the figure, we can see how the rolling mean with a 30-day window oscillates over time because of the seasonal pattern in the data. Also, the rolling mean with the 365-day window increases over time, indicating a slight upward trend.

This can also be assessed using statistical tests such as the Augmented Dickey-Fuller (ADF) test and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test:

  • The null hypothesis of the ADF test is that the series has a unit root (i.e., it is non-stationary). Hence, if the p-value is below 0.05, this null hypothesis can be rejected at a 95% confidence level and the time-series is stationary.
  • The null hypothesis of the KPSS test is the opposite: that the series is stationary (there is no unit root). Hence, if the p-value is below 0.05, this null hypothesis is rejected and the time-series is not stationary.

Although these tests are designed to check the stationarity of the data, they are useful for analyzing the trend of the time-series rather than its seasonality, as indicated in [9].
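Both tests are available in statsmodels as adfuller and kpss; wrapping their results into a labelled pandas Series, as sketched below, is an assumption based on the printed output that follows:

def adf_test(ts):
    # Null hypothesis: the series has a unit root (non-stationary)
    result = adfuller(ts, autolag='AIC')
    out = pd.Series(result[0:4],
                    index=['Test Statistic', 'p-value', 'Lags Used',
                           'Number of Observations Used'])
    for key, value in result[4].items():
        out[f'Critical Value ({key})'] = value
    print('Results of Dickey-Fuller Test:')
    print(out)

def kpss_test(ts):
    # Null hypothesis: the series is stationary
    result = kpss(ts, regression='c', nlags='auto')
    out = pd.Series(result[0:3],
                    index=['Test Statistic', 'p-value', 'Lags Used'])
    for key, value in result[3].items():
        out[f'Critical Value ({key})'] = value
    print('Results of KPSS Test:')
    print(out)

adf_test(series)
kpss_test(series)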

Results of Dickey-Fuller Test:
Test Statistic -3.69171446
p-value 0.00423122
Lags Used 30.00000000
Number of Observations Used 3621.00000000
Critical Value (1%) -3.43215722
Critical Value (5%) -2.86233853
Critical Value (10%) -2.56719507
dtype: float64
Results of KPSS Test:
Test Statistic 1.04843270
p-value 0.01000000
Lags Used 37.00000000
Critical Value (10%) 0.34700000
Critical Value (5%) 0.46300000
Critical Value (2.5%) 0.57400000
Critical Value (1%) 0.73900000
dtype: float64

Interestingly, the two tests disagree about the stationarity of the time-series; remember that their null hypotheses are opposite. While the ADF test indicates that the time-series is stationary (p-value below 0.05), the KPSS test reveals that it is not stationary (p-value also below 0.05, rejecting the null of stationarity). This dataset was created with a slight trend, so the results pinpoint the KPSS test as the more accurate one for this dataset.

To reduce the trend of the dataset, we could implement the following detrending method:
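The article’s exact method is not reproduced here; one common detrending approach is to subtract a rolling mean:

# Remove the long-term trend by subtracting the 365-day rolling mean
detrended = (series - series.rolling(365).mean()).dropna()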

Fig. 4. Rolling mean and std after detrending the time-series. Ref: Image by author.

4.2. Checking the seasonality

As observed before from the rolling statistics, there is a seasonal pattern in our time-series. Hence, we should implement a differencing method to remove the underlying seasonal or cyclical patterns. Since the sample dataset has a 12-month (365-day) seasonality, I used a 365-lag difference:
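With pandas, a 365-lag difference is a one-liner:

# Subtract from each value the value observed 365 days earlier
differenced = series.diff(365).dropna()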

Fig. 5. Rolling mean and std after differencing the time-series. Ref: Image by author.

Now, both the rolling mean and std remain more or less constant over time, so we have a stationary time-series.

The combined implementation of the detrending and differencing methods would be as follows:
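Under the same assumptions as above, chaining the two transformations:

detrended = (series - series.rolling(365).mean()).dropna()
stationary = detrended.diff(365).dropna()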

Fig. 6. Rolling mean and std after detrending and differencing the time-series. Ref: Image by author.

4.3. Decomposition

The decomposition based on the mentioned patterns can be implemented with a useful Python function called seasonal_decompose within the ‘statsmodels’ package:
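A minimal call, assuming a daily series with an annual period:

# Returns an object holding the observed, trend, seasonal and residual parts
result = seasonal_decompose(series, model='additive', period=365)
result.plot()
plt.show()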

Fig. 7. Time-series decomposition. Ref: Image by author.

After looking at the four panels of the decomposition plot, we can say that there is a strong annual seasonality in our time-series, as well as an increasing trend over time.

5. Modeling

The appropriate model for your time-series data will depend on the data’s particular characteristics, such as trend and seasonality [10]. Be sure to choose the model that best suits your data:

  1. Autoregression (AR)
  2. Moving Average (MA)
  3. Autoregressive Moving Average (ARMA)
  4. Autoregressive Integrated Moving Average (ARIMA)
  5. Seasonal Autoregressive Integrated Moving-Average (SARIMA)
  6. Seasonal Autoregressive Integrated Moving-Average with Exogenous Regressors (SARIMAX)
  7. Vector Autoregression (VAR)
  8. Vector Autoregression Moving-Average (VARMA)
  9. Vector Autoregression Moving-Average with Exogenous Regressors (VARMAX)
  10. Simple Exponential Smoothing (SES)
  11. Holt-Winters’ Exponential Smoothing (HWES)

Since there is seasonality in our data, the model implemented is the Holt-Winters’ Exponential Smoothing method, as it is suitable for time-series data with trend and/or seasonal components.

This method uses exponential smoothing to encode many past values and uses them to predict “typical” values for the present and future. Exponential smoothing refers to the use of an exponentially weighted moving average (EWMA) to “smooth” a time-series [11].

Before implementing it, let’s create the training and testing datasets:
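One possible split (the exact cut-off date is an assumption) holds out the last year for testing:

train = series[:'2018-12-31']
test = series['2019-01-01':]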

Here is the implementation using the root-mean-square error (RMSE) as the metric to assess the error of the model.
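A sketch matching the configuration reported below (additive trend, additive seasonality with a 365-day period, and a Box-Cox transformation):

model = ExponentialSmoothing(
    train,
    trend='add',
    seasonal='add',
    seasonal_periods=365,
    use_boxcox=True,
).fit()
pred = model.forecast(len(test))
rmse = np.sqrt(np.mean((test.to_numpy() - pred.to_numpy()) ** 2))
print(f'RMSE: {rmse:.2f}')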

The Root Mean Squared Error of the model, with additive trend, additive seasonality of period 365, and a Box-Cox transformation, is 6.27.
Fig. 8. Results of the Holt-Winters’ Exponential Smoothing method. Ref: Image by author.

From the figure, we can observe how the model captures the seasonality and trend of the time-series, although it shows some error when predicting the outliers.

6. Conclusion

Understanding the main time-series patterns and learning how to implement time-series forecasting models is essential due to their many applications.

Throughout this article, we have covered trend and seasonality with a hands-on example based on a temperature dataset. Apart from checking for trend and seasonality, we have seen how to reduce them and how to create a basic model that uses these patterns to infer the temperature of the following days.

From here, the next step is to understand other forecasting models, such as those listed in Section 5. I leave here two links [10, 12] to other articles that can be considered extensions of this one.


