Seasonality of Time Series. An intuition of how seasonality affects… | by Egor Howell | Oct, 2022

By Jessie Hobb On Oct 26, 2022

An intuition of how seasonality affects time series analysis

Seasonality is a crucial aspect of time-series analysis. As time-series are indexed forward in time, they are subject to seasonal fluctuations. For example, we expect ice cream sales to be higher in the summer months and lower in the winter months.

Seasonality can occur over different time intervals, such as days, weeks or months. The key for time-series analysis is to understand how seasonality affects our series, which in turn helps us produce better forecasts.

In this post we will go over an example of seasonal data and then show how we can remove the seasonality. The reason we want to remove it is to make our time series stationary, which is a requirement of many forecasting models. If you want to learn more about stationarity, check out my previous posts here:

We can observe seasonality in the plot below of US air passenger volumes between 1948 and 1960:

Data sourced from Kaggle with a CC0 licence.

Code Gist by author.

The data is indexed by month and we can clearly see a yearly seasonal pattern where the number of passengers peaks in the summer months. There is also an overall upward trend in the number of passengers through time.
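The plotting gist is not reproduced in this copy. As a rough sketch of how one might check for a yearly pattern numerically, the snippet below builds a synthetic monthly series (an assumption, standing in for the Kaggle data) with an upward trend and a July peak, then averages the values by calendar month. The names `series` and `monthly_means` are illustrative only.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the passenger data (NOT the real dataset):
# 12 years of monthly values with a linear trend and a yearly cycle.
# The start date is arbitrary.
index = pd.date_range("2000-01-01", periods=144, freq="MS")
t = np.arange(144)
month = index.month.to_numpy()
seasonal = 40 * np.cos(2 * np.pi * (month - 7) / 12)  # peaks in July
series = pd.Series(100 + 2 * t + seasonal, index=index)

# Average by calendar month: a seasonal series shows clearly different means.
monthly_means = series.groupby(series.index.month).mean()
print(monthly_means.round(1))
print("Peak month:", monthly_means.idxmax())
```

If the monthly means differ markedly, as they do here by construction, the mean of the series depends on the season, which is what a yearly seasonal pattern implies.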

We can remove seasonality from the data using differencing, which calculates the difference between the current value and the value in the previous season. This is done to make the time series stationary, meaning its statistical properties are constant through time. Seasonality causes the mean of the time series to differ depending on the season we are in, so its statistical properties are not constant.

Seasonal differencing is mathematically described as:

d(t) = y(t) - y(t-m)

Equation generated by author in LaTeX.

Where d(t) is the differenced data point at time t, y(t) is the value of the series at time t, y(t-m) is the value of the series one season earlier and m is the length of one season. In our case m=12, as the data is monthly and the seasonality is yearly.

We can use the pandas diff() method to calculate the seasonal differences and plot the resultant series:

Code Gist by author.

The yearly seasonality has now disappeared, although we can observe some cyclical behaviour. Cycles are another common feature of time series; they are similar to seasonality but typically occur on a longer timescale, as observed here.
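The seasonal differencing step described above can be sketched as follows. This is a minimal illustration on a synthetic monthly series (an assumption, not the article's data); `seasonal_diff` and `manual` are hypothetical names. It uses the pandas diff() method with periods equal to the season length and compares the result with the formula written out using shift().

```python
import numpy as np
import pandas as pd

# Synthetic monthly series (linear trend + yearly cycle), standing in for real data.
index = pd.date_range("2000-01-01", periods=48, freq="MS")
t = np.arange(48)
series = pd.Series(100 + 2 * t + 40 * np.sin(2 * np.pi * t / 12), index=index)

m = 12  # season length: monthly data with a yearly pattern

# d(t) = y(t) - y(t - m). The first m values have no previous season,
# so they come out as NaN and are dropped.
seasonal_diff = series.diff(m).dropna()

# The same calculation written out with shift(), mirroring the formula directly.
manual = (series - series.shift(m)).dropna()

print(seasonal_diff.head())
```

In this synthetic case the trend is linear and the cycle repeats exactly, so every seasonal difference is the same constant. Real data will not be this tidy.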

We can test that the resultant series is stationary using the Augmented Dickey-Fuller (ADF) test. The null hypothesis of this test is that the series is non-stationary. The statsmodels package provides a function for carrying out the ADF test:

Code Gist by author.

Output:

ADF Statistic:  -3.3830207264924805
P-Value:  0.011551493085514982
Critical Values:
1%: -3.48
5%: -2.88
10%: -2.58

The P-Value (about 0.012) is below the 5% and 10% thresholds, but above the 1% threshold. Therefore, we can reject the null hypothesis of non-stationarity at the 5% and 10% significance levels, but not at the 1% level.
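The decision at each significance level can be checked with a few lines of arithmetic. The sketch below copies the numbers from the ADF output above; the `decisions` dictionary is an illustrative name, not part of any library.

```python
# Values copied from the ADF output shown earlier.
adf_stat = -3.3830207264924805
p_value = 0.011551493085514982
critical_values = {"1%": -3.48, "5%": -2.88, "10%": -2.58}

decisions = {}
for level, crit in critical_values.items():
    alpha = float(level.strip("%")) / 100
    # Reject the null hypothesis (non-stationarity) when the statistic is
    # more negative than the critical value for that level.
    decisions[level] = adf_stat < crit
    # Comparing the p-value with alpha gives the same answer for these numbers.
    print(f"{level}: reject null = {decisions[level]}, p < alpha = {p_value < alpha}")
```

Both comparisons agree here: the null hypothesis is rejected at 5% and 10%, but not at 1%.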

We could also carry out some regular differencing (the difference between adjacent values) to further reduce the P-Value. However, in this case I think the data is adequately stationary, given that the P-Value is below the 5% threshold.
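For illustration only, here is a minimal sketch of regular differencing applied after seasonal differencing. It assumes a synthetic monthly series with a quadratic trend, chosen so that seasonal differencing alone still leaves a linear trend behind; `double_diff` is a hypothetical name.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series: quadratic trend plus a yearly cycle (illustrative).
index = pd.date_range("2000-01-01", periods=48, freq="MS")
t = np.arange(48)
series = pd.Series(0.5 * t**2 + 40 * np.sin(2 * np.pi * t / 12), index=index)

# Seasonal differencing removes the yearly cycle but leaves a linear trend.
seasonal_diff = series.diff(12)

# Regular differencing: difference between adjacent values, y(t) - y(t - 1).
double_diff = seasonal_diff.diff().dropna()

print(double_diff.head())
```

In this constructed example the second difference is constant, so no trend remains. On real data, the ADF test can be rerun on the result to see whether the extra differencing was worthwhile.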

It is also best practice to stabilise the variance, as constant variance is one of the conditions of stationarity. To achieve this, we could have used the Box-Cox transform. If you want to learn more about stabilising the variance, check out my previous article on it:
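As a rough sketch, a Box-Cox transform could be applied with `scipy.stats.boxcox`. The article does not say which library it would use, so SciPy is an assumption here, as is the synthetic series below, whose seasonal swings grow with its level. Box-Cox requires strictly positive data.

```python
import numpy as np
from scipy.stats import boxcox

# Synthetic positive series whose seasonal swings grow with the level
# (illustrative only, not the article's data).
t = np.arange(1, 145)
values = (100 + 2 * t) * (1 + 0.2 * np.sin(2 * np.pi * t / 12))

# With lmbda left unset, boxcox estimates lambda by maximum likelihood and
# returns the transformed data together with that estimate.
transformed, fitted_lambda = boxcox(values)

print("Fitted lambda:", fitted_lambda)
```

The transform would normally be applied before differencing, since differenced values can be negative.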

In this article we have shown what seasonality is and what it looks like. We can remove seasonality through differencing and check whether the resultant series is stationary using the ADF test.

The full Python script for this article can be found at my GitHub here:
