Anomaly Detection in Univariate Stochastic Time Series with Spectral Entropy | by Ning Jia


One tip for finding regular patterns, such as sine waves, hidden in randomness.

Anomaly detection in time series data is a common task in data science. We treat anomalies as data patterns that deviate from what we expect. Today, let’s focus on detecting anomalies in a special kind of univariate time series: one generated by a stochastic process.

In such a stochastic time series, the data should look noisy, chaotic, and random, with unexpected changes happening all the time. If the value stops changing, or changes with a deterministic pattern, something is wrong with the data.

Take a look at the following plot: you will see the labelled suspicious sections and understand why they should be detected as anomalies.

Four anomaly regions

The above plot uses synthetic data generated by the code I will show next. For customers’ privacy, I won’t disclose the actual use case that inspired this approach, but I can give an analogy:

The path of a column of rising smoke may look bizarre and random, depending on the varying wind conditions. It’s very unlikely you will see repeating patterns like zigzags or long straight lines unless someone is manipulating the atmosphere.

I think you get the idea: we are trying to find certainty and patterns within chaos and randomness, and the detected exceptions will be labelled as anomalies.

Generate synthetic data

The time-series data above is generated by a random walk process. I then randomly add three sections of a sine wave with Gaussian noise and one section with a constant value.

First, generate the time series.
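Here is a minimal sketch of such a random walk in Python; the seed, series length, and step size below are illustrative assumptions, not the exact values used to draw the plot.

import numpy as np

rng = np.random.default_rng(42)                 # assumed seed
n = 5000                                        # assumed series length
steps = rng.normal(loc=0.0, scale=1.0, size=n)  # Gaussian increments
ts = np.cumsum(steps)                           # the random walk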

Then, add four anomaly regions randomly.
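And a sketch of inserting the four anomaly regions: three noisy sine sections and one constant section. The positions, lengths, frequencies, amplitudes, and noise levels are illustrative assumptions.

def add_sine_anomaly(ts, start, length, freq, amp, noise_std, rng):
    t = np.arange(length)
    ts[start:start + length] = (
        ts[start]                                  # keep the local level
        + amp * np.sin(2 * np.pi * freq * t)       # deterministic sine wave
        + rng.normal(0.0, noise_std, size=length)  # Gaussian noise
    )

def add_constant_anomaly(ts, start, length):
    ts[start:start + length] = ts[start]           # flat section

add_sine_anomaly(ts, 500, 300, freq=0.05, amp=5.0, noise_std=0.5, rng=rng)
add_sine_anomaly(ts, 1800, 250, freq=0.10, amp=3.0, noise_std=0.5, rng=rng)
add_sine_anomaly(ts, 3200, 400, freq=0.02, amp=8.0, noise_std=0.5, rng=rng)
add_constant_anomaly(ts, 4200, 300)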

Anomaly detection with spectral entropy

The anomaly sections have different lengths, frequencies and amplitudes. Can we find those sections easily?

The answer is yes.

Simply computing a rolling spectral entropy is a quick solution. As you can see in the plot, the rolling spectral entropy is close to zero for all the anomaly regions. We can use the rolling entropy as a continuous anomaly score indicating the likelihood of an anomaly.
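A minimal sketch of the rolling computation, assuming a sampling rate of 1 Hz (an assumption) and the 200-sample window used in the comparison below:

import numpy as np
from scipy.signal import periodogram
from scipy.stats import entropy

def spectral_entropy(window, fs=1.0, nfft=None):
    # 1) power spectral density, 2) normalize it into a probability
    # distribution over frequency bins, 3) Shannon entropy (in bits).
    _, psd = periodogram(window, fs=fs, nfft=nfft)
    total = psd.sum()
    if total == 0:           # a perfectly flat window has no spectral
        return 0.0           # content; treat its entropy as zero
    return entropy(psd / total, base=2)

window_size = 200
scores = np.full(len(ts), np.nan)
for i in range(window_size, len(ts) + 1):
    scores[i - 1] = spectral_entropy(ts[i - window_size:i])

# Low scores flag the sine-wave and constant-value regions.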

Why does spectral entropy work?

The idea is to compute the spectral density first, normalize it, and finally compute the Shannon entropy.

Here I compare two example windows, each of length 200. The first is a window from one of the anomaly regions, and the second is a window from a normal region.

Power Spectral Density (PSD) of an anomaly window
Power Spectral Density (PSD) of a good window

First, calculate the spectral density for each window; the results are shown on the right side of each figure.

Then treat frequency as a discrete variable with bins, and treat the normalized density as the probability of each frequency bin. Because the density is normalized, the probabilities sum to 1.

Now Shannon entropy comes into play.

“The Shannon entropy of a distribution is the expected amount of information in an event drawn from that distribution. It gives a lower bound on the number of bits needed on average to encode symbols drawn from a distribution P.”
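Applied to our normalized PSD, with p_k denoting the probability assigned to frequency bin k (notation mine), the spectral entropy is H = -Σ p_k * log2(p_k), summed over all bins k. It is 0 when a single bin carries all the power, and it is maximal when the power is spread evenly across the bins.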

In the above anomaly window, the signal is “active” at the 1 Hz frequency bin only (the probability of the 1 Hz bin is almost 1, and the others are nearly 0). There are no surprises and no uncertainty; therefore, the entropy is about 0 in this case.

For the good region, the signal may be “active” at several frequency bins with different probabilities. There is more uncertainty and more unknown information, so the entropy is higher.
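Putting these steps together for one anomaly window and one normal window (the window positions below are tied to the illustrative synthetic-data sketch above, and fs = 1 Hz is assumed):

import numpy as np
from scipy.signal import periodogram
from scipy.stats import entropy

windows = {
    "anomaly (sine)": ts[600:800],    # inside a sine-wave anomaly region
    "normal":         ts[2500:2700],  # inside a plain random-walk region
}
for name, win in windows.items():
    _, psd = periodogram(win, fs=1.0)
    p = psd / psd.sum()                         # normalized PSD
    assert np.isclose(p.sum(), 1.0)             # probabilities sum to 1
    print(name, round(entropy(p, base=2), 2))   # Shannon entropy in bits

# The anomaly window's entropy comes out much lower than the normal one's.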

Rolling window size and FFT size

You won’t always see spectral entropies extremely close to zero for those sine-wave-like anomalies. The reason lies in how we compute the spectrum: the peak frequency may spread across two neighbouring frequency bins (spectral leakage due to limited frequency resolution). The Shannon entropy will then not be close to zero, but it will still be smaller than in a normal window.

The sampling rate and FFT size determine the frequency bins. If the FFT size is not specified, we will use the window size.
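For example, the bin spacing equals the sampling rate divided by the FFT size, so a 512-point FFT gives much finer bins than the default FFT over a 100-sample window (fs = 1 Hz is assumed):

import numpy as np
from scipy.signal import periodogram

window = np.random.default_rng(0).normal(size=100)
for nfft in (None, 512):                  # None -> FFT size = window size
    f, _ = periodogram(window, fs=1.0, nfft=nfft)
    print(len(f), "bins, spacing:", f[1] - f[0])
# roughly 51 bins spaced 0.01 Hz apart, versus 257 bins spaced ~0.002 Hz apart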

Above is the result with an FFT size of 512 and a window size of 100. We can still see the separation using the rolling spectral entropy; the anomaly regions’ scores are smaller than the normal regions’, but they are no longer close to 0. We may need postprocessing, such as a moving average, to segment the time series into anomalous and normal sections. With a smaller window size, we can detect the anomaly earlier.
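One possible postprocessing sketch, reusing the scores array from the rolling-entropy sketch above; the smoothing length and the threshold are illustrative, not tuned values.

import numpy as np

def moving_average(x, k=50):
    return np.convolve(x, np.ones(k) / k, mode="same")

filled = np.nan_to_num(scores, nan=np.nanmax(scores))  # warm-up NaNs -> "normal"
smoothed = moving_average(filled)
is_anomaly = smoothed < 2.0     # assumed threshold on the entropy (in bits)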

In a real application, you should first explore and study the anomalies you expect. Then you can pick the rolling window size and FFT size according to your anomaly detection requirements.

Conclusion

Spectral entropy combines the ideas of the FFT, spectral density, and Shannon entropy. We can use it to check how much information is contained in a window of time series data. Higher entropy implies uncertainty and randomness; lower entropy indicates regular and deterministic patterns.

Therefore, we can detect patterns hidden in randomness using spectral entropy. Of course, spectral entropy also works the other way around: detecting randomness in otherwise patterned data. But plain frequency analysis may already perform well enough for those cases.

Thanks for reading.

Have fun with your time series.

