Building an Exercise Rep Counter Using Ideas from Signal Processing | by Aakash Agrawal | Jan, 2023

By Jessie Hobb On Jan 17, 2023

Designing a class-specific rep counter using a zero-crossing approach

Photo by Karsten Winegeart on Unsplash

In this blog post, I discuss an approach for building a real-time exercise rep counter that applies signal processing ideas on top of pose estimation. The approach can be adapted to build rep counters for other classes of repetitive movement.

Rep counting has many applications, from making fitness activities more engaging by counting how many times a particular movement was performed to measuring biological events such as heartbeats and pulse. Research in this field has gained a lot of traction in the last few years. In this blog, I discuss one approach to building a fast and accurate rep counter using a few simple concepts from signal processing.

I will quickly cover the basics first and then walk through the signal-based procedure for rep counting. An implementation of the approach can be found here [4]. Let’s start by discussing a few concepts from the world of signal processing that we can leverage to build a rep counter.

Zero-Crossing

A zero-crossing is a point where a mathematical function or waveform crosses a reference level on the axis (the level need not be 0). The term is commonly used in electronics for the points in periodic voltages and currents where the instantaneous value is zero.
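To make the definition concrete, here is a minimal sketch that lists where a sampled signal crosses a chosen level. The function name `find_crossings` and the sample values are illustrative and not taken from the author's implementation.

```python
from typing import List, Sequence


def find_crossings(signal: Sequence[float], level: float = 0.0) -> List[int]:
    """Return every index i where the signal crosses `level` between i-1 and i."""
    crossings = []
    for i in range(1, len(signal)):
        prev = signal[i - 1] - level
        curr = signal[i] - level
        # A crossing occurs when consecutive samples lie on opposite sides of the level.
        if (prev < 0 <= curr) or (prev >= 0 > curr):
            crossings.append(i)
    return crossings


wave = [0.2, 0.8, 1.4, 0.9, 0.3, 0.7, 1.5, 0.6]
print(find_crossings(wave, level=1.0))  # [2, 3, 6, 7]
```

Each bump above the level produces two crossings, one on the way up and one on the way down, which is the property the rep counter relies on later.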

Peak Detection

Peak detection means finding positions in a signal or waveform where the value deviates sharply (you observe spikes) from its normal behavior. One technique for detecting such deviations is the z-score, which measures how many standard deviations a point lies from the mean of the signal.

At any position in the signal, the z-score algorithm computes a trailing average and a trailing standard deviation over the preceding window of data points.
A peak is identified by computing the range trailing average +/- (threshold * trailing standard deviation); if the current point falls outside this range, it is considered part of a peak.
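The two steps above can be sketched as follows. This is a simplified version that uses only a trailing window and a threshold; the robust algorithm in [2] additionally dampens the influence of flagged points on later windows, which is omitted here. The name `zscore_peaks` and its default parameters are assumptions made for illustration.

```python
import statistics
from typing import List, Sequence


def zscore_peaks(signal: Sequence[float], window: int = 5, threshold: float = 3.0) -> List[int]:
    """Flag each point as +1 (upward peak), -1 (downward peak), or 0 (normal)."""
    flags = [0] * len(signal)
    for i in range(window, len(signal)):
        trailing = signal[i - window:i]          # the preceding `window` points
        mean = statistics.fmean(trailing)        # trailing average
        std = statistics.pstdev(trailing)        # trailing standard deviation
        # Outside mean +/- threshold * std means the point is part of a peak.
        if abs(signal[i] - mean) > threshold * std:
            flags[i] = 1 if signal[i] > mean else -1
    return flags


samples = [1.0, 1.1, 0.9, 1.0, 1.1, 0.9, 5.0, 1.0, 1.1]
print(zscore_peaks(samples, window=5, threshold=3.0))
```

In this example only the spike at index 6 is flagged. Because the spike itself inflates the trailing standard deviation of the windows that follow it, the points right after it are not flagged.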

More details and the mathematics of the algorithm can be found here [2].

A natural question that might come to a reader’s mind is how zero-crossing and peak detection algorithms can be used for rep-counting. Let’s see:

Assumptions

Consider a human/object to be composed of some set of keypoints (points of interest). For example, these keypoints can be human body joints like shoulders, hips, etc.

To simplify, I limit the problem to exercise rep counting because keypoints are readily available for the human body (the idea can be extended to other objects where keypoints are available). We can use open-source pose-estimation models to compute the spatial location of body keypoints. For illustration in this blog, I use TensorFlow’s MoveNet pose estimation model [3], which is fast and accurate.

We model any repetitive movement, for example an exercise, as a set of roughly sinusoidal waveforms of keypoints or of functions (metrics) over keypoints. These metrics include angles and distances between combinations of different body keypoints.

Algorithm

The underlying idea is to detect the Zero-Crossing points of these signal metrics in a moving temporal window in real-time.

Rep Counting using zero-crossing is a two-phase process:

Phase 1: Reference Computation

This phase is a one-time activity for a given exercise. We first find a zero-crossing line, also called the reference line, using a reference video (for an exercise rep counter, this can be a trainer’s video). Most of these steps are reused in the rep-counting phase.

a) We use the MoveNet pose estimation model to observe human body keypoints in real time. Consider the following reference:

Fig: Body keypoint estimation using the Movenet model. Trainer performing Jumping Jacks. GIF by Author.

b) We then compute the metrics using combinations of different body keypoints. A metric can be a distance or an angle between keypoints. Some examples: the left shoulder to left palm distance (Euclidean or y-axis), the angle subtended at the left shoulder, etc.
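A minimal sketch of such metrics, assuming each keypoint is an (x, y) pair in image coordinates. The helper names and coordinates are hypothetical. The wrist stands in for the palm here, since MoveNet's 17 keypoints include wrists but not palms, and the hip and elbow are an assumed choice of the two rays for the shoulder angle.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]  # (x, y) in image coordinates


def euclidean(a: Point, b: Point) -> float:
    """Straight-line distance between two keypoints."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def y_distance(a: Point, b: Point) -> float:
    """Vertical distance between two keypoints."""
    return abs(a[1] - b[1])


def angle_at(vertex: Point, p: Point, q: Point) -> float:
    """Angle in degrees (0 to 180) at `vertex` between the rays towards p and q."""
    a1 = math.atan2(p[1] - vertex[1], p[0] - vertex[0])
    a2 = math.atan2(q[1] - vertex[1], q[0] - vertex[0])
    deg = abs(math.degrees(a1 - a2))
    return 360.0 - deg if deg > 180.0 else deg


# Made-up keypoints for a single frame.
keypoints: Dict[str, Point] = {
    "left_shoulder": (0.6, 0.3),
    "left_elbow": (0.8, 0.3),
    "left_wrist": (0.9, 0.7),
    "left_hip": (0.6, 0.6),
}

metrics = {
    "shoulder_wrist_euclidean": euclidean(keypoints["left_shoulder"], keypoints["left_wrist"]),
    "shoulder_wrist_y": y_distance(keypoints["left_shoulder"], keypoints["left_wrist"]),
    "shoulder_angle": angle_at(keypoints["left_shoulder"], keypoints["left_hip"], keypoints["left_elbow"]),
}
print(metrics)
```

Computing these values for every frame turns each metric into a time series, which is the signal the rest of the pipeline works on.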

The idea is to use metrics that cover a wide range of movements. I usually prefer doing exercises facing the front camera; hence, Euclidean and y-axis distance metrics suffice. If you wish to build the rep counter for side-facing exercises too, you might need to consider x-axis distances as well. I also normalize the metrics by the shoulder-to-shoulder distance so that rep counting is not affected by the person’s distance from the camera.
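The normalization can be sketched like this: dividing a distance metric by the shoulder-to-shoulder distance makes it independent of how large the person appears in the frame. The function names and coordinates below are hypothetical.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def normalize_metric(value: float, left_shoulder: Point, right_shoulder: Point) -> float:
    """Divide a distance metric by the shoulder-to-shoulder distance."""
    width = math.hypot(left_shoulder[0] - right_shoulder[0],
                       left_shoulder[1] - right_shoulder[1])
    if width == 0:
        raise ValueError("shoulder keypoints coincide; cannot normalize")
    return value / width


def normalized_arm_length(kp: Dict[str, Point]) -> float:
    """Shoulder-to-wrist distance expressed in units of shoulder width."""
    raw = math.hypot(kp["left_wrist"][0] - kp["left_shoulder"][0],
                     kp["left_wrist"][1] - kp["left_shoulder"][1])
    return normalize_metric(raw, kp["left_shoulder"], kp["right_shoulder"])


# The same pose seen close to the camera and at half the apparent size.
near = {"left_shoulder": (0.6, 0.3), "right_shoulder": (0.4, 0.3), "left_wrist": (0.9, 0.7)}
far = {name: (x * 0.5, y * 0.5) for name, (x, y) in near.items()}

print(normalized_arm_length(near), normalized_arm_length(far))
```

Both calls give the same value up to floating-point rounding, so a threshold or reference line computed on one video remains meaningful for a person standing nearer to or farther from the camera.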

c) Frame-level pose estimation produces jitter in the body keypoints, which in turn produces jitter in the computed metrics. We use a low-pass filter to smooth the metric distances and angles, which makes the reference calculation and rep counting more accurate. More details on the technique can be found here [1]. Ensure that the body keypoints are well within the frame before computing the metrics.
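The post does not say which low-pass filter is used, so the sketch below assumes a simple exponential moving average, one of the most basic low-pass filters. The function name `ema_filter` and the smoothing factor `alpha` are illustrative choices.

```python
from typing import List, Sequence


def ema_filter(samples: Sequence[float], alpha: float = 0.3) -> List[float]:
    """Exponential moving average: a smaller alpha gives stronger smoothing."""
    smoothed: List[float] = []
    for x in samples:
        if not smoothed:
            smoothed.append(x)  # the first sample initializes the filter
        else:
            smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


jittery = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0]
smooth = ema_filter(jittery, alpha=0.3)
print(max(jittery) - min(jittery), max(smooth) - min(smooth))
```

The smoothed series has a much smaller spread than the raw one. The trade-off is lag: stronger smoothing delays the moment at which a real movement shows up in the signal, which matters for a real-time counter.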

d) Next, we filter out stationary metric signals. We compute the standard deviation of each signal and remove those whose standard deviation falls below a fixed threshold. If none of the metrics get filtered out, we use the top 3 metrics with the highest deviation. For the exercise rep counter, we consider a total of 18 metrics. For the above reference and a threshold standard deviation of 0.4, we end up with 8 metrics that contribute most to the repetitiveness.
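A minimal sketch of this selection step. The post's wording on the top-3 rule is ambiguous, so the fallback is read here as an assumption: if no metric clears the threshold, keep the three most variable ones. The name `select_metrics` and the sample signals are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def select_metrics(signals: Dict[str, Sequence[float]],
                   threshold: float = 0.4,
                   fallback_k: int = 3) -> List[str]:
    """Return the ids of metrics whose standard deviation reaches the threshold."""
    stds = {name: statistics.pstdev(values) for name, values in signals.items()}
    keep = [name for name, std in stds.items() if std >= threshold]
    if not keep:
        # Assumed fallback: keep the `fallback_k` metrics with the highest deviation.
        keep = sorted(stds, key=stds.get, reverse=True)[:fallback_k]
    return keep


signals = {
    "arm": [0.0, 1.0, 0.0, 1.0],   # moves a lot during the exercise
    "hip": [1.0, 1.1, 1.0, 1.1],   # barely moves
    "head": [0.5, 0.5, 0.5, 0.5],  # stationary
}
print(select_metrics(signals, threshold=0.4))  # ['arm']
```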

e) Finally, we sum all remaining non-stationary metrics at each time step and compute the reference line as the mean of the summed signal. We save the IDs of these metrics and the reference line (the mean) in a config dictionary to be used later during rep counting. The reference line for the trainer video:
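This step can be sketched as follows. The dictionary keys `metric_ids` and `reference` are assumptions for illustration; the author's config may be laid out differently.

```python
import statistics
from typing import Dict, List, Sequence


def overall_signal(signals: Dict[str, Sequence[float]], metric_ids: Sequence[str]) -> List[float]:
    """Sum the selected metrics frame by frame into one combined signal."""
    return [sum(frame) for frame in zip(*(signals[m] for m in metric_ids))]


def build_config(signals: Dict[str, Sequence[float]], metric_ids: Sequence[str]) -> dict:
    """Store the selected metric ids and the reference line (mean of the combined signal)."""
    combined = overall_signal(signals, metric_ids)
    return {"metric_ids": list(metric_ids), "reference": statistics.fmean(combined)}


reference_signals = {
    "a": [0.0, 2.0, 0.0, 2.0],
    "b": [1.0, 3.0, 1.0, 3.0],
    "c": [5.0, 5.0, 5.0, 5.0],  # stationary, so not selected
}
config = build_config(reference_signals, ["a", "b"])
print(config)  # {'metric_ids': ['a', 'b'], 'reference': 3.0}
```

Using the mean as the reference line places it between the resting level and the peaks of the combined signal, so each rep rises above it and falls back below it.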

Fig: Overall signal waveform for the reference video. Image by Author.

A careful look at the reference video shows that there are 6 reps in total. These reps correspond to the peaks observed in the overall signal above.

Phase 2: Rep Counting

Most of the steps in this phase are common to the reference computation phase.

a) Given a test video, we start by computing the keypoints and normalized metrics in real time (the same as in the reference computation phase).

b) We use the config dict from the reference computation phase to look up the non-stationary metrics for this exercise and then sum these metrics at each time step to create a combined overall signal.

c) We create a fixed-size moving window in real time and check for its intersection with the reference line, also called the zero-crossing line. Any repetitive movement usually has two states: an up state reached at the extreme of the movement and a down state the person returns to, one of which is the normal (resting) state.

Hence, for a single rep, the overall signal waveform intersects the reference line twice. The first intersection tells us that the person has reached the up state of the exercise, and the second tells us that the person is back in the normal state and the rep is complete.
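The counting logic can be sketched as a small streaming state machine. It is a simplification: it compares consecutive frames (effectively a window of two samples) instead of the fixed-size moving window described above, and it assumes the metrics have already been normalized and smoothed. The class name `RepCounter` and the config keys are hypothetical.

```python
from typing import Dict, Optional


class RepCounter:
    """Count one rep for every two crossings of the reference line."""

    def __init__(self, config: dict) -> None:
        self.metric_ids = config["metric_ids"]
        self.reference = config["reference"]
        self.prev_above: Optional[bool] = None  # side of the line on the previous frame
        self.crossings = 0
        self.reps = 0

    def update(self, frame_metrics: Dict[str, float]) -> int:
        """Consume the metrics of one frame and return the rep count so far."""
        value = sum(frame_metrics[m] for m in self.metric_ids)
        above = value >= self.reference
        if self.prev_above is not None and above != self.prev_above:
            self.crossings += 1
            # The first crossing leaves the normal state; the second returns to it.
            if self.crossings % 2 == 0:
                self.reps += 1
        self.prev_above = above
        return self.reps


counter = RepCounter({"metric_ids": ["m"], "reference": 1.0})
for v in [0.0, 0.0, 2.0, 2.0, 0.0, 0.0, 2.0, 2.0, 0.0]:
    reps = counter.update({"m": v})
print(reps)  # 2
```

A signal that hovers near the reference line would trigger spurious crossings in this sketch, which is one reason the smoothing step and a wider window matter in practice.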

That’s it!

Results

Let’s see the performance of the approach on a test video.

Fig: Rep Counting using the zero-crossing technique. GIF by Author.

The results look decent, right? 😎 Here is the overall signal waveform for the above test video. Intuitively, the four peaks correspond to the four reps.

Fig: Overall signal waveform for the test video. Image by Author.

Pros of the approach

The algorithm is quite fast and accurate.
The idea is intuitive, and the implementation is simple.
The approach is easily integrable in a production setting.

Cons of the approach

Not a generic rep-counter (however, the idea can be adapted for other classes).
Need to calculate the zero-crossing line using a reference video for each exercise/class; hence, difficult to scale to a big corpus of exercises.
Where the background is noisy, pose estimation might work poorly and hence produce poor rep-count results.
It might not work for exercises in which the overall signal becomes flat because different metrics cancel each other out.
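The last limitation can be illustrated with a toy example, assuming two made-up metrics that move in exact antiphase. Each one is clearly periodic on its own, but their sum is flat, so it never crosses any reference line.

```python
import math
import statistics

frames = 40
# Two hypothetical metrics that move in opposite directions during each rep.
arm = [math.sin(2 * math.pi * t / 10) for t in range(frames)]
leg = [-a for a in arm]

combined = [a + b for a, b in zip(arm, leg)]

print(statistics.pstdev(arm))       # clearly non-stationary on its own
print(statistics.pstdev(combined))  # 0.0: the summed signal is flat
```

Real metrics rarely cancel this exactly, but partial cancellation still shrinks the peaks of the overall signal and makes the crossings less reliable.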

In this blog, I highlighted the zero-crossing idea to count the reps; however, another technique known as peak detection, which we briefly discussed at the start, can also be employed to detect reps in real time.

[1]. Usage of Low pass filters to make pose estimation more effective

[2]. Robust peak detection algorithm (using z-scores)

[3]. MoveNet: Ultrafast and accurate pose detection model

[4]. The implementation of the approach can be found here; please read the instructions in the repository for usage.