Machine Learning for Jiu Jitsu

By Lucas Soares | March 2023


Photo by Kampus Production from Pexels: https://www.pexels.com/photo/a-judoka-throwing-an-opponent-to-the-ground-6765024/

Using pose estimation with mediapipe to track Jiu Jitsu movements

Brazilian Jiu-Jitsu is a martial art that has been gaining a lot of popularity recently due to its effectiveness and applicability in real-world combat.

I've been practicing Brazilian Jiu Jitsu for over 10 years, and I decided to join my interests in martial arts and machine learning to come up with a project that lived at the intersection of these two fields.

Therefore, I turned to pose estimation as a promising technique for a complementary tool to aid my development in Jiu Jitsu.

In this article, I would like to share with you how to use pose tracking to enhance corrective feedback when performing fighting movements.

Pose tracking is the process of detecting and tracking the movement of a person’s body in real-time using computer vision technology. It involves using algorithms to capture and interpret the movement of various body parts, such as the arms, legs, and torso.

This technique can be relevant for analyzing body movement in sports, as it allows coaches and athletes to identify and correct movement patterns that may be negatively impacting performance or causing injuries.

By providing real-time feedback, athletes can make adjustments to their technique, leading to improved performance and a reduced risk of injury. Additionally, this technology can be used to compare an athlete's movements to those of top performers in the sport, for example, to help beginners identify areas for improvement and refine their technique accordingly.

Jiu Jitsu is a martial art centered around the idea of subduing opponents through a combination of pins and submission holds like joint locks and chokes.

Jiu Jitsu focuses on grappling and ground-fighting techniques. It was initially developed in Japan and later modified and popularized in Brazil. It has since spread all over the world, with a particular surge in popularity in the United States.

The basic principle is that a smaller, weaker person can defend against a larger, stronger opponent by using leverage and technique. Practitioners aim to control their opponent’s body and position themselves in a dominant position where they can execute techniques such as chokes, joint locks, and throws.

Image by Timothy Eberly on https://unsplash.com/photos/7MRajrPiTqw

Jiu Jitsu is now a popular sport and self-defense system practiced all over the world. It requires physical and mental discipline, as well as a willingness to learn and adapt.

It has also been found to have numerous benefits, including improved physical fitness and mental acuity, increased confidence and self-esteem, as well as stress relief.

The large emphasis on technique makes this martial art quite unique, and in the context of a jiu jitsu gym, it is usually the role of the black belt coach to give feedback to the student regarding the appropriateness of his or her execution of different techniques.

However, it's often the case that people want to learn but either don't have access to an expert, or the class contains too many students and it becomes difficult for the person conducting the class to give specific, personal feedback on whether or not each student is performing the movements correctly.

It is within this feedback gap that I think tools like pose tracking can greatly benefit the world of martial arts in general and Jiu Jitsu in particular (although one could argue the same for Judo, wrestling, and striking-based martial arts), because they could be seamlessly integrated into a smartphone, only requiring athletes to film themselves while performing the movement they are trying to improve.

The form of this feedback is something that would still have to be developed, and this article is an attempt to provide directions for how such a machine-learning-based feedback system could help students get better at performing foundational movements in the sport.

Ok, so here is the story.

Usually, when you develop your Jiu Jitsu skills, you end up falling into one of two categories: bottom player or top player. That is, you either tend to play from the bottom using your "guard" (a reference to using the legs to attack the opponent) or from the top, by first taking down your opponent and then passing the line of their legs to (usually) reach a dominant position, like being mounted on your opponent or taking his or her back.

Image by Nolan Kent in https://unsplash.com/photos/x_V62hOwnDk?utm_source=unsplash&utm_medium=referral&utm_content=creditShareLink

Such a duality is obviously artificial, and usually, most experienced players can play both positions extremely well.

However, it is the case that people tend to lean towards preferences at the beginning of their journey in Jiu Jitsu, and that can hugely impact their progress in other areas if they get stuck executing the same strategy over and over.

In a way, that's what happened to me. I used to fight mostly as a guard player, due to a predominant culture in Brazil of starting grappling bouts from the knees, either to avoid injuries or because the mat space is not as large as the wrestling mats in big US high school gyms.

Image by the author. Photos of me in competition pulling guard.

This habit of willingly sitting down and fighting off my back without engaging my opponents in the standup game had a negative impact on my development as a martial artist, because as I got better and better at Jiu Jitsu, I realized that one thing holding me back was my lack of high-level knowledge of how to take people down.

This ignited a fire in me to start working more from a standing position, and a couple of years into my brown belt I went on to study and practice Wrestling and Judo.

Over the last two years I have been mostly a top player, and I have indeed improved my ability to take people to the ground quite a bit.

Image by the author. My wrestling journey.

However, certain foundational movements in Judo, for example, are extremely difficult to develop, and because I don't know any Judo experts, nor do I live close to any high-level Judo or Wrestling gyms, I realized that I needed another way to improve these fundamentals, specifically the hip mobility for takedowns like the "Uchimata" and other hip-based throws.

Photo by Kampus Production from Pexels: https://www.pexels.com/photo/a-judoka-throwing-an-opponent-to-the-ground-6765024/

Ok, so with the goal of improving my ability to perform Judo throws like the Uchimata, I concocted a “geeky” plan: I’m gonna use Machine Learning (I know, such a specific plan).

I decided I wanted to investigate whether or not I could use Pose Tracking to gather insight on how to correct things like the speed and direction of the feet and other aspects of executing these movements.

So let’s get into how I did that.

The overall plan was this:

1. Find a video reference containing the movement I was looking to emulate

2. Record myself performing the movement many times

3. Generate insights using pose tracking and visualization with Python.

To do all of that I needed a reference video of an elite-level practitioner performing the move I was trying to learn. In the case of the uchimata, I found this video of an Olympic-level player performing a warm-up technique against the wall that is directly relevant to what I wanted to learn:

Then I started recording myself performing the movement in my training sessions, at least the ones where I am actively working on that specific move.

Image by the author.

In possession of the reference video, and now having recorded some of my own footage, I was ready to try out some fun machine learning stuff.

For the pose tracking I used a library called MediaPipe, Google's open-source project for facilitating the application of machine learning to live and streaming media.

The ease of use of this option got me excited to try it out.
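To give a sense of how little code is involved, here is a minimal sketch of running MediaPipe Pose on a single image (the file name frame.jpg is just a hypothetical placeholder), using the same mp.solutions.pose API that the rest of this article relies on:

import cv2
import mediapipe as mp

mp_pose = mp.solutions.pose

# Load a frame and convert it from OpenCV's BGR to the RGB format MediaPipe expects
image = cv2.imread("frame.jpg")  # hypothetical example image
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# static_image_mode=True because we are processing an independent image, not a video stream
with mp_pose.Pose(static_image_mode=True, min_detection_confidence=0.5) as pose:
    results = pose.process(image_rgb)

if results.pose_landmarks:
    # Each of the 33 landmarks has normalized x, y, z coordinates and a visibility score
    for idx, landmark in enumerate(results.pose_landmarks.landmark):
        print(idx, round(landmark.x, 3), round(landmark.y, 3), round(landmark.z, 3))

All of the analyses below are built on exactly these per-landmark coordinates, just extracted frame by frame from video instead of from a single image.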

In essence, I did the following:

1. First I created some videos with the pose estimation overlaid

2. Created real-time plots of the x,y, and z coordinates of the feet to illustrate the main aspects of the movement

3. Created traces that represented the execution of a certain move at a given time

4. Compared the traces produced by my attempts to a reference trace produced by the expert’s video

1. Pose estimation overlaid

I wrote this code to create videos where the model estimates the position of the body joints and overlays them on the actual footage, to showcase the robustness of the model.

Image by author

Yes, yes I know, I don’t look exactly elite-level. But give me a break, my Judo skills are under construction!

The code I used for this was:

from base64 import b64encode
import pathlib  # needed below to build the output file names
import cv2
import mediapipe as mp
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
import numpy as np
from natsort import natsorted
from mpl_toolkits.mplot3d import Axes3D
from matplotlib.animation import FuncAnimation
from IPython.display import clear_output
%matplotlib inline
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px
from IPython.display import HTML, display
import ipywidgets as widgets
from typing import List # I don't think I need this!

# Custom imports
from pose_tracking_utils import *

mp_drawing = mp.solutions.drawing_utils
mp_drawing_styles = mp.solutions.drawing_styles
mp_pose = mp.solutions.pose

def create_pose_tracking_video(video_path):
    # Read the input video and set up a writer for the annotated output
    cap = cv2.VideoCapture(video_path)
    frame_width = int(cap.get(3))
    frame_height = int(cap.get(4))
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    output_path = pathlib.Path(video_path).stem + "_pose.mp4"
    out = cv2.VideoWriter(output_path, fourcc, 30.0, (frame_width, frame_height))
    with mp_pose.Pose(min_detection_confidence=0.5,
                      min_tracking_confidence=0.5) as pose:
        while cap.isOpened():
            success, image = cap.read()
            if not success:
                print("Ignoring empty camera frame.")
                break
            # To improve performance, optionally mark the image as
            # not writeable to pass by reference.
            image.flags.writeable = False
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            results = pose.process(image)
            # Draw the annotation on the image.
            image.flags.writeable = True
            image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
            mp_drawing.draw_landmarks(image, results.pose_landmarks,
                                      mp_pose.POSE_CONNECTIONS,
                                      landmark_drawing_spec=mp_drawing_styles.get_default_pose_landmarks_style())

            # Flip the image horizontally for a self-view display.
            out.write(cv2.flip(image, 1))
            if cv2.waitKey(5) & 0xFF == 27:
                break

    cap.release()
    out.release()
    print("Pose video created!")

    return output_path

This essentially leverages the mediapipe package to generate a visualization that detects the keypoints and overlays them on top of the video footage.
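As a quick usage sketch, generating the annotated video for one of the training clips used later in this article is then a single call:

# Writes clip_training_session_1_pose.mp4 in the working directory and returns its path
pose_video_path = create_pose_tracking_video("./videos/clip_training_session_1.mp4")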

2. Real-time plots of the x, y, and z coordinates of the feet

VIDEO_PATH = "./videos/clip_training_session_1.mp4"
# Initialize MediaPipe Pose model
body_part_index = 32
pose = mp_pose.Pose(static_image_mode=False, min_detection_confidence=0.5, min_tracking_confidence=0.5)

# Initialize OpenCV VideoCapture object to capture video from the camera
cap = cv2.VideoCapture(VIDEO_PATH)

# Create an empty list to store the trace of the right elbow
trace = []

# Create empty lists to store the x, y, z coordinates of the right elbow
x_vals = []
y_vals = []
z_vals = []

# Create a Matplotlib figure and subplot for the real-time updating plot
# fig, ax = plt.subplots()
# plt.title('Time Lapse of the X Coordinate')
# plt.xlabel('Frames')
# plt.ylabel('Coordinate Value')
# plt.xlim(0,1)
# plt.ylim(0,1)
# plt.ion()
# plt.show()
frame_num = 0

while True:
# Read a frame from the video capture
success, image = cap.read()
if not success:
break
# Convert the frame to RGB format
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Process the frame with MediaPipe Pose model
results = pose.process(image)

# Check if any body parts are detected

if results.pose_landmarks:
# Get the x,y,z coordinates of the right elbow
x, y, z = results.pose_landmarks.landmark[body_part_index].x, results.pose_landmarks.landmark[body_part_index].y, results.pose_landmarks.landmark[body_part_index].z

# Append the x, y, z values to the corresponding lists
x_vals.append(x)
y_vals.append(y)
z_vals.append(z)

# # Add the (x, y) coordinates to the trace list
trace.append((int(x * image.shape[1]), int(y * image.shape[0])))

# Draw the trace on the image
for i in range(len(trace)-1):
cv2.line(image, trace[i], trace[i+1], (255, 0, 0), thickness=2)

plt.title('Time Lapse of the Y Coordinate')
plt.xlabel('Frames')
plt.ylabel('Coordinate Value')
plt.xlim(0,len(pose_coords))
plt.ylim(0,1)
plt.plot(y_vals);
# Clear the plot and update with the new x, y, z coordinate values
#ax.clear()
# ax.plot(range(0, frame_num + 1), x_vals, 'r.', label='x')
# ax.plot(range(0, frame_num + 1), y_vals, 'g.', label='y')
# ax.plot(range(0, frame_num + 1), z_vals, 'b.', label='z')
# ax.legend(loc='upper left')
# plt.draw()
plt.pause(0.00000000001)
clear_output(wait=True)
frame_num += 1

# Convert the image back to BGR format for display
image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)

# Display the image
cv2.imshow('Pose Tracking', image)

# Wait for user input to exit
if cv2.waitKey(1) & 0xFF == ord('q'):
break

# Release the video capture, close all windows, and clear the plot
cap.release()
cv2.destroyAllWindows()
plt.close()

And then, I generated a plot containing the timeline of the x,y,z coordinates:

plt.figure(figsize=(15,7))
plt.subplot(3,1,1)
plt.title('Time Lapse of the x Coordinate')
plt.xlabel('Frames')
plt.ylabel('Coordinate Value')
plt.xlim(0, len(x_vals))
plt.ylim(0,1)
plt.plot(x_vals)

plt.subplot(3,1,2)
plt.title('Time Lapse of the y Coordinate')
plt.xlabel('Frames')
plt.ylabel('Coordinate Value')
plt.xlim(0, len(y_vals))
plt.ylim(0,1.1)
plt.plot(y_vals)

plt.subplot(3,1,3)
plt.title('Time Lapse of the z Coordinate')
plt.xlabel('Frames')
plt.ylabel('Coordinate Value')
plt.xlim(0, len(z_vals))
plt.ylim(-1,1)
plt.plot(z_vals)

plt.tight_layout();

The idea with this would be to get granular feedback on things like the direction of your feet when executing a movement.

Now that I was confident that the model was properly capturing my body pose, I created some trace visualizations of relevant body joints like the feet (which are really important when performing takedown techniques).

3. Creating traces of motion

To have an idea of how the move is performed I produced a visualization that represented the execution of that movement from the perspective of a body part, in this case, the feet:

Image by author

I did it both for my training sessions and for the reference video containing the motion I was trying to imitate.

It's important to note here that there are many issues with doing this, related to the resolution of the camera, the distance from the camera at which the movements were performed, and the frame rate of each recording; however, I am just going to bypass all of that to create a fancy plot (LoL).
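That said, one simple way to partially compensate for differences in camera distance and framing would be to center each trace on its centroid and scale it to unit spread before comparing. This is just a sketch of that idea, under the assumption that both traces capture the movement at a roughly constant distance; it is not something I applied to the results below:

import numpy as np

def normalize_trace(trace):
    """Center a list of (x, y) points on its centroid and scale it to unit spread,
    so traces recorded at different camera distances become roughly comparable."""
    points = np.array(trace, dtype=float)
    centered = points - points.mean(axis=0)
    scale = centered.std()
    return centered / scale if scale > 0 else centered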

With those caveats noted, here is the code for this approach:

def create_joint_trace_video(video_path, body_part_index=32, color_rgb=(255, 0, 0)):
    """
    This function creates a video with a trace of the body part being tracked.
    body_part_index: The index of the body part being tracked.
    video_path: The path to the video being analysed.
    """
    # Initialize OpenCV VideoCapture object and the output video writer
    cap = cv2.VideoCapture(video_path)
    frame_width = int(cap.get(3))
    frame_height = int(cap.get(4))
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    output_path = pathlib.Path(video_path).stem + "_trace.mp4"
    out = cv2.VideoWriter(output_path, fourcc, 30.0, (frame_width, frame_height))

    # Create an empty list to store the trace of the body part being tracked
    trace = []

    # Initialize the MediaPipe Pose model
    with mp_pose.Pose(min_detection_confidence=0.5,
                      min_tracking_confidence=0.5) as pose:
        while cap.isOpened():
            success, image = cap.read()
            if not success:
                print("Ignoring empty camera frame.")
                break

            # Convert the frame to RGB format
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

            # Process the frame with MediaPipe Pose model
            results = pose.process(image)

            # Check if any body parts are detected
            if results.pose_landmarks:
                # Get the pixel x, y coordinates of the body part being tracked
                x = int(results.pose_landmarks.landmark[body_part_index].x * image.shape[1])
                y = int(results.pose_landmarks.landmark[body_part_index].y * image.shape[0])

                # Add the coordinates to the trace list
                trace.append((x, y))

            # Draw the trace on the image
            for i in range(len(trace)-1):
                cv2.line(image, trace[i], trace[i+1], color_rgb, thickness=2)

            # Convert the image back to BGR format and write it to the output video
            image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)
            out.write(image)
            if cv2.waitKey(5) & 0xFF == 27:
                break

    cap.release()
    out.release()
    print("Joint Trace video created!")

Here I am simply processing each frame as I did before to produce the pose videos; however, I am also appending the x and y coordinates of the particular body part to a list I call `trace`, which is used to produce the tracing line that accompanies the body part throughout the video.
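As a usage sketch, tracing the left foot (landmark index 31 in MediaPipe Pose) on one of my training clips looks like this:

# Draws the left-foot trace in red and writes clip_training_session_2_trace.mp4
create_joint_trace_video("./videos/clip_training_session_2.mp4",
                         body_part_index=31,
                         color_rgb=(255, 0, 0))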

4. Comparing the Traces

In possession of these capabilities, I could finally get into the part of gathering insights from this approach.

To do that, I needed a way to compare these traces in order to produce some type of visually rich feedback that could help me understand how my poor execution of the movement compared to that of an elite athlete.

Next, I plotted the actual traces, without the video in the background, as a graph:

def get_joint_trace_data(video_path, body_part_index, xmin=300, xmax=1000,
                         ymin=200, ymax=800):
    """
    Creates a graph with the tracing of a particular body part
    while executing a certain movement.
    """
    cap = cv2.VideoCapture(video_path)
    frame_width = int(cap.get(3))
    frame_height = int(cap.get(4))

    # Create an empty list to store the trace of the body part being tracked
    trace = []
    i = 0
    with mp_pose.Pose(min_detection_confidence=0.5,
                      min_tracking_confidence=0.5) as pose:
        while cap.isOpened():
            success, image = cap.read()
            if not success:
                print("Ignoring empty camera frame.")
                break

            # Convert the frame to RGB format
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

            # Process the frame with MediaPipe Pose model
            results = pose.process(image)

            # Check if any body parts are detected
            if results.pose_landmarks:
                # Get the pixel x, y coordinates of the body part being tracked
                x = int(results.pose_landmarks.landmark[body_part_index].x * image.shape[1])
                y = int(results.pose_landmarks.landmark[body_part_index].y * image.shape[0])

                # Add the coordinates to the trace list
                trace.append((x, y))

                # Plot the trace on the graph
                fig, ax = plt.subplots()
                #ax.imshow(image)
                ax.set_xlim(xmin, xmax)
                ax.set_ylim(ymin, ymax)
                ax.invert_yaxis()
                ax.plot(np.array(trace)[:, 0], np.array(trace)[:, 1], color='r')
                # plt.savefig(f'joint_trace{i}.png')
                # plt.close()
                i += 1
                plt.pause(0.00000000001)
                clear_output(wait=True)
                # Display the graph
                #plt.show()

            if cv2.waitKey(5) & 0xFF == 27:
                break

    cap.release()

    return trace

video_path = "./videos/clip_training_session_2.mp4"
body_part_index = 31
foot_trace = get_joint_trace_data(video_path, body_part_index)

video_path = "./videos/uchimata_wall.mp4"
body_part_index = 31
foot_trace_reference = get_joint_trace_data(video_path, body_part_index,xmin=0,ymin=0,xmax=1300)

foot_trace_clip = foot_trace[:len(foot_trace_reference)]
plt.subplot(1,2,1)
plt.plot(np.array(foot_trace_clip)[:, 0], np.array(foot_trace_clip)[:, 1], color='r')
plt.gca().invert_yaxis();

plt.subplot(1,2,2)
plt.plot(np.array(foot_trace_reference)[:, 0], np.array(foot_trace_reference)[:, 1], color='g')
plt.gca().invert_yaxis();

Ok, with this we start to see more clearly the differences between the signature shape of the foot moving in different contexts.

First, we see that the elite player does more of a straight step into a turn, generating an almost complete half circle with his feet, whereas my initial step inside has a curved appearance, and I do not create a half circle when throwing my leg into the air.

Also, while the elite player generates a wide circle when moving his leg up, I create a shallow arc, almost like an ellipse.

Image by author, comparing traces for movement execution

I found these preliminary results to be quite nice because they indicate that, despite the limitations of the comparison, one can gauge differences regarding the signature shape of the movement’s execution just by observing traces like these.

Besides that, I wanted to see if I could make comparisons regarding the speed at which the moves are performed. To analyze that, I visualized the motion of the body-joint coordinates over time, putting the plots of me and the expert side by side to see how far off my timing was.

The challenge with this analysis is that, since the videos have different speeds and are not aligned in any way, I first needed to align them in a meaningful way.

I was not sure which technique to use here, but a conversation with my buddy Aaron (a neuroscientist at the Champalimaud Neuroscience Institute in Lisbon) kind of illuminated an option for me: dynamic time warping.

Comparing Speed and Timing using Dynamic Time Warping

Dynamic time warping (DTW) is a technique used to measure the similarity between two temporal sequences with different speeds.

The essential idea is that you have two different time series that may have some pattern you wish to analyze, so you attempt to align them by applying a few rules that allow you to calculate the optimal match between the two sequences.
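To make those rules a bit more concrete, here is a minimal, textbook-style DTW sketch for two 1-D sequences. This is only for illustration; for the actual analysis below I use the fastdtw package rather than this quadratic implementation:

import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping between two 1-D sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Each cell extends the cheapest of the three allowed alignment moves
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

# Prints 0.0: the slower sequence aligns perfectly with the faster one despite the length difference
print(dtw_distance([0, 1, 2, 1, 0], [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]))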

Two repetitions of a walking sequence: although they have varying speeds, the tracings of the limbs are quite similar. Taken from Wikipedia, referencing Olsen et al., 2017.

I found a nice introduction to this topic in an article by Jeremy Zhang.

To use dynamic time warping I did the following:

1. Normalized the values to have them in the same range

2. Used a Python implementation of a DTW algorithm.

from fastdtw import fastdtw
from scipy.spatial.distance import euclidean

max_x = max(max(foot_trace_clip, key=lambda x: x[0])[0], max(foot_trace_reference, key=lambda x: x[0])[0])
max_y = max(max(foot_trace_clip, key=lambda x: x[1])[1], max(foot_trace_reference, key=lambda x: x[1])[1])

foot_trace_clip_norm = [(x/max_x, y/max_y) for (x, y) in foot_trace_clip]
foot_trace_reference_norm = [(x/max_x, y/max_y) for (x, y) in foot_trace_reference]

distance, path = fastdtw(foot_trace_clip_norm, foot_trace_reference_norm, dist=euclidean)

The outputs I get here are:

1. distance: the DTW distance between the two sequences (the accumulated Euclidean distance along the optimal alignment path).

2. path: a mapping between the indices of the two sequences, given as a list of index pairs (tuples)

Now, I can use the output stored in the path variable to create a plot with both sequences aligned:

# Map each sequence through the DTW alignment path
# (path[i][0] indexes the clip, path[i][1] indexes the reference)
foot_trace_reference_norm_mapped = [foot_trace_reference_norm[path[i][1]] for i in range(len(path))]
foot_trace_clip_norm_mapped = [foot_trace_clip_norm[path[i][0]] for i in range(len(path))]

plt.subplot(1,2,1)
plt.plot(np.array(foot_trace_reference_norm_mapped)[:, 0], np.array(foot_trace_reference_norm_mapped)[:, 1], color='g')
plt.gca().invert_yaxis();

plt.subplot(1,2,2)
plt.plot(np.array(foot_trace_clip_norm_mapped)[:, 0], np.array(foot_trace_clip_norm_mapped)[:, 1], color='r')
plt.gca().invert_yaxis();
plt.show()

Image by the author, the temporal sequences aligned using the DTW algorithm

Now, given the lack of data, mainly for the reference trace, I can't say that this plot gave me a lot more insight than the elements already discussed; however, it does help to highlight what I said before regarding the shape of the movement.

However, as a note for the future, my idea here is that, if certain conditions could be met to make both videos more uniform, I would like to have a reference tracing against which I could compare the tracings of my attempts, and use that comparison for immediate feedback.

I would use the Euclidean-based DTW distance as my feedback metric and have an app that could highlight when I am getting closer to or farther from the signature shape I am trying to emulate.
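As a rough sketch of what such a feedback rule could look like, reusing fastdtw and the normalized traces from above (the threshold here is an arbitrary placeholder, not a calibrated value):

def movement_feedback(attempt_trace, reference_trace, threshold=20.0):
    """Hypothetical feedback rule: score one attempt against the reference trace
    with DTW and report whether it falls within an (arbitrary) target distance."""
    distance, _ = fastdtw(attempt_trace, reference_trace, dist=euclidean)
    if distance <= threshold:
        return distance, "Close to the reference shape, keep repeating this pattern."
    return distance, "Far from the reference shape, slow down and check the foot path."

score, message = movement_feedback(foot_trace_clip_norm, foot_trace_reference_norm)
print(round(score, 2), message)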

To illustrate that, let me show you an example.

def find_individual_traces(trace, window_size=60, color_plot="r"):
    """
    Function that takes in a list of tuples containing x, y coordinates
    and plots them as consecutive clips of a fixed window size, to allow the user to find
    the point where a full repetition has been completed.
    """
    start = 0
    for i in range(len(trace)//window_size):
        plt.plot(np.array(trace[start:start+window_size])[:, 0], np.array(trace[start:start+window_size])[:, 1], color=color_plot)
        plt.gca().invert_yaxis()
        plt.title(f"Trace, start frame = {start}")
        plt.show()
        start += window_size

def get_individual_traces(trace, clip_size):
    # Split the full trace into consecutive clips of clip_size points each
    num_clips = len(trace)//clip_size
    trace_clips = []
    i = 0
    for clip in range(num_clips):
        trace_clips.append(trace[i:i+clip_size])
        i += clip_size

    return trace_clips

find_individual_traces(foot_trace_clip_norm)

Images by the author. Traces of the movement of the feet executed by me.

Here I am showing clips from the video where I execute each individual movement. Each of these traces can be compared to a reference trace obtained similarly:

find_individual_traces(foot_trace_reference_norm, window_size=45, color_plot="g")

Images by the author. Traces of the movement of the feet executed by the elite player.

When I obtain the reference tracings I get some noise signals as well, but I will use the third one as my reference:

Image by the author

Now I can loop over the tracings representing my actual movement and see how they compare to this reference trace across a couple of training sessions.

video_path = "./videos/clip_training_session_3.mp4"
body_part_index = 31
foot_trace_clip = get_joint_trace_data(video_path, body_part_index)

video_path = "./videos/uchimata_wall.mp4"
body_part_index = 31
foot_trace_reference = get_joint_trace_data(video_path, body_part_index,xmin=0,ymin=0,xmax=1300)

# Showing a plot with the tracings from the training session
plt.plot(np.array(foot_trace_clip)[:, 0], np.array(foot_trace_clip)[:, 1], color='r')
plt.gca().invert_yaxis();

Image by the author. Tracings of the x, y coordinates of the feet over a few executions of the movement.

Now I get the normalized values from both tracings.

max_x = max(max(foot_trace_clip, key=lambda x: x[0])[0], max(foot_trace_reference, key=lambda x: x[0])[0])
max_y = max(max(foot_trace_clip, key=lambda x: x[1])[1], max(foot_trace_reference, key=lambda x: x[1])[1])

foot_trace_clip_norm = [(x/max_x, y/max_y) for (x, y) in foot_trace_clip]
foot_trace_reference_norm = [(x/max_x, y/max_y) for (x, y) in foot_trace_reference]

I get the tracings from the training clip as well as the reference traces to help me set a goal.

The clip size is set manually.

traces = get_individual_traces(foot_trace_clip_norm, clip_size=67)
traces_ref = get_individual_traces(foot_trace_reference_norm, clip_size=60)

Below is an example of the traces obtained, after removing a few that I manually classified as noise upon visual inspection.

# Here I show an example trace from the new clip
index = 0
color_plot = "black"
plt.plot(np.array(traces[index])[:, 0], np.array(traces[index])[:, 1], color=color_plot)
plt.gca().invert_yaxis()
plt.title(f"Trace {index}")
plt.show()

Image by the author

Then I loop over the tracings and plot their score in comparison to a reference trace I choose from the ones obtained from the video with the elite player:

# Use the third trace from the elite player's video as the reference
trace_ref = traces_ref[2]
trace_scores = []

for trace in traces:
    distance, path = fastdtw(trace, trace_ref, dist=euclidean)
    trace_scores.append(distance)

plt.plot(trace_scores, color="black")
plt.title("Trace Scores with DTW")
plt.xlabel("Trace Index")
plt.ylabel("Euclidean Distance Score")
plt.show()

Image by the author

Now, the first odd thing I noticed here is how the metric jumps up and down, which is most likely explained by the fact that some of the extracted traces capture the foot coming down rather than going up while executing the movement.

However, the cool thing about this plot is that the scores even seemed to improve a bit, and at least stayed consistent at around 20 (which in this case is the DTW distance between the two sequences).

Despite not being able to interpret these numbers conclusively at this point, I found it quite insightful that an approach like this can be converted into a measurable metric that compares the quality of one movement with respect to another.

In the future, I would like to look into how to better extract the training clips, to obtain well-aligned segments of each execution of a movement and produce more consistent results.
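One direction that might help is to segment repetitions automatically instead of using a fixed clip size, for example by detecting the frames where the foot returns towards the ground in the y coordinate. Here is a hypothetical sketch using scipy's peak finding (the minimum spacing between repetitions is a guess, not a tuned value):

import numpy as np
from scipy.signal import find_peaks

def split_into_repetitions(trace, min_frames_between_reps=30):
    """Split an (x, y) trace into candidate repetitions by finding the frames where
    the foot is lowest (largest y, since image coordinates grow downwards)."""
    y = np.array(trace)[:, 1]
    # Peaks of y mark the pauses between repetitions, when the foot is back near the ground
    peaks, _ = find_peaks(y, distance=min_frames_between_reps)
    boundaries = [0] + list(peaks) + [len(trace)]
    return [trace[start:end] for start, end in zip(boundaries[:-1], boundaries[1:]) if end - start > 1]

reps = split_into_repetitions(foot_trace_clip_norm)
print(len(reps), "candidate repetitions")

Each segment found this way could then be scored against the reference trace with the same DTW procedure as above.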

Overall, I think doing these experiments was quite interesting because it pointed to the power of this technique to give a granular assessment of movement, even though it would still need a lot of work to become a truly useful tool for insight.

