Deep Learning-based Human Pose Estimation for Squat Analysis

Since R2024b

This example shows how to use human pose estimation to analyze squats in a recorded video. You use a pretrained deep learning network to detect a person in the input video, and then use a pretrained HRNet keypoint detector to identify keypoints on the detected person. You then use these keypoints to determine whether the person is performing a squat movement.

Step 1: Load Pretrained Deep Learning Networks

Load a deep learning object detector trained on the COCO data set to detect people in an image by using the peopleDetector object.

detector = peopleDetector;

Load a pretrained HRNet object keypoint detector. The default network is HRNet-W32, trained on the COCO keypoint detection data set. In an HRNet-W32 network, the last three stages of the high-resolution subnetworks have 32 convolved feature maps. For more information about HRNet architecture and HRNet object keypoint detector, see Getting Started with HRNet and hrnetObjectKeypointDetector, respectively.

keyPtDet = hrnetObjectKeypointDetector;

Step 2: Read Video

Download the squat exercise video.

downloadFolder = pwd;
dataFilename = "SquatExerciseVideo.zip";
dataUrl = "https://ssd.mathworks.com/supportfiles/vision/data/" + dataFilename;
zipFile = fullfile(downloadFolder,dataFilename);
if ~exist(zipFile,"file")
    disp("Downloading Squat Exercise Video (8 MB)...")
    websave(zipFile,dataUrl);
end
unzip(zipFile,downloadFolder)
dataset = fullfile(downloadFolder,"SquatExerciseVideo.mp4");

Create a VideoReader object to read the video into the MATLAB® workspace. The video used in this example shows a person performing squat movements.

reader = VideoReader(dataset);
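Optionally, inspect the duration, frame rate, and frame size of the video to confirm that it loaded as expected. Duration, FrameRate, Height, and Width are standard VideoReader properties; this check is not required for the rest of the workflow.

% Display basic properties of the loaded video.
fprintf("Duration: %.1f s, Frame rate: %.1f fps, Frame size: %d-by-%d\n", ...
    reader.Duration,reader.FrameRate,reader.Height,reader.Width)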

Step 3: Perform Detections on Video Frame

Read a desired video frame from the input video by setting the current time of the VideoReader object. The readFrame function then reads the next available video frame from the specified time.

reader.CurrentTime = 16.5;
videoFrame = readFrame(reader);

Detect the person in the video frame by using the detect method of the peopleDetector object.

[bboxes,scores,class] = detect(detector,videoFrame);
[~,indx] = max(scores);
bbox = bboxes(indx,:);
detection = insertObjectAnnotation(videoFrame,"rectangle",bbox,class(indx));
figure
imshow(detection)

Detect the keypoints of the person within the detected bounding box by using the pretrained HRNet object keypoint detector. Use the detect method of the hrnetObjectKeypointDetector object to compute the keypoints.

[keypoints,keypointScores] = detect(keyPtDet,videoFrame,bbox);

Insert the detected keypoints into the input frame and display the results.

keyLabels = categorical(1:length(keypointScores))';
detectedPtsImage = insertObjectKeypoints(videoFrame,keypoints,KeypointColor="red",...
    KeypointLabel=keyLabels,TextBoxColor="cyan",FontColor="blue");

For better visualization of detected keypoints and their locations, crop and display the detected bounding box region.

detectedKeyPoints = imcrop(detectedPtsImage,bbox);
fig = figure(Position=[0 0 400 800]);
hAxes = axes(fig);
image(detectedKeyPoints,Parent=hAxes)
axis off

Step 4: Identify Keypoints and Criteria for Squat Analysis

From the detection, identify the keypoints to use for detecting the squat movement. This example uses the keypoints near the hip, knee, and shoulder joints on the right side of the person's body to perform squat analysis. Then, connect the keypoints near the hip, knee, and shoulder joints to form two line segments:

  • The line segment from the hip to the knee.

  • The line segment from the hip to the shoulder.
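The keypoint indices used in this example follow the keypoint ordering of the COCO keypoint detection data set. The mapping below is a reference sketch based on the standard 1-based COCO ordering; confirm it against the hrnetObjectKeypointDetector documentation before relying on it.

% Standard COCO keypoint ordering (1-based), assumed here:
%  1 nose           2 left eye        3 right eye    4 left ear     5 right ear
%  6 left shoulder  7 right shoulder  8 left elbow   9 right elbow
% 10 left wrist    11 right wrist    12 left hip    13 right hip
% 14 left knee     15 right knee     16 left ankle  17 right ankle
rightShoulderIdx = 7;
rightHipIdx = 13;
rightKneeIdx = 15;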

Extract the coordinates of the keypoints near the hip joint (index 13), shoulder joint (index 7), and knee joint (index 15).

hipPt = keypoints(13,:);
shoulderPt = keypoints(7,:);
kneePt = keypoints(15,:);
xCoords = [shoulderPt(1) hipPt(1) kneePt(1)];
yCoords = [shoulderPt(2) hipPt(2) kneePt(2)];

Draw line segments connecting the hip to the knee and the hip to the shoulder.

figure
imshow(detectedPtsImage)
hold on
line(xCoords,yCoords,LineWidth=2,Color="yellow")

Measure the angles made by the two line segments with respect to the horizontal axis of the image to determine whether the person is performing a squat movement. In this example, the movement is counted as a squat if these two conditions are satisfied:

  1. The angle of the line segment from the hip joint to the shoulder joint must be less than 60 degrees.

  2. The angle of the line segment from the hip joint to the knee joint must be less than 15 degrees.

angle1 = atand(abs((keypoints(13,2)-keypoints(7,2))/(keypoints(7,1)-keypoints(13,1))));
angle2 = atand(abs((keypoints(15,2)-keypoints(13,2))/(keypoints(15,1)-keypoints(13,1))));
if angle1 < 60 && angle2 < 15
    disp("Squat")
else
    disp("Not a squat")
end
Not a squat

You can also use a different set of keypoints and consider multiple angles to accurately analyze if the squat movement is performed correctly.
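The angle computation used in this step can also be factored into a small helper function. The function below is a hypothetical refactoring, not part of the shipped example; it reproduces the arctangent-based formula used above.

function angleDeg = segmentAngle(ptA,ptB)
% segmentAngle returns the absolute angle, in degrees, between the line
% segment from ptA to ptB and the horizontal image axis. ptA and ptB are
% 1-by-2 [x y] keypoint coordinates. Because of the abs, the argument
% order does not matter.
angleDeg = atand(abs((ptB(2)-ptA(2))/(ptB(1)-ptA(1))));
end

For example, segmentAngle(keypoints(7,:),keypoints(13,:)) computes the hip-to-shoulder angle.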

Step 5: Perform Squat Analysis on Video

In this section, you use the approach explained in Steps 2 through 4 to identify and count the squat movements over the entire video.

Reset the current time of the video reader to zero to start reading the video from the beginning.

reader.CurrentTime = 0;

Initialize the video player to display the squat analysis results. Specify the player position on the screen.

videoPlayer = vision.VideoPlayer(Position=[100 100 600 800]);

Set the countSquat flag to true to enable squat counting, and initialize the squat count to zero.

countSquat = true;
squatCount = 0;

Initialize the update flag to false. This flag ensures that each squat movement is counted only once. When the angle criteria are met (angle1, from the hip to the shoulder, is less than 60 degrees, and angle2, from the hip to the knee, is less than 15 degrees) and the flag is false, the squat count increments by 1 and the flag is set to true. When the criteria are no longer met in subsequent frames, the flag resets to false, enabling the next squat to be counted.

update = false;

Perform steps 2 to 4 on each frame in the video.

while hasFrame(reader)

    % Step 2: Read Video Frame
    videoFrame = imresize(readFrame(reader),[600 400]);

    % Step 3: Perform Detections
    [bboxes,scores] = detect(detector,videoFrame);
    [~,indx] = max(scores);
    boxPerson = bboxes(indx,:);

    if ~isempty(boxPerson)
        keypoints = detect(keyPtDet,videoFrame,boxPerson);
        videoFrame = insertObjectKeypoints(videoFrame,keypoints,Connections=keyPtDet.KeypointConnections,...
            KeypointSize=4,ConnectionColor="y",LineWidth=2);

        % Step 4: Identify Keypoints and Criteria for Squat Analysis
        angle1 = atand(abs((keypoints(13,2)-keypoints(7,2))/(keypoints(7,1)-keypoints(13,1))));
        angle2 = atand(abs((keypoints(15,2)-keypoints(13,2))/(keypoints(15,1)-keypoints(13,1))));

        if countSquat
            if angle1 < 60 && angle2 < 15
                if ~update
                    squatCount = squatCount+1;
                    update = true;
                end
            else
                update = false;
            end
            countText = "SquatCount = " + string(squatCount);
            videoFrame = insertText(videoFrame,[1 50],countText,FontSize=15,TextBoxColor="cyan",FontColor="blue");
        end
        % Display the results
        step(videoPlayer,videoFrame)
    end
end
release(videoPlayer)
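To save the annotated results instead of only displaying them, you can write each processed frame to a video file. This sketch uses the standard VideoWriter class; the output file name is an assumption.

% Create and open a writer for the annotated output video.
writer = VideoWriter("SquatAnalysisResult","MPEG-4");
open(writer)
% Inside the processing loop, after annotating videoFrame, write the frame:
%     writeVideo(writer,videoFrame)
% After the loop completes, close the output file.
close(writer)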