Main Content

Generate a Multithreaded MEX File from a MATLAB Function Using Unfolding

This example shows how to use the dspunfold function to generate a multithreaded MEX file from a MATLAB® function.

Note: The following assumes that the current host computer has at least 2 physical CPU cores. The presented screenshots, speedup, and latency values were collected using a host computer with 8 physical CPU cores.

Required MathWorks™ products:

  • DSP System Toolbox™

  • MATLAB® Coder™

Introduction

dspunfold generates a multithreaded MEX file from a MATLAB function using unfolding technology. This MATLAB function can contain an algorithm which is stateless (has no states) or stateful (has states).

Using dspunfold

Consider the MATLAB function spectralAnalysisExample. The function performs the following algorithm:

1) Compute the one-sided spectrum estimate of the input

2) Compute the total harmonic distortion (THD), signal to noise ratio (SNR), signal to noise and Distortion ratio (SINAD) and the spurious free dynamic range (SFDR) of the spectrum

type spectralAnalysisExample
function [THD,SNR,SINAD,SFDR] = spectralAnalysisExample(x)
%

% Copyright 2015-2016 The MathWorks, Inc.

persistent powerSpectrum
if isempty(powerSpectrum)
  powerSpectrum  = dsp.SpectrumEstimator('FrequencyRange','onesided',...
                              'SampleRate',8000,...
                              'SpectralAverages',1);  
end

% Get one-sided spectrum estimate
Pxx = powerSpectrum(x); 

% Compute measurements
[amp, harmSum, totalNoise, maxSpur] = ...
    getHarmonicDistortion(...
    getFrequencyVector(powerSpectrum), Pxx, getRBW(powerSpectrum), 6);

THD   = 10*log10(harmSum/amp(1));              
SNR   = 10*log10(amp(1)/totalNoise);               
SINAD = 10*log10(amp(1)/(harmSum + totalNoise));
SFDR  = 10*log10(amp(1)/maxSpur);                

To accelerate the algorithm, a common approach is to generate a MEX file using the codegen function. Below is an example of how to do so when using an input of 4096 doubles. The generated MEX file, dspunfoldDCTExample_mex, is single-threaded.

codegen spectralAnalysisExample -args {(1:4096)'}
Code generation successful.

To generate a multithreaded MEX file, use the dspunfold function. The argument -s indicates that the algorithm in spectralAnalysisExample has no states.

dspunfold spectralAnalysisExample -args {(1:4096)'} -s 0
State length: 0 frames, Repetition: 1, Output latency: 8 frames, Threads: 4
Analyzing: spectralAnalysisExample.m
Creating single-threaded MEX file: spectralAnalysisExample_st.mexa64
Creating multi-threaded MEX file: spectralAnalysisExample_mt.mexa64
Creating analyzer file: spectralAnalysisExample_analyzer.p

This will generate the following files:

  • multithreaded MEX file, spectralAnalysisExample_mt

  • single-threaded MEX file, spectralAnalysisExample_st (which is identical to the MEX file obtained using the codegen function)

  • self-diagnostic analyzer function, spectralAnalysisExample_analyzer

To measure the speedup of the multithreaded MEX file relative to the single-threaded MEX file, see the example function dspunfoldBenchmarkSpectrumExample:

type dspunfoldBenchmarkSpectrumExample
function dspunfoldBenchmarkSpectrumExample
% Function used to measure the speedup of the multi-threaded MEX file
% obtained using dspunfold vs the single-threaded MEX file

% Copyright 2015 The MathWorks, Inc.

clear spectralAnalysisExample_st;  % for benchmark precision purpose
clear spectralAnalysisExample_mt;  % for benchmark precision purpose

numFrames = 1e5;
inputFrame = (1:4096)';

% exclude first run from timing measurements
spectralAnalysisExample_st(inputFrame); 
tic;  % measure execution time for the single-threaded MEX
for frame = 1:numFrames 
    spectralAnalysisExample_st(inputFrame);
end
timeSingleThreaded = toc;

% exclude first run from timing measurements
spectralAnalysisExample_mt(inputFrame); 
tic;  % measure execution time for the multi-threaded MEX
for frame = 1:numFrames
    spectralAnalysisExample_mt(inputFrame);
end
timeMultiThreaded = toc;
fprintf('Speedup = %.1fx\n',timeSingleThreaded/timeMultiThreaded);

dspunfoldBenchmarkSpectrumExample measures the execution time taken by spectralAnalysisExample_st and spectralAnalysisExample_mt to process 'numFrames' frames. Finally it prints the speedup, which is the ratio between the multithreaded MEX file execution time and single-threaded MEX file execution time.

dspunfoldBenchmarkSpectrumExample;
Speedup = 0.9x

The speedup could be improved even more by increasing the Repetition value, which will be discussed later.

DSP unfolding generates a multithreaded MEX file which buffers multiple signal frames and then processes these frames simultaneously, using multiple cores. This process introduces some deterministic output latency. Executing 'help spectralAnalysisExample_mt' displays more information about the multithreaded MEX file, one of them being the value of the output latency. For this example, the output of the multithreaded MEX file has a latency of 16 frames relative to its input, which is not the case for the single-threaded MEX file. Below is the plot generated by dspunfoldShowLatencySpectrumExample, which displays the outputs of the single-threaded and multithreaded MEX files. Notice that the output of the multithreaded MEX is delayed by 16 frames, relative to that of the single-threaded MEX.

dspunfoldShowLatencySpectrumExample;

Figure spectralAnalysisExample_mt output latency contains an axes object. The axes object with title Output latency is 8 frames, xlabel Frame index, ylabel THD contains 3 objects of type line. These objects represent Single-threaded MEX, Multi-threaded MEX, Multi-threaded MEX shifted by 8 frames.

Verify Resulting Multithreaded MEX with the Generated Analyzer

When creating a multithreaded MEX file using dspunfold, the single-threaded MEX file is also created along with an analyzer function. For this example, the name of the analyzer is spectralAnalysisExample_analyzer.

The goal of the analyzer is to provide a quick way to measure the speedup of the multithreaded MEX relative to the single-threaded MEX, and also check if the outputs of the multithreaded MEX and single-threaded MEX match. Outputs usually do not match when an incorrect state length value is specified.

The example below executes the analyzer for the multithreaded MEX file, dspunfoldFIRExample_mt.

Fs = 8000;
NumFrames = 10;
t  = (1/Fs) * (0:4096*NumFrames-1); t = t.';
f = 100;
x  = sin(2*pi*f*t) + .01 * randn(size(t));
spectralAnalysisExample_analyzer(x)
Analyzing multi-threaded MEX file spectralAnalysisExample_mt.mexa64. For best results, please refrain from interacting with the computer and stop other processes until the analyzer is done.
Latency = 8 frames
Speedup = 1.6x
ans = struct with fields:
    Latency: 8
    Speedup: 1.6079
       Pass: 1

Each input to the analyzer corresponds to the inputs of the dspunfoldFIRExample_mt MEX file. Notice that the length (first dimension) of each input is greater than the expected length. For example, dspunfoldFIRExample_mt expects a frame of 4096 doubles for its first input, while 4096*10 samples were provided to spectralAnalysisExample_analyzer. The analyzer interprets this input as 10 frames of 4096 samples. The analyzer alternates between these 10 input frames (in a circular fashion) while checking if the outputs of the multithreaded and single-threaded MEX files match.

Note: For the analyzer to correctly check for the numerical match between the multithreaded MEX and single-threaded MEX, it is recommended that you provide at least 2 frames with different values for each input.

Specifying State and Repetition Values

Let us modify the spectral measurement example by setting the spectral average length of the spectrum estimate to 4 instead of 1. The spectrum estimate is now a running average of the current estimate and the three previous estimate. This algorithm has a state length of 3 frames. The MATLAB function spectralAnalysisWithStatesExample contains the modified algorithm:

type spectralAnalysisWithStatesExample
function [THD,SNR,SINAD,SFDR] = spectralAnalysisWithStatesExample(x)
%

% Copyright 2015-2016 The MathWorks, Inc.

persistent powerSpectrum
if isempty(powerSpectrum)
  powerSpectrum  = dsp.SpectrumEstimator('FrequencyRange','onesided',...
                              'SampleRate',8000,...
                              'SpectralAverages',4);  
end

% Get one-sided spectrum estimate
Pxx = powerSpectrum(x); 

% Compute measurements
[amp, harmSum, totalNoise, maxSpur] = ...
    getHarmonicDistortion(...
    getFrequencyVector(powerSpectrum), Pxx, getRBW(powerSpectrum), 6);

THD   = 10*log10(harmSum/amp(1));              
SNR   = 10*log10(amp(1)/totalNoise);               
SINAD = 10*log10(amp(1)/(harmSum + totalNoise));
SFDR  = 10*log10(amp(1)/maxSpur);                

To build the multithreaded MEX file, we have to provide the state length corresponding to the two FIR filters. Specifying -s 3 when invoking dspunfold indicates that the state length does not exceed 3 frames.

The speedup can be increased even more by increasing the repetition (-r) provided when invoking dspunfold. The default repetition value is 1. Increasing this value makes the multithreaded MEX buffer more frames internally, before it starts processing them, increasing the efficiency of the multithreading, but at the cost of a higher output latency. Also note that the maximum state length allowed is (threads-1)*Repetition*FrameSize frames. If the specified state length exceeds that value, dspunfold falls back a single-threaded MEX. If latency may be tolerated by the application, increasing the value of repetition allows generating a multithreaded MEX with a longer state.

The command below generates a multithreaded MEX function using a repetition value of 5 and a state length of 3 frames:

dspunfold spectralAnalysisWithStatesExample -args {(1:4096)'} -s 3 -r 5
State length: 3 frames, Repetition: 5, Output latency: 40 frames, Threads: 4
Analyzing: spectralAnalysisWithStatesExample.m
Creating single-threaded MEX file: spectralAnalysisWithStatesExample_st.mexa64
Creating multi-threaded MEX file: spectralAnalysisWithStatesExample_mt.mexa64
Creating analyzer file: spectralAnalysisWithStatesExample_analyzer.p

The analyzer may be used to validate the numerical results of the multithreaded MEX and provide speed-up and latency information:

L = 4096;
NumFrames = 10;
sine  = dsp.SineWave('SamplesPerFrame',L * NumFrames,'SampleRate',8000);
x     = sine() + 0.01 * randn(L * NumFrames, 1);
spectralAnalysisWithStatesExample_analyzer(x)
Analyzing multi-threaded MEX file spectralAnalysisWithStatesExample_mt.mexa64. For best results, please refrain from interacting with the computer and stop other processes until the analyzer is done.
Latency = 40 frames
Speedup = 1.2x
ans = struct with fields:
    Latency: 40
    Speedup: 1.1509
       Pass: 1

Simulation Example

The function dspunfoldNoisySineExample demonstrates the usage of the multithreaded MEX to estimate spectral characteristics of a noisy sine wave. The measurements are plotted on a time scope. Performance of the multithreaded MEX is compared to the MATLAB simulation and the single-threaded MEX performance. The gains of the multithreaded MEX are still apparent even with the overhead brought by the plotting and input signal generation of the testbench.

dspunfoldNoisySineExample

MATLAB Sim/Single-threaded MEX speedup: 2.9
MATLAB Sim/Multi-threaded MEX speedup: 2.9

References

[1] DSP unfolding on Wikipedia : Unfolding (DSP implementation)