Accelerate Link-Level Simulations with Parallel Processing

Since R2024a

This example shows how to accelerate link-level simulations by using a cluster of workers from a parallel pool.


Link-level simulations require a large number of frames to provide statistically valid results. Therefore, these simulations can take a long time to run. Parallel computing is a commonly used technique to speed up these simulations. This example shows how to run link-level simulations by using MATLAB® workers from a parallel pool (requires Parallel Computing Toolbox™). Alternatively, to run the example without Parallel Computing Toolbox features, you can disable parallel execution.

Parallel Computing Toolbox enables you to use the full processing power of multi-core desktops by executing applications on workers (MATLAB computational engines) that run locally. Without changing the code, you can run the same applications on clusters or clouds.

For more information on how to discover and set up a cluster of workers, see the Scale Up from Desktop to Cluster (Parallel Computing Toolbox) example.

The figure shows how to parallelize the link-level simulation over a number of workers Nworkers. Each worker runs the same link simulation for all the required SNR points, but with different random streams to generate the random bits and noise samples. Each worker simulates Nslots_per_worker slots. Therefore, the total number of slots in this simulation is Nslots_per_worker×Nworkers. The example combines the throughput measurements from all workers to produce the overall throughput.

This example focuses on how to speed up link-level simulations by using parallel processing, and uses a simplified link-level simulation. The characteristics of this simplified link-level simulation are: single antenna, single layer, AWGN channel, no HARQ.

Simulation Parameters

Set the SNR points and the overall number of frames to simulate.

SNRdB = 5.7:0.1:6.2;        % SNR in dB
numFrames = 7;              % Number of frames to simulate 

Configure carrier, PDSCH, and DL-SCH.

carrier = nrCarrierConfig;
pdsch = nrPDSCHConfig;
pdsch.Modulation = "16QAM";
pdsch.PRBSet = 0:carrier.NSizeGrid-1; % Full band allocation

[encodeDLSCH,decodeDLSCH] = dlschEncoderDecoder();

Configure Parallel Pool

By default, this example enables parallel execution. Alternatively, you can disable parallel execution, for example, when debugging.

enableParallelism = true;

Create a parallel pool and get the number of workers if parallel execution is enabled.

if (enableParallelism)
    pool = gcp; % Create parallel pool, requires Parallel Computing Toolbox
    numWorkers = pool.NumWorkers;
    maxNumWorkers = pool.NumWorkers;
else
    numWorkers = 1;    % No parallelism
    maxNumWorkers = 0; % Used to convert the parfor-loop into a for-loop
end
Starting parallel pool (parpool) using the 'Processes' profile ...
Connected to parallel pool with 4 workers.

Random Number Generator

To reproduce the same set of random bits and noise samples in a parfor-loop each time the loop runs, you must control random generation by assigning a particular substream for each worker. First, create a constant random stream to avoid unnecessary copying of the random stream multiple times to each worker. Use a generator with substream support. Substreams provide mutually independent random streams to each worker. For more information on random number streams on workers, see Control Random Number Streams on Workers (Parallel Computing Toolbox) and Repeat Random Numbers in parfor-Loops (Parallel Computing Toolbox).

randStr = RandStream('Threefry','Seed',0);
constantStream = parallel.pool.Constant(randStr);
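As a minimal sketch of why substreams make the random draws repeatable, the following snippet uses only base MATLAB and no parallel pool. The variable names a, b, and c are illustrative and are not part of the example. Moving the stream back to substream 2 reproduces the same values regardless of what was drawn in between, while substream 3 gives a different sequence.

```matlab
% Illustrative sketch: substreams give repeatable, distinct sequences
s = RandStream('Threefry','Seed',0);  % Generator with substream support

s.Substream = 2;       % Move to the start of substream 2
a = rand(s,1,3);

s.Substream = 3;       % A different substream gives different values
c = rand(s,1,3);

s.Substream = 2;       % Return to the start of substream 2
b = rand(s,1,3);

disp(isequal(a,b))     % Same substream, same values
disp(isequal(a,c))     % Different substreams, different values
```

In the parfor-loop of this example, the loop index plays the role of the substream index, so each value of workerIdx always sees the same sequence.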

PDSCH Throughput Simulation

Calculate the number of slots per worker by taking into account the number of frames to simulate and the available number of workers. Use the ceil function to ensure that all workers simulate the same number of slots. This operation may result in the total number of frames simulated being slightly larger than the value specified in numFrames.

% Calculate the number of slots per worker
numSlotsPerWorker = ceil((numFrames*carrier.SlotsPerFrame)/numWorkers);
disp("Parallel execution: "+enableParallelism)
Parallel execution: true

Display the number of workers. This value depends on the workers available to you and the settings of your parallel pool. This example sets the number of workers to 1 if enableParallelism = false.

disp("Number of workers: "+numWorkers)
Number of workers: 4
disp("Number of slots per worker: "+numSlotsPerWorker)
Number of slots per worker: 18
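The displayed values can be checked by hand. The following sketch repeats the arithmetic with the numbers reported above. It assumes 10 slots per frame, which is the value of carrier.SlotsPerFrame for a carrier with 15 kHz subcarrier spacing. The name slotsPerFrame is illustrative.

```matlab
% Worked example using the values reported in this section
numFrames = 7;         % Requested number of frames
slotsPerFrame = 10;    % Assumed: carrier.SlotsPerFrame for a 15 kHz carrier
numWorkers = 4;        % Pool size reported above

numSlotsPerWorker = ceil(numFrames*slotsPerFrame/numWorkers);  % ceil(17.5) = 18
totalSlots = numSlotsPerWorker*numWorkers;                     % 72 slots
totalFrames = totalSlots/slotsPerFrame;                        % 7.2 frames

fprintf("Slots per worker: %d, total slots: %d, total frames: %.1f\n", ...
    numSlotsPerWorker,totalSlots,totalFrames)
```

Because of the rounding, the simulation covers 7.2 frames instead of the 7 frames requested in numFrames.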

The simulation is based on a parallel loop that uses the workers from the parallel pool. Setting maxNumWorkers = 0 makes the parfor-loop run serially, like a for-loop, so you can switch between parallel and serial execution when testing and debugging your code. You cannot set a breakpoint in the body of the parfor-loop, but you can set breakpoints within functions called from the body of the parfor-loop.
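The serial and parallel switch can be tried in isolation. This is a minimal sketch, not part of the example, with the hypothetical names maxWorkers and squares. With maxWorkers set to 0 the loop body runs in the client session, so no parallel pool is needed.

```matlab
% Illustrative sketch: the same loop runs serially or in parallel
maxWorkers = 0;          % 0 runs the loop body serially in the client
squares = zeros(1,4);    % Preallocate the sliced output variable
parfor (k = 1:4,maxWorkers)
    squares(k) = k^2;    % One output element per iteration
end
disp(squares)
```

Setting maxWorkers to a positive value lets the same loop use up to that many workers when Parallel Computing Toolbox and a pool are available.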

% Results storage
numSNRPoints = numel(SNRdB);
numSlotErrorsPerWorker = zeros(numWorkers,numSNRPoints);
simulatedBitsPerWorker = zeros(numWorkers,numSNRPoints);
numCorrectBitsPerWorker = zeros(numWorkers,numSNRPoints);

% Parallel processing, worker parfor-loop
parfor (workerIdx = 1:numWorkers,maxNumWorkers)     
    % Set random streams to ensure repeatability
    % Use substreams in the generator so each worker uses mutually independent streams
    stream = constantStream.Value;        % Extract the stream from the Constant
    stream.Substream = workerIdx;         % Set substream value = parfor index
    RandStream.setGlobalStream(stream);   % Set global stream per worker

    % Per worker processing: PDSCH link
    resultsPerWorker = pdschLink(carrier,pdsch,encodeDLSCH,decodeDLSCH,SNRdB,numSlotsPerWorker);

    % Gather results    
    numSlotErrorsPerWorker(workerIdx,:) = resultsPerWorker.NumSlotErrors;
    simulatedBitsPerWorker(workerIdx,:) = resultsPerWorker.NumBits;
    numCorrectBitsPerWorker(workerIdx,:) = resultsPerWorker.NumCorrectBits;
end % parfor

% Combine results from all workers
totalNumTrBlkErrors = sum(numSlotErrorsPerWorker,1);
totalSimulatedTrBlks = numSlotsPerWorker*numWorkers*ones(1,numSNRPoints);
totalSimulatedFrames = totalSimulatedTrBlks/carrier.SlotsPerFrame;
totalsimulatedBits = sum(simulatedBitsPerWorker,1);
totalCorrectBits = sum(numCorrectBitsPerWorker,1);

% Throughput results calculation
throughput = 100*(1-totalNumTrBlkErrors./totalSimulatedTrBlks);
throughputMbps = 1e-6*totalCorrectBits/(numFrames*10e-3);
ResultsTable = table(SNRdB.',totalsimulatedBits.',totalNumTrBlkErrors.',totalSimulatedTrBlks.',totalSimulatedFrames.',throughput.',throughputMbps.');
ResultsTable.Properties.VariableNames = ["SNR" "Simulated bits" "Tr Block errors" "Number of Tr Blocks" "Number of frames" "Throughput (%)" "Throughput (Mbps)"];
disp(ResultsTable)
    SNR    Simulated bits    Tr Block errors    Number of Tr Blocks    Number of frames    Throughput (%)    Throughput (Mbps)
    ___    ______________    _______________    ___________________    ________________    ______________    _________________

    5.7      1.1249e+06            72                   72                   7.2                    0                  0      
    5.8      1.1249e+06            62                   72                   7.2               13.889              2.232      
    5.9      1.1249e+06            43                   72                   7.2               40.278             6.4728      
      6      1.1249e+06            16                   72                   7.2               77.778             12.499      
    6.1      1.1249e+06             2                   72                   7.2               97.222             15.624      
    6.2      1.1249e+06             0                   72                   7.2                  100              16.07      
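The Throughput (%) column follows directly from the error counts. As a short check, this sketch recomputes it from the numbers in the table. The variable names are illustrative.

```matlab
% Values copied from the results table above
trBlkErrors = [72 62 43 16 2 0];   % Transport block errors per SNR point
numTrBlks = 72;                    % Transport blocks simulated per SNR point

% Percentage of transport blocks received without error
throughputPct = 100*(1 - trBlkErrors/numTrBlks);
disp(throughputPct)
```

Note that the Throughput (Mbps) column in the listing divides the correct bits by numFrames*10e-3, that is, by 70 ms, whereas rounding up to 18 slots per worker means that 72 ms are simulated.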

Accelerate Simulation

You can reduce the simulation time by increasing the number of workers. You can use all the workers on your local machine or use multiple workers in a cluster. You do not need to set the number of workers in the example code. To configure the number of workers, use the Cluster Profile Manager in the Parallel menu on the MATLAB® Home tab. For more information on how to discover and set up a cluster of workers, see the Scale Up from Desktop to Cluster (Parallel Computing Toolbox) example.

The table shows the results of running the example three times for 1000 frames with different worker configurations.


                   1 Worker on Desktop (No Parallelism)    6 Workers on Desktop    96 Workers in Cluster
                   ____________________________________    ____________________    _____________________

Simulation Time    3543 sec (~1 hr)                        983 sec (~16 min)       108 sec (~1.8 min)
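To put these timings in perspective, the following sketch computes the speedup relative to the single-worker run and the resulting efficiency per worker. The variable names are illustrative, and the calculation uses only the simulation times reported above.

```matlab
% Simulation times reported above
simTime = [3543 983 108];     % Seconds for 1, 6, and 96 workers
numWorkers = [1 6 96];

speedup = simTime(1)./simTime;        % Relative to the single-worker run
efficiency = speedup./numWorkers;     % Fraction of ideal linear speedup

fprintf("%2d workers: speedup %5.1fx, efficiency %3.0f%%\n", ...
    [numWorkers; speedup; 100*efficiency])
```

The speedup is roughly 3.6x with 6 workers and roughly 33x with 96 workers, so the gain is below linear in both cases.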

Local Functions

function resultsPerWorker = pdschLink(carrier,pdsch,encodeDLSCH,decodeDLSCH,SNRdB,numSlotsPerWorker)
% Simplified PDSCH link simulation executed by all workers

    resultsPerWorker.NumSlotErrors = zeros(1,numel(SNRdB));
    resultsPerWorker.NumBits = zeros(1,numel(SNRdB));
    resultsPerWorker.NumCorrectBits = zeros(1,numel(SNRdB));

    ofdmInfo = nrOFDMInfo(carrier);

    % for all SNR points
    for snrIdx = 1:length(SNRdB)

        % Noise power calculation
        SNR = 10^(SNRdB(snrIdx)/10); % Linear SNR
        % No need to normalize N0 by the number of receive antennas as
        % there is only one
        N0 = 1/sqrt(double(ofdmInfo.Nfft)*SNR);

        % Process all the slots per worker
        for nSlot = 0:numSlotsPerWorker-1

            % New slot number
            carrier.NSlot = nSlot;

            % Transmit and receive slot (AWGN channel)
            [blkerr,trBlkSize] = slotTxRxAWGN(carrier,pdsch,encodeDLSCH,decodeDLSCH,N0);

            % Store results
            resultsPerWorker.NumSlotErrors(snrIdx) = resultsPerWorker.NumSlotErrors(snrIdx)+blkerr;
            resultsPerWorker.NumBits(snrIdx) = resultsPerWorker.NumBits(snrIdx)+trBlkSize;
            resultsPerWorker.NumCorrectBits(snrIdx) = resultsPerWorker.NumCorrectBits(snrIdx)+sum(~blkerr .* trBlkSize);

        end % for nSlot = 0:numSlotsPerWorker-1

    end % for all SNR points

end

function [blkerr,trBlkSize] = slotTxRxAWGN(carrier,pdsch,encodeDLSCH,decodeDLSCH,N0)

    % Generate PDSCH indices info and indices for present slot
    [pdschIndices,pdschInfo] = nrPDSCHIndices(carrier,pdsch);

    % Calculate transport block sizes
    trBlkSize = nrTBS(pdsch.Modulation,pdsch.NumLayers,numel(pdsch.PRBSet),pdschInfo.NREPerPRB,encodeDLSCH.TargetCodeRate,0);

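    % Get new transport blocks (single codeword) and flush decoder soft buffer
    trBlk = randi([0 1],trBlkSize,1);
    setTransportBlock(encodeDLSCH,trBlk);   % Load the new transport block into the encoder
    decodeDLSCH.TransportBlockLength = trBlkSize;
    resetSoftBuffer(decodeDLSCH,0);         % No HARQ: clear soft bits for codeword 0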

    % DL-SCH encoding
    codedTrBlock = encodeDLSCH(pdsch.Modulation,pdsch.NumLayers,pdschInfo.G,0);

    % PDSCH encoding
    pdschSymbols = nrPDSCH(carrier,pdsch,codedTrBlock);

    % Create resource grid and map PDSCH
    pdschGrid = nrResourceGrid(carrier,1,"OutputDataType","single");
    pdschGrid(pdschIndices) = pdschSymbols;

    % OFDM modulation
    [txWaveform,waveformInfo] = nrOFDMModulate(carrier,pdschGrid);

    % AWGN channel
    noise = N0*randn(size(txWaveform),"like",txWaveform);
    rxWaveform = txWaveform + noise;

    % OFDM demodulation
    rxGrid = nrOFDMDemodulate(carrier,rxWaveform);

    % Extract PDSCH
    pdschRx = nrExtractResources(pdschIndices,rxGrid);
    % PDSCH decoding, assume noise variance is known
    noiseEst = (N0.^2*waveformInfo.Nfft);
    [dlschLLRs,~] = nrPDSCHDecode(carrier,pdsch,pdschRx,noiseEst);
    % DL-SCH decoding
    [~,blkerr] = decodeDLSCH(dlschLLRs,pdsch.Modulation,pdsch.NumLayers,0);

end

function [encodeDLSCH,decodeDLSCH] = dlschEncoderDecoder()
    % Coding rate
    codeRate = 490/1024;
    % Create DL-SCH encoder object
    encodeDLSCH = nrDLSCH;
    encodeDLSCH.MultipleHARQProcesses = false;
    encodeDLSCH.TargetCodeRate = codeRate;
    % Create DL-SCH decoder object
    decodeDLSCH = nrDLSCHDecoder;
    decodeDLSCH.MultipleHARQProcesses = false;
    decodeDLSCH.TargetCodeRate = codeRate;
    decodeDLSCH.LDPCDecodingAlgorithm = "Normalized min-sum";
    decodeDLSCH.MaximumLDPCIterationCount = 20;

end
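The noise scaling used in the local functions can be checked numerically. In pdschLink, N0 = 1/sqrt(Nfft*SNR), and slotTxRxAWGN passes N0^2*Nfft to the PDSCH decoder as the noise variance, so that value reduces to 1/SNR. The following sketch confirms this with base MATLAB only. The FFT size of 1024 is an assumed value for illustration; the example reads it from nrOFDMInfo.

```matlab
% Illustrative check of the noise scaling used in the local functions
SNRdB = 6;                 % Example SNR point in dB
nfft = 1024;               % Assumed FFT size for illustration

SNR = 10^(SNRdB/10);       % Linear SNR
N0 = 1/sqrt(nfft*SNR);     % Noise scaling applied to the time-domain waveform
noiseEst = N0^2*nfft;      % Noise variance passed to nrPDSCHDecode

fprintf("noiseEst = %.4f, 1/SNR = %.4f\n",noiseEst,1/SNR)
```

The result does not depend on the FFT size, because the factor Nfft cancels.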
