Streaming: Area Optimization

This example shows how to use the subsystem-level streaming optimization in HDL Coder™.

Introduction

Streaming is a subsystem-wide optimization supported by HDL Coder for implementing area-efficient hardware. By default, the coder implements hardware that is bit-accurate and cycle-accurate to the Simulink® model. As a consequence, vector datapaths in Simulink can map inefficiently to hardware. Consider a Product block in Simulink that operates on two 64-element vector inputs and generates a 64-element vector output. This block executes 64 multiplications in a single Simulink time step. To remain cycle-accurate, HDL Coder maps this block to 64 parallel multipliers in the generated HDL code. Given that multipliers are expensive on FPGAs, this is an inefficient hardware implementation.

Streaming is an optimization that reduces a vector datapath to either a scalar datapath or a smaller vector datapath. The idea is to serialize the execution of parallel hardware, so that resources can be shared and the vector data can be time-multiplexed over the shared resources.
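
The time-multiplexing idea can be sketched behaviorally in plain MATLAB. This is only an illustration of the concept, not the architecture that HDL Coder generates; the gain value and the input vector are hypothetical, and the loop stands in for successive fast-clock cycles.

```matlab
% Behavioral sketch of streaming: one shared multiplier processes a
% 24-element vector, one element per (assumed) fast-clock cycle.
N    = 24;          % vector length, matching the example model
gain = 3;           % hypothetical gain value, chosen for illustration
u    = (1:N).';     % hypothetical input vector

% Fully parallel reference: N multiplications in a single time step.
yParallel = gain .* u;

% Streamed version: serialize, reuse one multiplier, deserialize.
yStreamed = zeros(N, 1);
numMultiplierUses = 0;
for k = 1:N                      % one iteration models one fast-clock cycle
    sample  = u(k);              % serializer selects element k
    product = gain * sample;     % the single shared multiplier
    numMultiplierUses = numMultiplierUses + 1;
    yStreamed(k) = product;      % deserializer rebuilds the vector
end

% Both versions compute the same values; only the schedule differs.
assert(isequal(yStreamed, yParallel));
fprintf('One multiplier used %d times per slow-rate sample.\n', numMultiplierUses);
```

The trade-off is visible in the loop: the hardware cost drops from N multipliers to one, while the number of cycles needed per input vector rises from one to N.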

Consider the following example model that operates on a 24-element vector datapath. This model contains three vector gains and two vector adds, resulting in a hardware implementation containing 72 multipliers and 48 adders. You can confirm this by generating the resource utilization report when generating HDL code.

load_system('hdl_areaopt1');
open_system('hdl_areaopt1/Controller');
hdlset_param('hdl_areaopt1/Controller', 'StreamingFactor', 0);
hdlset_param('hdl_areaopt1', 'ResourceReport', 'on');
makehdl('hdl_areaopt1/Controller');
### Working on the model <a href="matlab:open_system('hdl_areaopt1')">hdl_areaopt1</a>
### Generating HDL for <a href="matlab:open_system('hdl_areaopt1/Controller')">hdl_areaopt1/Controller</a>
### Using the config set for model <a href="matlab:configset.showParameterGroup('hdl_areaopt1', { 'HDL Code Generation' } )">hdl_areaopt1</a> for HDL code generation parameters.
### Running HDL checks on the model 'hdl_areaopt1'.
### Begin compilation of the model 'hdl_areaopt1'...
### Working on the model 'hdl_areaopt1'...
### Working on... <a href="matlab:configset.internal.open('hdl_areaopt1', 'GenerateModel')">GenerateModel</a>
### Begin model generation 'gm_hdl_areaopt1'...
### Copying DUT to the generated model....
### Model generation complete.
### Generated model saved at <a href="matlab:open_system('hdlsrc/hdl_areaopt1/gm_hdl_areaopt1.slx')">hdlsrc/hdl_areaopt1/gm_hdl_areaopt1.slx</a>
### Begin VHDL Code Generation for 'hdl_areaopt1'.
### Working on hdl_areaopt1/Controller as hdlsrc/hdl_areaopt1/Controller.vhd.
### Generating package file hdlsrc/hdl_areaopt1/Controller_pkg.vhd.
### Code Generation for 'hdl_areaopt1' completed.
### Generating HTML files for code generation report at <a href="matlab:hdlcoder.report.openDdg('/tmp/Bdoc24a_2528353_2768217/tpcc1347d7/hdlcoder-ex46027221/hdlsrc/hdl_areaopt1/html/hdl_areaopt1_codegen_rpt.html')">hdl_areaopt1_codegen_rpt.html</a>
### Creating HDL Code Generation Check Report file:///tmp/Bdoc24a_2528353_2768217/tpcc1347d7/hdlcoder-ex46027221/hdlsrc/hdl_areaopt1/Controller_report.html
### HDL check for 'hdl_areaopt1' complete with 0 errors, 0 warnings, and 0 messages.
### HDL code generation complete.
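
The resource counts for the unoptimized design follow from simple arithmetic: each vector block is replicated once per vector element. The sketch below only reproduces that arithmetic; it does not read the resource report, and the variable names are chosen for illustration.

```matlab
% Expected resource counts for the fully parallel implementation.
vectorLength = 24;   % elements in the vector datapath
numGains     = 3;    % vector Gain blocks in the Controller subsystem
numAdds      = 2;    % vector Add blocks in the Controller subsystem

% Each vector block maps to one hardware operator per element.
multipliers = numGains * vectorLength;
adders      = numAdds  * vectorLength;

fprintf('Parallel design: %d multipliers, %d adders\n', multipliers, adders);
```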

Streaming to Scalarize the Datapath

A more area-efficient implementation of the same model can be realized by setting the StreamingFactor implementation parameter on the subsystem to a positive integer. This parameter specifies the extent to which the datapath is scalarized: the higher the value, the greater the area savings. In this example, we have a 24-element vector datapath; to fully scalarize it, specify a StreamingFactor value of 24. This can be done either through the HDL block properties dialog (opened by right-clicking on the Controller subsystem) or through the command hdlset_param.

Generating HDL code with StreamingFactor set to 24 produces HDL that uses only three multipliers and two adders (see the resource report after HDL code generation). The streamed architecture is implemented either as a local multi-rate architecture or in single-rate mode, depending on the context of the subsystem being streamed. If the subsystem logic operates at a slower sample rate, or if the Oversampling factor is set to a value greater than one, clock-rate pipelining takes effect and the streamed subsystem is implemented as a multi-cycle, single-rate architecture. See Single-Rate Resource Sharing Architecture for more details. In all other cases, a local multi-rate implementation is created, as described in this example. The elements of the vector datapath are streamed at a faster rate (in this case 24 times faster, shown in red in the generated model) and all computations operate on a scalar datapath. At the output, a deserializer reconstructs the vector, and the output is sampled at the slower rate (shown in green in the generated model).

hdlset_param('hdl_areaopt1/Controller', 'StreamingFactor', 24);
hdlset_param('hdl_areaopt1', 'GenerateValidationModel', 'on');
makehdl('hdl_areaopt1/Controller');
open_system('gm_hdl_areaopt1/Controller');
%set_param('gm_hdl_areaopt1', 'SimulationCommand', 'update');
### Working on the model <a href="matlab:open_system('hdl_areaopt1')">hdl_areaopt1</a>
### Generating HDL for <a href="matlab:open_system('hdl_areaopt1/Controller')">hdl_areaopt1/Controller</a>
### Using the config set for model <a href="matlab:configset.showParameterGroup('hdl_areaopt1', { 'HDL Code Generation' } )">hdl_areaopt1</a> for HDL code generation parameters.
### Running HDL checks on the model 'hdl_areaopt1'.
### Begin compilation of the model 'hdl_areaopt1'...
### Working on the model 'hdl_areaopt1'...
### The DUT requires an initial pipeline setup latency. Each output port experiences these additional delays.
### Output port 1: 1 cycles.
### Working on... <a href="matlab:configset.internal.open('hdl_areaopt1', 'GenerateModel')">GenerateModel</a>
### Begin model generation 'gm_hdl_areaopt1'...
### Rendering DUT with optimization related changes (IO, Area, Pipelining)...
### Model generation complete.
### Generated model saved at <a href="matlab:open_system('hdlsrc/hdl_areaopt1/gm_hdl_areaopt1.slx')">hdlsrc/hdl_areaopt1/gm_hdl_areaopt1.slx</a>
### Generating new validation model: '<a href="matlab:open_system('hdlsrc/hdl_areaopt1/gm_hdl_areaopt1_vnl')">gm_hdl_areaopt1_vnl</a>'.
### Validation model generation complete.
### Begin VHDL Code Generation for 'hdl_areaopt1'.
### MESSAGE: The design requires 24 times faster clock with respect to the base rate = 2.
### Begin VHDL Code Generation for 'Controller_tc'.
### Working on Controller_tc as hdlsrc/hdl_areaopt1/Controller_tc.vhd.
### Code Generation for 'Controller_tc' completed.
### Working on hdl_areaopt1/Controller as hdlsrc/hdl_areaopt1/Controller.vhd.
### Generating package file hdlsrc/hdl_areaopt1/Controller_pkg.vhd.
### Code Generation for 'hdl_areaopt1' completed.
### Generating HTML files for code generation report at <a href="matlab:hdlcoder.report.openDdg('/tmp/Bdoc24a_2528353_2768217/tpcc1347d7/hdlcoder-ex46027221/hdlsrc/hdl_areaopt1/html/hdl_areaopt1_codegen_rpt.html')">hdl_areaopt1_codegen_rpt.html</a>
### Creating HDL Code Generation Check Report file:///tmp/Bdoc24a_2528353_2768217/tpcc1347d7/hdlcoder-ex46027221/hdlsrc/hdl_areaopt1/Controller_report.html
### HDL check for 'hdl_areaopt1' complete with 0 errors, 0 warnings, and 1 messages.
### HDL code generation complete.
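
The local multi-rate schedule can be tabulated with a few lines of MATLAB. This is a sketch under two assumptions: that "base rate = 2" in the message above is the base sample time of the model, and that element k of each vector is processed on the k-th fast cycle of its slow-rate period. The actual ordering in the generated model may differ.

```matlab
% Sketch of the fast-rate schedule for full scalarization.
streamingFactor = 24;
baseSampleTime  = 2;   % assumed meaning of "base rate = 2" in the log
fastSampleTime  = baseSampleTime / streamingFactor;

% Enumerate the fast cycles covering two slow-rate periods.
fastCycle    = 0:(2*streamingFactor - 1);
elementIndex = mod(fastCycle, streamingFactor) + 1;   % element on the scalar path
slowSample   = floor(fastCycle / streamingFactor);    % vector it belongs to
t            = fastCycle * fastSampleTime;            % time of each fast cycle

fprintf('Fast sample time: %g\n', fastSampleTime);
fprintf('Cycle %d at t = %g handles element %d of vector %d\n', ...
    fastCycle(25), t(25), elementIndex(25), slowSample(25));
```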

Delay Balancing and Functional Equivalence

The rate transitions that implement time-multiplexing in the streaming architecture introduce one cycle of additional latency. To maintain functional fidelity, this delay must be balanced across all cut-sets that this path is a member of. When the streaming option is turned on, the coder automatically turns on the delay balancing option (BalanceDelays) to balance this additional delay. The coder also automatically turns on the validation model generation option so the user can verify that functional equivalence is maintained with respect to the original model.

sim('gm_hdl_areaopt1_vnl');
open_system('gm_hdl_areaopt1_vnl/Compare/Assert_Out1/compare: Out1')
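
The effect of the extra latency on an equivalence check can be illustrated with plain vectors. This is a hedged sketch with made-up sample values, not a description of how the generated validation model is built: it only shows that a streamed output which lags by one sample matches the reference once the reference is delayed by the same amount.

```matlab
% Comparing outputs while accounting for one cycle of added latency.
latency = 1;                    % one extra cycle, as reported in the log
ref     = [5 7 9 11 13];        % hypothetical output of the original model

% Hypothetical streamed output: same data, one sample late, initial value 0.
dut = [zeros(1, latency), ref(1:end-latency)];

% A direct comparison reports mismatches because of the time shift.
naiveMismatch = any(dut ~= ref);

% Delaying the reference by the same latency aligns the two signals.
refDelayed      = [zeros(1, latency), ref(1:end-latency)];
alignedMismatch = any(dut ~= refDelayed);

fprintf('Naive mismatch: %d, aligned mismatch: %d\n', naiveMismatch, alignedMismatch);
```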

Parameterizability for More Flexibility

By tuning the StreamingFactor parameter, one can explore the design space along the datapath size dimension. A value of 0 implies no streaming (or fully parallel implementation), and a value of 24 (or the full vector length) implies maximal streaming (or fully serial implementation). By picking values between these two extremes, one can explore the design space from fully parallel to fully serial implementations.

If we set StreamingFactor to 6 in this example model, we get a four-element vector datapath in the generated HDL. This results in the use of 12 multipliers and 8 adders as shown in the resource report.

hdlset_param('hdl_areaopt1/Controller', 'StreamingFactor', 6);
makehdl('hdl_areaopt1/Controller');
open_system('gm_hdl_areaopt1/Controller');
%set_param('gm_hdl_areaopt1', 'SimulationCommand', 'update');
### Working on the model <a href="matlab:open_system('hdl_areaopt1')">hdl_areaopt1</a>
### Generating HDL for <a href="matlab:open_system('hdl_areaopt1/Controller')">hdl_areaopt1/Controller</a>
### Using the config set for model <a href="matlab:configset.showParameterGroup('hdl_areaopt1', { 'HDL Code Generation' } )">hdl_areaopt1</a> for HDL code generation parameters.
### Running HDL checks on the model 'hdl_areaopt1'.
### Begin compilation of the model 'hdl_areaopt1'...
### Working on the model 'hdl_areaopt1'...
### The DUT requires an initial pipeline setup latency. Each output port experiences these additional delays.
### Output port 1: 1 cycles.
### Working on... <a href="matlab:configset.internal.open('hdl_areaopt1', 'GenerateModel')">GenerateModel</a>
### Begin model generation 'gm_hdl_areaopt1'...
### Rendering DUT with optimization related changes (IO, Area, Pipelining)...
### Model generation complete.
### Generated model saved at <a href="matlab:open_system('hdlsrc/hdl_areaopt1/gm_hdl_areaopt1.slx')">hdlsrc/hdl_areaopt1/gm_hdl_areaopt1.slx</a>
### Generating new validation model: '<a href="matlab:open_system('hdlsrc/hdl_areaopt1/gm_hdl_areaopt1_vnl')">gm_hdl_areaopt1_vnl</a>'.
### Validation model generation complete.
### Begin VHDL Code Generation for 'hdl_areaopt1'.
### MESSAGE: The design requires 6 times faster clock with respect to the base rate = 2.
### Begin VHDL Code Generation for 'Controller_tc'.
### Working on Controller_tc as hdlsrc/hdl_areaopt1/Controller_tc.vhd.
### Code Generation for 'Controller_tc' completed.
### Working on hdl_areaopt1/Controller as hdlsrc/hdl_areaopt1/Controller.vhd.
### Generating package file hdlsrc/hdl_areaopt1/Controller_pkg.vhd.
### Code Generation for 'hdl_areaopt1' completed.
### Generating HTML files for code generation report at <a href="matlab:hdlcoder.report.openDdg('/tmp/Bdoc24a_2528353_2768217/tpcc1347d7/hdlcoder-ex46027221/hdlsrc/hdl_areaopt1/html/hdl_areaopt1_codegen_rpt.html')">hdl_areaopt1_codegen_rpt.html</a>
### Creating HDL Code Generation Check Report file:///tmp/Bdoc24a_2528353_2768217/tpcc1347d7/hdlcoder-ex46027221/hdlsrc/hdl_areaopt1/Controller_report.html
### HDL check for 'hdl_areaopt1' complete with 0 errors, 0 warnings, and 1 messages.
### HDL code generation complete.
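
The design space for this model can be tabulated before running code generation. The sketch below assumes that the streaming factor divides the vector length evenly and that each vector block needs one operator per element of the reduced datapath; the list of factors is chosen for illustration, and the resource report remains the authoritative source.

```matlab
% Estimated resources for several StreamingFactor values.
vectorLength = 24;
numGains     = 3;
numAdds      = 2;

% 0 means no streaming; the other values are divisors of 24 (assumption).
factors = [0 2 3 4 6 8 12 24];

clockMultiple = max(factors, 1);               % fast clock relative to base rate
datapathWidth = vectorLength ./ clockMultiple; % elements processed in parallel
multipliers   = numGains * datapathWidth;
adders        = numAdds  * datapathWidth;

disp(table(factors.', datapathWidth.', multipliers.', adders.', clockMultiple.', ...
    'VariableNames', {'StreamingFactor', 'DatapathWidth', 'Multipliers', ...
                      'Adders', 'ClockMultiple'}));
```

The rows for 6 and 24 agree with the resource counts quoted for this example (12 multipliers and 8 adders, and 3 multipliers and 2 adders, respectively).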