# Implement HDL Optimized Modulo By Constant

This example shows how to use the Modulo by Constant HDL Optimized block.

The modulo operation, $\operatorname{mod}(X, D) = X - \lfloor X/D \rfloor D$, is an important building block for many mathematical algorithms. However, evaluating this formula directly is computationally inefficient for fixed-point and integer inputs because it requires a division. Many embedded processors lack instructions for integer division, and those that have them require many clock cycles to compute the answer. Division is also inefficient in commercially available FPGAs, whose arithmetic circuits are designed for efficient multiplication, addition, and subtraction. Finally, for fixed-point modulo operations, it is difficult to optimize the word length of the internal data types used for the calculation because the division operation is unbounded, even for inputs with small word lengths.

The denominator in the modulo problem is a compile-time constant, so the block can compute the floored division by using a multiplication followed by a cast. Rewriting the division operation as $X/D = X \cdot (1/D)$ shows this. The constant $1/D$ is calculated to the precision necessary to maintain both accuracy and computational efficiency. The cast that follows discards any fractional bits, which is an efficient operation on both microprocessors and FPGAs.
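The multiply-and-cast idea can be sketched in plain MATLAB for unsigned integers. This is a minimal illustration, not the block's actual implementation: the word length `N`, the shift `s`, and the scaled reciprocal `M` are assumptions chosen for this sketch, and the block may select different constants and data types.

```matlab
% Sketch: mod by a constant using multiply, shift, and truncate.
% Assumptions (not taken from the block): unsigned N-bit integer inputs,
% shift s = N + ceil(log2(D)), and scaled reciprocal M = ceil(2^s / D).
N = 16;                      % input word length in bits (assumed)
D = 10;                      % constant denominator
s = N + ceil(log2(D));       % number of fractional bits kept for 1/D
M = ceil(2^s / D);           % scaled reciprocal, approximately 2^s / D

x = (0:2^N - 1).';           % every N-bit unsigned input
q = floor(x .* M ./ 2^s);    % multiply, then discard the fractional bits
r = x - q .* D;              % remainder: x - floor(x/D)*D

% Compare against the built-in mod for all inputs.
assert(isequal(r, mod(x, D)));
```

With these choices the products stay well below 2^53, so double-precision arithmetic is exact here. For a fixed-point input with `f` fractional bits, the same approach could in principle be applied to the stored integers with denominator `D*2^f`, because scaling both operands by `2^f` scales the remainder by the same factor.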

The following example shows how to use the Modulo by Constant HDL Optimized block to perform this operation and provides sample resource usage and performance statistics.

### How to Use the Modulo by Constant HDL Optimized Block

The Modulo by Constant HDL Optimized block computes the modulo operation using the general strategy described above. The block requires you to specify the Denominator parameter, which is set to 10 in this example. The block icon displays both the mathematical expression for the modulo operation and the latency of the block. From the value of Denominator and the data type of X, the block computes all necessary constants and data types. Because the block is designed for FPGA deployment, it uses the control signals validIn and validOut to indicate when X and Y are valid. It also simulates with the same latency as the generated HDL code.

To use the block, first create fixed-point input data. The format shown below is consumable by the From Workspace block.

    >> X.time = (0:1:200).';
    >> X.signals.values = fi(0:0.125:25,0,18,2).';
    >> X.signals.dimensions = 1;

Using the same format, create a boolean validIn signal that toggles from false to true repeatedly.

    >> validIn.time = (0:1:200).';
    >> validIn.signals.values = [false; repmat([true false]', 100, 1)];
    >> validIn.signals.dimensions = 1;
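Before simulating, it can help to confirm that the time vector and both signals have the same number of samples. The following is a minimal sketch of such a check. As an assumption made only so that it runs without Fixed-Point Designer, the `fi` values are replaced by plain doubles, and the variable names `t`, `xValues`, and `valid` are hypothetical stand-ins for the structure fields above.

```matlab
% Rebuild the stimulus with plain doubles (assumption: fi is replaced by
% double only so that this check runs without Fixed-Point Designer).
t = (0:1:200).';
xValues = (0:0.125:25).';
valid = [false; repmat([true false]', 100, 1)];

% The time vector and both signals need one sample per time step.
assert(numel(t) == numel(xValues));
assert(numel(t) == numel(valid));

% validIn starts low and then alternates true/false,
% so 100 of the 201 samples are marked valid.
assert(~valid(1));
assert(nnz(valid) == 100);
```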

To finish setting up the data for the problem, set D equal to the constant denominator to use for the modulo operation.

    >> D = 10;

Open the model.

    >> open_system('modulo_by_constant_block_example')

The DUT_FXP subsystem computes $\operatorname{mod}(X, D)$ for fixed-point inputs using the Modulo by Constant HDL Optimized block. The DUT_FLT subsystem computes the same quantity using a Math Function block with the Function parameter set to mod. Because the Simulink mod operation supports only floating-point and integer inputs, the input data is cast to single precision before it enters the Math Function block. The DUT_FLT subsystem also adds a delay to match the latency of the DUT_FXP subsystem. The Compare and Plot subsystem plots all outputs and computes the difference between the fixed-point computation and the floating-point ideal.

Simulate the model and examine the scope to compare the fixed-point and floating-point results.

    >> sim('modulo_by_constant_block_example')

The results from the Modulo by Constant HDL Optimized and Math Function blocks agree exactly, as the scope plot shows. The plot also displays the latency in the system: there is a delay between the start of the simulation and the first time validOut goes high.

### Generate HDL Code

If you have an HDL Coder license, you can generate HDL code for the DUT_FXP subsystem and deploy it, as shown below.

    >> makehdl('modulo_by_constant_block_example/DUT_FXP');
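As a hedged variation, `makehdl` also accepts name-value options. The sketch below assumes you want Verilog output in a folder named `hdlsrc`. Both the target language and the folder name are illustrative choices, not settings used by this example, and the call requires the example model and an HDL Coder license.

```matlab
% Sketch: generate Verilog instead of the default language and place the
% generated files in a chosen folder (both values are illustrative).
makehdl('modulo_by_constant_block_example/DUT_FXP', ...
    'TargetLanguage', 'Verilog', ...
    'TargetDirectory', 'hdlsrc');
```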

### Implemented HDL Statistics

Sample statistics for resource usage on a Xilinx® Virtex®-7 XC7VX485 FFG1157-1 device are shown below. The implemented design can run at more than 500 MHz on this device.

       Resources       Usage
    _______________    _____

    LUT                 33
    LUTRAM               8
    Slice Registers     57
    DSP48                1