Performing Large Matrix Multiplication on FPGA External DDR Memory Using Ethernet Based MATLAB as AXI Master

This example shows how to use Ethernet-based MATLAB as AXI Master to access external memory connected to the FPGA. This example also shows how to:

  1. Generate an HDL IP core with an AXI4 Master interface.

  2. Access large matrices from the external DDR3 memory on the Xilinx Kintex-7 KC705 board using the Ethernet based MATLAB as AXI4 Master interface.

  3. Perform matrix vector multiplication in the HDL IP core and write the output result back to the DDR3 memory using the Ethernet based MATLAB as AXI4 Master.

Before You Begin

To run this example, you must have the following software and hardware installed and set up:

  • Xilinx® Vivado® Design Suite, in a supported version listed in the HDL Coder documentation

  • Xilinx Kintex-7 KC705 Evaluation Kit

  • JTAG cable and Ethernet cable for connecting to the KC705 FPGA board

  • HDL Coder Support Package for Xilinx FPGA Boards

  • HDL Verifier Support Package for Xilinx FPGA Boards

Introduction

This example models a matrix vector multiplication algorithm and implements the algorithm on the Xilinx Kintex-7 KC705 board. Large matrices may not map efficiently to Block RAMs on the FPGA fabric. Instead, you can store the matrices in the external DDR3 memory on the FPGA board. The Ethernet based MATLAB as AXI Master interface can access the data by communicating with vendor-provided memory interface IP cores that interface with the DDR3 memory. This capability enables you to model algorithms that process large amounts of data and require high-throughput DDR access, such as matrix operations and computer vision algorithms.

The matrix vector multiplication module supports fixed-point matrix vector multiplication, with a configurable matrix size ranging from 2 to 4000. The matrix size is configurable at run time through an AXI4-accessible register.
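To get a feel for why the largest supported size calls for external memory, the following minimal sketch estimates the input data footprint. The element width of 4 bytes (32-bit words) is an assumption made for illustration; this page does not state the word width used by the design.

```matlab
% Rough DDR footprint of the input data: an N-by-1 vector plus an N-by-N matrix.
% bytesPerElement = 4 is an ASSUMPTION (32-bit words), not taken from the example.
N = 4000;                                   % largest supported matrix size
bytesPerElement = 4;
numInputWords = N + N*N;                    % vector values plus matrix values
inputBytes = numInputWords * bytesPerElement;
fprintf('Input data: %d words, %.1f MiB\n', numInputWords, inputBytes / 2^20);
```

Under that assumption, the input data for N = 4000 occupies roughly 61 MiB.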

Open the hdlcoder_external_memory_axi_master model in MATLAB, for example by using open_system.

Model Algorithm

This example model includes an FPGA implementable DUT (Design-Under-Test) block, a DDR functional behavior block, and a test environment to drive inputs and verify the expected outputs.

The DUT subsystem contains an AXI4 Master read/write controller along with a matrix vector multiplication module. Using the AXI4 Master interface, the DUT subsystem reads data from the external DDR3 memory, feeds the data into the Matrix_Vector_Multiplication module, and then writes the output data to the external DDR3 memory using the Ethernet based MATLAB as AXI Master interface. The DUT module has several parameter ports. These ports are mapped to AXI4-accessible registers, so you can adjust these parameters from MATLAB even after you implement the design on the FPGA.

The matrix_mul_on port controls whether to run the Matrix_Vector_Multiplication module. When the input to matrix_mul_on is true, the DUT subsystem performs matrix vector multiplication as described earlier. When the input to matrix_mul_on is false, the DUT subsystem operates in data loopback mode. In this mode, the DUT subsystem reads data from the external DDR3 memory, writes it into the Internal_Memory module, and then writes the same data back to the external DDR3 memory. The data loopback mode is a simple way to verify the functionality of the AXI4 Master external DDR3 memory access.
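The two modes can be illustrated with a small behavioral sketch. This is not the generated HDL or the Simulink model. The variable names ddr_in, ddr_out, and internal_memory are hypothetical, and the row-by-row layout of the matrix data is an assumption.

```matlab
% Behavioral sketch of the two DUT modes selected by matrix_mul_on.
B = [1; 2; 3];                           % input vector
A = [1 0 0; 0 2 0; 0 0 3];               % input matrix
ddr_in = [B; reshape(A.', [], 1)];       % vector first, then matrix rows (layout ASSUMED)

matrix_mul_on = false;                   % false selects data loopback mode
if matrix_mul_on
    ddr_out = A * B;                     % matrix vector multiplication mode
else
    internal_memory = ddr_in;            % loopback: buffer the data that was read ...
    ddr_out = internal_memory;           % ... and write the same data back
end

% In loopback mode the written data should equal the data that was read.
loopback_ok = ~matrix_mul_on && isequal(ddr_out, ddr_in);
disp(loopback_ok);
```

A loopback check of this kind only exercises the memory path, so a failure points at the DDR access rather than at the multiplication logic.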

Also inside the DUT subsystem, the Matrix_Vector_Multiplication module uses a multiply-add block to implement a streaming dot-product computation for the inner-product of the matrix vector multiplication.

Let A be a matrix of size NxN and B be a vector of size Nx1.

The matrix vector multiplication output is then Z = A * B, of size Nx1.
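As a small worked example of this definition, each element of Z is the inner product of one row of A with B. The values below are arbitrary and chosen only for illustration.

```matlab
% Z = A * B computed row by row as N inner products.
A = [1 2; 3 4];
B = [5; 6];
N = size(A, 1);
Z = zeros(N, 1);
for r = 1:N
    Z(r) = A(r, :) * B;      % inner product of row r of A with the vector B
end
% Z is [17; 39], the same as A * B.
disp(isequal(Z, A * B));
```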

The first N values read from the DDR memory are treated as the Nx1 vector, and the values that follow are the NxN matrix data. The first N values (the vector data) are stored in a RAM. From value N+1 onward, the data is streamed directly as matrix data, and the vector data is read from the Vector_RAM in parallel. Both the matrix and vector inputs are fed into the Matrix_mul_top subsystem. The first output value is available after N clock cycles and is stored in the output RAM. The vector RAM read address is then reset to 0, and the same vector data is read again for the next row of the matrix stream. This operation is repeated for all the rows of the matrix.
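The data flow described above can be sketched in plain MATLAB as a sample-by-sample loop. This is a simplified behavioral model, not the Simulink implementation: it ignores fixed-point types and pipeline latency, the variable names are hypothetical, and it assumes the matrix is streamed row by row.

```matlab
% Streaming model: the first N samples fill the vector RAM, and the remaining
% N*N samples are multiplied and accumulated against the stored vector.
N = 3;
A = [1 2 3; 4 5 6; 7 8 9];
B = [1; 0; 2];
stream = [B; reshape(A.', [], 1)];       % vector, then matrix rows (order ASSUMED)

vector_ram = zeros(N, 1);
output_ram = zeros(N, 1);
acc = 0;                                 % multiply-add accumulator
rd_addr = 0;                             % vector RAM read address (0-based)
row = 0;                                 % number of completed output rows
for k = 1:numel(stream)
    if k <= N
        vector_ram(k) = stream(k);       % store vector data
    else
        acc = acc + stream(k) * vector_ram(rd_addr + 1);
        rd_addr = rd_addr + 1;
        if rd_addr == N                  % one row done after N matrix samples
            row = row + 1;
            output_ram(row) = acc;       % store the dot product
            acc = 0;
            rd_addr = 0;                 % restart the vector read at address 0
        end
    end
end
disp(isequal(output_ram, A * B));
```

Because the vector is reused for every row, only the N vector values need on-chip storage, while the N*N matrix values pass through once.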

The following diagram shows the architecture of the Matrix_Vector_Multiplication module.

Generate HDL IP core with Ethernet Based MATLAB as AXI Master

Next, we start the HDL Workflow Advisor and use the IP Core Generation workflow to deploy this design on the Xilinx Kintex-7 hardware.

1. Set up the Xilinx Vivado synthesis tool path using the following command in the MATLAB command window. Use your own Vivado installation path when you run the command.

  hdlsetuptoolpath('ToolName', 'Xilinx Vivado', 'ToolPath', 'C:\Xilinx\Vivado\2018.2\bin\vivado.bat')

2. Start the HDL Workflow Advisor from the DUT subsystem, hdlcoder_external_memory_axi_master/DUT. The target interface settings are saved on the model. In step 1.1, the Target workflow is IP Core Generation and the Target platform is Xilinx Kintex-7 KC705 development board.

3. Select the Reference Design in step 1.2 as External DDR3 Memory Access with Ethernet based MATLAB as AXI Master.

4. The Target platform interface table settings are as shown below.

In this example, the input parameter ports, such as matrix_mul_on, matrix_size, burst_len, burst_from_ddr and burst start, are mapped to the AXI4 interface. HDL Coder generates AXI4-accessible registers for these ports. Later, you can use MATLAB to tune these parameters at run time while the design is running on the FPGA board.

The Ethernet based MATLAB as AXI Master interface has separate read and write channels. The read channel ports, such as axim_rd_data, axim_rd_s2m, and axim_rd_m2s, are mapped to the AXI4 Master Read interface. The write channel ports, such as axim_wr_data, axim_wr_s2m, and axim_wr_m2s, are mapped to the AXI4 Master Write interface.

5. Right-click Task 3.2, Generate RTL Code and IP Core, and select Run to Selected Task to generate the IP core. You can find the register address mapping and other documentation for the IP core in the generated IP Core Report.

6. Right-click Task 4.1, Create project, and select Run This Task to generate the Vivado project. During project creation, the generated DUT IP core is integrated into the External DDR3 Memory Access with Ethernet based MATLAB as AXI Master reference design. This reference design includes a Xilinx Memory Interface Generator IP to communicate with the on-board external DDR3 memory on the KC705 platform. The MATLAB as AXI Master IP is also added to enable MATLAB to control the DUT IP, and to initialize and verify the DDR memory content.

You can view the generated Vivado project by clicking the project link in the result window, and then inspect the design.

The Ethernet based MATLAB as AXI Master IP has a default target IP address of 192.168.0.2 and a default UDP port value of 50101. You can change these values by double-clicking the ethernet_mac_hub IP in the Vivado block design.

7. Right-click Task 4.3, Program Target Device, and select Run to Selected Task to generate the bitstream and program the device.

Run FPGA Implementation on Kintex-7 Hardware

You can now run the FPGA implementation and verify the hardware result by running the following script in MATLAB:

  hdlcoder_external_memory_axi_master_hw_run

This script first initializes the Matrix_Size to 500, which means a 500x500 matrix. You can adjust the Matrix_Size up to 4000.

The AXI4 Master read and write channel base addresses are then configured. These addresses define where the DUT reads from and writes to in the external DDR memory. In this script, the DUT reads from base address '40000000' and writes to base address '50000000'.
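The following sketch shows how the two base addresses relate to the input data layout for the default size of 500. It assumes byte addressing and 4-byte elements, neither of which is stated on this page, and the variable names are hypothetical. It is not part of the hdlcoder_external_memory_axi_master_hw_run script.

```matlab
% Address layout sketch: vector first, matrix immediately after it.
rd_base = hex2dec('40000000');           % read base address from the script
wr_base = hex2dec('50000000');           % write base address from the script
N = 500;                                 % Matrix_Size used by the script
bytesPerElement = 4;                     % ASSUMPTION: 32-bit words, byte addressing

vector_addr = rd_base;                                   % N vector values
matrix_addr = rd_base + N * bytesPerElement;             % N*N matrix values
input_end   = rd_base + (N + N*N) * bytesPerElement;     % first byte past the input

fprintf('Vector starts at 0x%s\n', dec2hex(vector_addr, 8));
fprintf('Matrix starts at 0x%s\n', dec2hex(matrix_addr, 8));
fprintf('Input ends before 0x%s\n', dec2hex(input_end, 8));

% The input region must not run into the output region.
assert(input_end <= wr_base, 'Input region overlaps the output region');
```

The gap between the two base addresses is 256 MiB, so under these assumptions the input data fits below the write base address with room to spare.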

The MATLAB as AXI Master feature is used to initialize the external DDR3 memory with the input vector and matrix data, and to clear the output DDR memory location.

The DUT calculation is started by controlling the AXI4-accessible registers. The DUT IP core first reads the input data from the DDR memory, performs the matrix vector multiplication, and then writes the result back to the DDR memory.

Finally, the output result is read back to MATLAB, and compared with the expected value. In this way, the hardware results are verified in MATLAB.
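The comparison step can be sketched as follows. The hw_result variable is a stand-in: on hardware it would hold the data read back from the DDR memory, which this sketch does not attempt to do. Integer test data is used here so that an exact comparison is meaningful.

```matlab
% Verification sketch: compare the read-back result with the expected product.
N = 4;
A = magic(N);                            % arbitrary integer test matrix
B = (1:N).';                             % arbitrary integer test vector
expected = A * B;                        % reference result computed in MATLAB

hw_result = expected;                    % STAND-IN for data read back from DDR

mismatch = find(hw_result ~= expected);  % indices of differing outputs
if isempty(mismatch)
    disp('Matrix vector multiplication result matches the expected value.');
else
    fprintf('%d of %d outputs differ; first mismatch at index %d\n', ...
        numel(mismatch), N, mismatch(1));
end
```

Reporting the index of the first mismatch, rather than only pass or fail, makes it easier to tell whether an error affects a single row or the whole output.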