FFT
Compute fast Fourier transform (FFT)

Libraries: DSP HDL Toolbox / Transforms
Description
The FFT block provides two architectures that implement the algorithm for FPGA and ASIC applications. You can select an architecture that optimizes for either throughput or area.
Streaming Radix 2^2: Use this architecture for high-throughput applications. This architecture supports scalar or vector input data. You can achieve gigasamples-per-second (GSPS) throughput, also called super sample rates, using vector input. Since R2025a, this architecture also supports specifying the FFT size by using an input port when you use scalar input data.
Burst Radix 2: Use this architecture for a minimum resource implementation, especially with large FFT sizes. Your system must be able to tolerate bursty data and higher latency. This architecture supports only scalar input data.
The block accepts real or complex data, provides hardware-friendly control signals, and optional output frame control signals.
Note
You can also generate HDL code for this hardware-optimized algorithm, without creating a Simulink® model, by using the DSP HDL IP Designer app. The app provides the same interface and configuration options as the Simulink block.
Examples
Implement FFT Algorithm for FPGA
Implement two hardware-optimized FFT architectures in Simulink.
Frequency-Domain Filtering in HDL
Implement a filter in the frequency domain. The filter is built with the FFT and IFFT blocks from DSP HDL Toolbox™.
Automatic Delay Matching for the Latency of FFT Block
Programmatically obtain the latency of an FFT block in a model for use in delay matching.
Ports
Input
Input data, specified as a scalar or column vector of real or complex values.
Vector input is supported with the Streaming Radix 2^2 architecture, and is not supported with variable FFT size. The vector size must be a power of 2, in the range from 1 to 64, and less than or equal to the FFT length.
The software supports double and single data types for simulation, but not for HDL code generation.
Data Types: fixed point | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | single | double
Complex Number Support: Yes
Control signal that indicates if the input data is valid. When valid is 1 (true), the block captures the values from the input data port. When valid is 0 (false), the block ignores the values from the input data port.
When you set the Architecture parameter to Burst Radix 2, or set the FFT length source parameter to Input port, you must apply input data only when the ready backpressure signal is 1 (true). The block ignores the input data and valid signals when ready is 0 (false).
Data Types: Boolean
Since R2025a
Provide the FFT size as log2(FFTLength). For example, if the FFT length is 32, specify 5 at the port. For HDL code generation, the FFT length must be a power of 2 between 2^2 and 2^16, so the port value must be in the range 2 to 16. If the value is less than 2, the block uses an FFT length of 4. If the value is greater than 16, the block uses an FFT length of 2^16.
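The mapping between FFT length and port value can be sketched in a few lines of Python. This is a minimal illustration of the range rules described above, not the block's implementation; the function names `log2_port_value` and `effective_fft_length` are hypothetical, and the sketch ignores any further limit imposed by the Maximum FFT length parameter.

```python
MIN_LOG2 = 2   # smallest supported FFT length is 2^2 = 4
MAX_LOG2 = 16  # largest supported FFT length is 2^16 = 65536


def log2_port_value(fft_length):
    """Return the value to drive on log2FFTLen for a power-of-2 FFT length."""
    if fft_length < 1 or fft_length & (fft_length - 1):
        raise ValueError("FFT length must be a power of 2")
    return fft_length.bit_length() - 1


def effective_fft_length(port_value):
    """Model the documented handling of out-of-range port values."""
    clamped = min(max(port_value, MIN_LOG2), MAX_LOG2)
    return 1 << clamped


print(log2_port_value(32))       # 5
print(effective_fft_length(1))   # 4
print(effective_fft_length(17))  # 65536
```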
To notify the block of the new FFT length, set loadFFTLen to 1 (true) for at least one cycle with a new value on log2FFTLen.
When you use variable FFT length, your input data must comply with the ready backpressure signal. You can specify a new FFT length while a frame is processing. The block sets ready to 0 (false) while it finishes the current frame, then sets ready to 1 (true) to indicate it is ready for the next frame. The block uses the new FFT length for the next input frame. If you set loadFFTLen to 1 (true) when no frame is processing, the block sets ready to 0 (false) for four cycles while it updates internal logic.
Dependencies
To enable this port, set the FFT length source parameter to Input port.
Data Types: fixdt(0,M,0) where 1<M<6
Since R2025a
Control signal that indicates if the input FFT size is valid. When loadFFTLen is 1 (true), the block captures the value from the input log2FFTLen port. When loadFFTLen is 0 (false), the block ignores the value from the input log2FFTLen port.
Dependencies
To enable this port, set the FFT length source parameter to Input port.
Data Types: Boolean
Control signal that clears internal states. When reset is 1 (true), the block stops the current calculation and clears internal states. When reset is 0 (false) and the input valid is 1 (true), the block captures data for processing.
For more reset considerations, see the Reset Signal section on the Hardware Control Signals page.
Dependencies
To enable this port, on the Control Ports tab, select the Enable reset input port parameter.
Data Types: Boolean
Output
Output data, returned as a scalar or column vector of real or complex values. When input is fixed-point data type and scaling is enabled, the output data type is the same as the input data type. When the input is integer type and scaling is enabled, the output is fixed-point type with the same word length as the input integer. The output order is bit-reversed by default. If scaling is disabled, the output word length increases to avoid overflow. For more information, see the Divide butterfly outputs by two parameter.
Data Types: fixed point | double | single
Complex Number Support: Yes
Control signal that indicates if the output data is valid. When valid is 1 (true), the block returns valid data from the output data port. When valid is 0 (false), the values from the output data port are not valid.
Data Types: Boolean
Control signal that indicates that the block is ready for a new input data sample on the next cycle. When ready is 1 (true), you can specify the data and valid inputs for the next time step. When ready is 0 (false), the block ignores any input data in the next time step.
For a waveform that shows this protocol, see the timing diagrams in the Control Signals section.
Dependencies
To enable this port, set the Architecture parameter to Burst Radix 2, or with Architecture set to Streaming Radix 2^2, set the FFT length source parameter to Input port.
Data Types: Boolean
Control signal that indicates the first valid cycle of the output frame. When start is 1 (true), the block returns the first valid sample of the frame on the output data port.
Dependencies
To enable this port, on the Control Ports tab, select the Enable start output port parameter.
Data Types: Boolean
Control signal that indicates the last valid cycle of the output frame. When end is 1 (true), the block returns the last valid sample of the frame on the output data port.
Dependencies
To enable this port, on the Control Ports tab, select the Enable end output port parameter.
Data Types: Boolean
Parameters
Main
Specify the hardware implementation for the FFT.
Streaming Radix 2^2: Low-latency architecture. This architecture supports GSPS throughput when you use vector input. Since R2025a, this architecture supports variable FFT length. When you use variable FFT length, your input data must comply with the ready backpressure signal.
Burst Radix 2: Minimum resource architecture. This architecture does not support vector input. When you use this architecture, your input data must comply with the ready backpressure signal. This architecture supports only fixed FFT lengths.
For more details about these architectures, see Algorithms.
Since R2025a
You can enter a constant FFT length as a parameter or provide time-varying FFT length by using an input port. When you select Input port, the log2FFTLen, loadFFTLen, and ready ports appear on the block.
Dependencies
This parameter is only available when you set Architecture to Streaming Radix 2^2.
When you select Input port, the Output in bit-reversed order and Input in bit-reversed order parameters are disabled. The block uses natural input order and bit-reversed output order.
Specify the number of data points used for one FFT calculation. For HDL code generation, the FFT length must be a power of 2 between 2^2 and 2^16.
Dependencies
This parameter is available when you set Architecture to Burst Radix 2 or you set the FFT length source parameter to Property.
Since R2025a
Specify the maximum number of data points used for one FFT calculation. This parameter can help limit the hardware resources needed for a variable-size FFT design. The Maximum FFT length must be a power of 2 from 2^3 to 2^16.
If you provide an input frame after reset without setting the FFT length from the port, the block uses the Maximum FFT length parameter value.
Dependencies
This parameter is available when you set the FFT length source parameter to Input port.
Specify the hardware implementation for complex multipliers. Each multiplication is implemented with either the Use 4 multipliers and 2 adders option or the Use 3 multipliers and 5 adders option. Depending on your synthesis tool and target device, one option may be faster or smaller.
When you select the Output in bit-reversed order check box, the output elements are bit-reversed relative to the input order. To return output elements in linear order, clear this check box.
The FFT algorithm calculates output in the reverse order to the input. If you specify the output to be in the same order as the input, the algorithm performs an extra reversal operation. For more information, see Linear and Bit-Reversed Output Order.
Dependencies
When you set the FFT length source parameter to Input port, the Output in bit-reversed order and Input in bit-reversed order parameters are disabled. The block uses natural input order and bit-reversed output order.
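As a rough illustration of what bit-reversed order means for a scalar stream, the following Python sketch lists the frequency-bin index that appears at each output position when the index bits are reversed. The helper names are hypothetical, and the sketch describes the general bit-reversal permutation rather than the block's internal memory scheme.

```python
def bit_reverse(index, num_bits):
    """Reverse the lowest num_bits bits of index."""
    result = 0
    for _ in range(num_bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


def bit_reversed_order(fft_length):
    """Return the bin index found at each position of a bit-reversed frame."""
    num_bits = fft_length.bit_length() - 1  # log2 of a power-of-2 length
    return [bit_reverse(i, num_bits) for i in range(fft_length)]


print(bit_reversed_order(8))  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Applying the same permutation twice restores linear order, which is why reordering the output costs one extra reversal step.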
When you select the Input in bit-reversed order check box, the block expects input data in bit-reversed order. By default, the check box is cleared and the input is expected in linear order.
The FFT algorithm calculates output in the reverse order to the input. If you specify the output to be in the same order as the input, the algorithm performs an extra reversal operation. For more information, see Linear and Bit-Reversed Output Order.
Dependencies
When you set the FFT length source parameter to Input port, the Output in bit-reversed order and Input in bit-reversed order parameters are disabled. The block uses natural input order and bit-reversed output order.
When you select the Divide butterfly outputs by two parameter, the FFT implements an overall 1/FFTLength scale factor by dividing the output of each butterfly multiplication by two. This adjustment keeps the output of the FFT in the same amplitude range as its input. If you disable scaling, the FFT avoids overflow by increasing the word length by 1 bit after each butterfly multiplication. The bit growth is the same for both architectures.
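The effect of this parameter on the output word length can be estimated with a short Python sketch. It assumes one bit of growth per butterfly stage and log2(FFTLength) butterfly stages in total, which is an inference from the description above rather than a statement of the exact generated data types; the function name is hypothetical.

```python
def output_word_length(input_word_length, fft_length, scaling_enabled):
    """Estimate the output word length in bits.

    Assumes log2(fft_length) butterfly stages, each adding one bit
    when scaling is disabled.
    """
    if scaling_enabled:
        return input_word_length
    num_butterfly_stages = fft_length.bit_length() - 1
    return input_word_length + num_butterfly_stages


print(output_word_length(16, 1024, True))   # 16
print(output_word_length(16, 1024, False))  # 26
```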
Data Types
Rounding mode used for internal fixed-point calculations. When the input is any integer or fixed-point data type, this block uses fixed-point arithmetic for internal calculations. This option does not apply when the input data is single or double type. Rounding applies to twiddle-factor multiplication and scaling operations. For more information about rounding modes, see Rounding Modes.
Control Ports
Enable the input reset port on the block.
Enable the output start port on the block. This output signal indicates the first sample of a frame of output data.
Enable the output end port on the block. This output signal indicates the last sample of a frame of output data.
Algorithms
The streaming Radix 2^2 architecture implements a low-latency architecture. It saves resources compared to a streaming Radix 2 implementation by factoring and grouping the FFT equation. The architecture has log4(N) stages. Each stage contains two single-path delay feedback (SDF) butterflies with memory controllers. When you use vector input, each stage operates on fewer input samples, so some stages reduce to a simple butterfly, without SDF.
The first SDF stage is a regular butterfly. The second stage multiplies the outputs of the first stage by –j. To avoid a hardware multiplier, the block swaps the real and imaginary parts of the inputs, and again swaps the imaginary parts of the resulting outputs. Each stage rounds the result of the twiddle factor multiplication to the input word length. The twiddle factors have the same bit width as the input data, WL, with two integer bits and WL-2 fractional bits.
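The reason multiplication by –j needs no multiplier can be checked numerically: (a + jb)(–j) = b – ja, so the result is the input with its real and imaginary parts exchanged and one sign flipped. The Python sketch below demonstrates only this identity; the function name is hypothetical and the code does not model how the block arranges the swaps around its butterfly.

```python
def multiply_by_minus_j(value):
    """Compute value * (-j) using only a swap and a negation."""
    return complex(value.imag, -value.real)


sample = complex(3, 4)
print(multiply_by_minus_j(sample))  # (4-3j)
print(sample * -1j)                 # (4-3j)
```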
If you enable scaling, the algorithm divides the result of each butterfly stage by 2. Scaling at each stage avoids overflow, keeps the word length the same as the input, and results in an overall scale factor of 1/N. If scaling is disabled, the algorithm avoids overflow by increasing the word length by 1 bit at each stage. The diagram shows the butterflies and internal word lengths of each stage, not including the memory.
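To see how dividing by 2 at every butterfly stage produces an overall 1/N scale factor, consider the following floating-point Python sketch. It uses a plain radix-2 decimation-in-frequency FFT, which is simpler than the Radix 2^2 SDF structure the block implements and is swapped in here purely for illustration. It does not model fixed-point rounding, and the function name is hypothetical.

```python
import cmath


def dif_fft_scaled(samples):
    """Radix-2 decimation-in-frequency FFT that halves every butterfly output.

    Returns the spectrum in bit-reversed order, scaled by 1/len(samples).
    """
    data = [complex(s) for s in samples]
    n = len(data)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of 2 and at least 2")
    span = n // 2
    while span >= 1:
        for start in range(0, n, 2 * span):
            for k in range(span):
                twiddle = cmath.exp(-2j * cmath.pi * k / (2 * span))
                upper = data[start + k]
                lower = data[start + k + span]
                # Divide each butterfly output by 2, as with scaling enabled.
                data[start + k] = (upper + lower) / 2
                data[start + k + span] = (upper - lower) * twiddle / 2
        span //= 2
    return data


# A unit impulse has a flat spectrum, so every bin equals 1/N.
print(dif_fft_scaled([1, 0, 0, 0]))
```

Because each of the log2(N) stages halves the data, the final values never exceed the input range, which is why the word length can stay fixed when scaling is enabled.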
The burst Radix 2 architecture implements the FFT by using a single complex butterfly multiplier. The algorithm cannot start until it has stored the entire input frame, and it cannot accept the next frame until computations are complete. The output ready port indicates when the algorithm is ready for new data. The diagram shows the burst architecture, with pipeline registers.
When you use this architecture, your input data must comply with the ready backpressure signal.
The algorithm processes input data only when the input valid port is 1. Output data is valid only when the output valid port is 1.
When the optional input reset port is 1, the algorithm stops the current calculation and clears all internal states. The algorithm begins new calculations when the reset port is 0 and the input valid port starts a new frame.
This diagram shows the input and output valid port values for contiguous scalar input data, streaming Radix 2^2 architecture, an FFT length of 1024, and a vector size of 16.
The diagram also shows the optional start and end port values that indicate frame boundaries. If you enable the start port, the start port value pulses for one cycle with the first valid output of the frame. If you enable the end port, the end port value pulses for one cycle with the last valid output of the frame.
If you apply continuous input frames, the output will also be continuous after the initial latency.
The input valid port can be noncontiguous. Data that arrives while the input valid port is 1 is processed as it arrives, and the resulting data is stored until a frame is filled. Then the algorithm returns contiguous output samples in a frame of N (FFT length) cycles. This diagram shows noncontiguous input and contiguous output for an FFT length of 512 and a vector size of 16.
When you use the burst architecture, you cannot provide the next frame of input data until memory space is available. The ready signal indicates when the algorithm can accept new input data. You must apply input data and valid signals only when ready is 1 (true). The algorithm ignores any input data and valid signals when ready is 0 (false).
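The upstream logic that honors this backpressure can be sketched as a simple Python model. This is only a behavioral illustration under stated assumptions: `ready_trace` is a hypothetical per-cycle list in which each entry says whether the block accepts input in that cycle, and the exact one-cycle relationship between ready and the accepted sample in hardware is the one shown in the timing diagrams, not the one modeled here.

```python
from collections import deque


def drive_with_backpressure(samples, ready_trace):
    """Return per-cycle (data, valid) pairs that apply data only when ready."""
    pending = deque(samples)
    driven = []
    for ready in ready_trace:
        if ready and pending:
            # The block can accept data: present the next sample with valid high.
            driven.append((pending.popleft(), True))
        else:
            # Hold off: valid stays low, so the block sees no new data.
            driven.append((None, False))
    return driven


print(drive_with_backpressure([10, 20, 30], [1, 1, 0, 0, 1, 1]))
```

Samples that cannot be sent stay queued, so the source must be able to buffer or stall while ready is 0.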
When you specify the FFT length from an input port, you must only send data when the ready signal indicates the algorithm can accept it. The block sets ready to 0 (false) if there is no frame processing when you change the FFT length, or while it is finishing a frame after a new FFT length has been loaded. The first waveform shows loading the input size before any frame is processed. The ready signal goes to 0 (false) while the block updates internal logic with the new size, and then ready is set to 1 (true) again. The input data is applied and valid set to 1 (true) after the ready signal is 1 (true) again.
The next waveform shows loading a new FFT size while a frame is processing. The block sets the ready signal to 0 (false) until it has completed returning the current frame, then sets ready to 1 (true). You can apply the next frame once the ready signal is 1 (true).
The latency varies with the FFT length and input vector size. After you update the model, the block icon displays the latency. The displayed latency is the number of cycles between the first valid input and the first valid output, assuming the input is contiguous. To obtain this latency programmatically, see Automatic Delay Matching for the Latency of FFT Block.
When using the burst architecture with a contiguous input, if your design waits for ready to output 0 before de-asserting the input valid, then one extra cycle of data arrives at the input. This data sample is the first sample of the next frame. The algorithm can save one sample while processing the current frame. Due to this one-sample advance, the observed latency of the later frames (from input valid to output valid) is one cycle shorter than the reported latency. The latency is measured from the first cycle when input valid is 1 to the first cycle when output valid is 1. The number of cycles between when the ready port is 0 and the output valid port is 1 is always latency – FFTLength.
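As a small worked example of this relationship, the Python sketch below applies it to the burst configuration reported in the performance section of this page (latency of 5811 cycles for an FFT length of 1024). The function name is hypothetical.

```python
def cycles_from_ready_low_to_output_valid(latency, fft_length):
    """Cycles between ready going to 0 and output valid going to 1."""
    return latency - fft_length


print(cycles_from_ready_low_to_output_valid(5811, 1024))  # 4787
```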
This resource and performance data is the synthesis result from the generated HDL targeted to a Xilinx® Virtex®-6 (XC6VLX75T-1FF484) FPGA. The examples in the tables have this configuration:
1024 FFT length (default)
Complex multiplication using 4 multipliers, 2 adders
Output scaling enabled
Natural order input, Bit-reversed output
16-bit complex input data
Clock enables minimized (HDL Coder™ parameter)
Performance of the synthesized HDL code varies with your target and synthesis options. For instance, reordering for a natural-order output uses more RAM than the default bit-reversed output, and real input uses less RAM than complex input.
For a scalar input Radix 2^2 configuration, the design achieves 326 MHz clock frequency. The latency is 1116 cycles. The design uses these resources.
Resource | Number Used |
---|---|
LUT | 4597 |
FFS | 5353 |
Xilinx LogiCORE® DSP48 | 12 |
Block RAM (16K) | 6 |
When you vectorize the same Radix 2^2 implementation to process two 16-bit input samples in parallel, the design achieves 316 MHz clock frequency. The latency is 600 cycles. The design uses these resources.
Resource | Number Used |
---|---|
LUT | 7653 |
FFS | 9322 |
Xilinx LogiCORE DSP48 | 24 |
Block RAM (16K) | 8 |
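The clock frequencies and latencies above can be converted to sample throughput and wall-clock latency with a short Python sketch. It assumes contiguous valid input, so that the streaming architecture consumes one vector of samples on every clock cycle; the function names are hypothetical.

```python
def throughput_msps(clock_mhz, vector_size):
    """Samples per microsecond (MSPS), assuming one vector per clock cycle."""
    return clock_mhz * vector_size


def latency_us(latency_cycles, clock_mhz):
    """Latency in microseconds for a given cycle count and clock rate."""
    return latency_cycles / clock_mhz


print(throughput_msps(326, 1))  # 326 MSPS for the scalar configuration
print(throughput_msps(316, 2))  # 632 MSPS for the two-sample vector configuration
print(round(latency_us(1116, 326), 2))  # 3.42
print(round(latency_us(600, 316), 2))   # 1.9
```

Under these assumptions, doubling the vector size nearly doubles throughput and roughly halves latency, at the cost of the additional resources shown in the table.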
When using the burst Radix 2 architecture, the block supports only scalar input data. The burst design achieves 309 MHz clock frequency. The latency is 5811 cycles. The design uses these resources.
Resource | Number Used |
---|---|
LUT | 971 |
FFS | 1254 |
Xilinx LogiCORE DSP48 | 3 |
Block RAM (16K) | 6 |
When you specify FFT length from the input port, the block implements an internal FFT using the Maximum FFT length value. There is also a small amount of logic to store and load the FFT length from the port, and one extra multiplier stage. When you specify FFT length from a port, the block supports only scalar input data. The design achieves 375 MHz clock frequency. The latency changes with the FFT length. With the Maximum FFT length parameter set to 1024, the design uses these resources.
Resource | Number Used |
---|---|
LUT | 7991 |
FFS | 7902 |
Xilinx LogiCORE DSP48 | 20 |
Block RAM (16K) | 4 |
References
[1] Algnabi, Y.S, F.A. Aldaamee, R. Teymourzadeh, M. Othman, and M.S. Islam. “Novel architecture of pipeline Radix 2^2 SDF FFT Based on digit-slicing technique.” 10th IEEE International Conference on Semiconductor Electronics (ICSE). 2012, pp. 470–474.
Extended Capabilities
This block supports C/C++ code generation for Simulink accelerator and rapid accelerator modes and for DPI component generation.
HDL Coder provides additional configuration options that affect HDL implementation and synthesized logic.
This block has one default HDL architecture.
ConstrainedOutputPipeline | Number of registers to place at the outputs by moving existing delays within your design. Distributed pipelining does not redistribute these registers. The default is |
InputPipeline | Number of input pipeline stages to insert in the generated code. Distributed pipelining and constrained output pipelining can move these registers. The default is |
OutputPipeline | Number of output pipeline stages to insert in the generated code. Distributed pipelining and constrained output pipelining can move these registers. The default is |
You cannot generate HDL code for this block inside an Enabled Subsystem (Simulink).
Version History
Introduced in R2014a
You can now specify time-varying FFT length by using an input port. This feature is supported only when you set Architecture to Streaming Radix 2^2 and you use scalar input data.
Set the FFT length source parameter to Input port, and then specify the FFT length at the log2FFTLen port as log2(FFTLength). For example, if the FFT length is 32, specify 5 at the port. Set the loadFFTLen port to 1 (true) for at least one cycle to capture the value from the input log2FFTLen port.
When you use variable FFT length, you must send input data only when the ready backpressure signal is 1 (true). The block sets the ready signal to 0 (false) if a new FFT length is supplied while processing a frame of data. The block sets the ready signal to 1 (true) when it has completed a frame of output data and has updated the FFT length for the next input frame.
To limit the hardware resources used in the variable-sized FFT implementation, set the Maximum FFT length parameter to your expected maximum input size.
Before R2022a, this block was named FFT HDL Optimized and was included in the DSP System Toolbox™ DSP System Toolbox HDL Support library.
You can now set the FFT length to 4 (2^2). In previous releases, the FFT length had to be a power of 2 from 8 (2^3) to 2^16.