FFT slowdown even after workspace reset

Question

1 vote

I'm experiencing behavior with the fft() function that forces me to restart Matlab between executions of a long script that is both processing and memory intensive and requires, among other things, millions of FFTs on the CPU and GPU. If I run bench() prior to running the script, my computer (i9-13950HX with 64 GB of RAM, running Windows 11, Matlab R2024a) clocks in very fast. After I run my script, all performance metrics are basically identical except for fft(), which clocks in >10x slower than before.

No matter what I do to the workspace (clear all, clear classes, clear functions, close all hidden force, clc, reset(gpuDevice), etc.) or to the fft planner, I cannot bring the performance of fft() back to what it was before execution of the script.

Am I overlooking anything that could reset the performance of the fft short of restarting Matlab itself? I would like to let the computer loop over a bunch of datasets but right now the slowdown in the fft is making this very inefficient. I am currently considering calling the Matlab engine from Python so that I can restart it between script calls to prevent this. I am running Matlab on 2024a and may be able to update to 2024b but cannot upgrade past 2024b.
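The restart-per-dataset workaround mentioned above can also be driven from MATLAB itself instead of Python. The following is only a sketch under assumptions: processDataset and the file names are hypothetical stand-ins for the long script and its inputs, and it relies on the `matlab -batch` startup option being available on the system path. Each launch pays the MATLAB startup cost, so this only helps when a dataset takes much longer than a startup.

```matlab
% Sketch: build one "fresh MATLAB process" command per dataset.
% processDataset and the .mat names below are hypothetical placeholders.
datasets = ["setA.mat", "setB.mat"];
cmds = strings(size(datasets));
for k = 1:numel(datasets)
    % -batch runs the statement non-interactively and exits when it finishes
    cmds(k) = sprintf('matlab -batch "processDataset(''%s'')"', datasets(k));
end
disp(cmds.');

% To actually launch the processes one after another, uncomment:
% for k = 1:numel(cmds)
%     status = system(cmds(k));   % nonzero status signals a failed run
% end
```

Because every dataset runs in a new process, whatever state degrades fft() in a long-lived session is discarded when that process exits.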

31 comments

Timothy on 2 June 2026

@dpb, thanks I might give that a shot.

@Paul, I found the culprit function, which calls another function which splits a bunch of large complex double precision N x M arrays into three dimensional 32 x M x (N/32) arrays, computes the FFT in the column dimension, multiplies with the conjugate of another 32 x M x (N/32) array, inverse Fourier transforms & normalizes to create a bunch of cross-correlations. However, it seems that if I isolate this sub function and run it a bunch of times by itself it doesn't affect the performance of the fft() function. Still, if I delete the sub function from the larger function I also have no reduction in performance, so I know that it is tied to it somehow. If I can get more specific or create a simple toy function that creates the fft performance loss I'm seeing I will post it here.

@Walter Roberson: Maybe this is happening, I don't know how to monitor what cores are used. Matlab is the main process on my computer however, and this slowdown can be created in under 10 minutes of operations by iterating the function described above in a loop. If I call the bench in each iteration of the loop and store the FFT time I can watch it slow down each iteration starting with the 4th (I get about 25 iterations in 10 minutes, by which time the FFT score has gone from ~0.15 seconds to ~1 second, and keeps slowing down the more times I call the function).

Timothy on 4 June 2026

Edited: Timothy on 4 June 2026

@Steven Lord CPU, at least, the GPU hasn't been touched yet when I can generate the problem. Here is a script that reproduces part of the problem. The crazy thing is, I was wrong about the FFT calls being a part of the problem. I can delete all of those cross-correlations and still get a slowdown for fft. The example script below is an example:

out     = F;
 
function [out] = F()
    for n = 1:10
        NN  = 500;
        MM  = 500;
        C   = cell(MM, NN);
        for nn = 1:NN
            for mm = 1:MM
                C{mm, nn} = randn(21, 21);
            end
        end
        out{n} = C;
        disp(n);
    end
end

If I run this mini-script and then call bench() or just tst = randn(1, 2^25); tic; fft(tst); toc (note that I actually execute tic; fft(tst); toc multiple times to get an average and let the planner optimize), I get a slowdown of about 2x. On one machine, the fft time goes from ~0.15 seconds to ~0.3 seconds. If I clear the workspace in this case, the fft time goes back to normal, i.e. ~0.15 seconds. However, if I re-run the mini-script above and then re-run tst = randn(1, 2^25); tic; fft(tst); toc without clearing the workspace, execution of the fft now takes ~0.65 seconds instead of ~0.3 seconds. If I clear the workspace, I'm back to ~0.15 seconds. If I run it a third time, execution of the fft takes ~0.78 seconds (for the last five executions, as I'm writing this, toc registered 0.775059, 0.775901, 0.779967, 0.772605). So something odd seems to be happening with the fft time. I tested this on R2024a and R2024b on different computers, with slightly different results: the R2024b computer has a slowdown of ~0.32, ~0.48, ~0.52, ~0.63 as I clear the workspace and execute the mini-script above between speed tests.

The behavior I am having trouble reproducing from my other script, which is doing a lot more, is the persistence of the slowdown. In my other script, the slowdown of the fft persists even after clearing the workspace. I will reach out to tech support.

Timothy on 5 June 2026

The slowdown can be observed more easily using the following code:

for n = 1:5
    clear out
    tst     = randn(1, 2^25); 
    FF      = @()fft(tst);
    T1      = timeit(FF);
    out     = F;
    tst     = randn(1, 2^25); 
    FF      = @()fft(tst);
    T2      = timeit(FF);
    disp(['Cleared workspace time: ', num2str(T1)]);
    disp(['Uncleared workspace time: ', num2str(T2)]);
    drawnow;
end
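function [out] = F()
    % Build N cell arrays, each holding MM x NN small random matrices.
    N   = 15;
    out = cell(1, N);           % preallocate to match the loop count
    for n = 1:N
        NN  = 500;
        MM  = 500;
        C   = cell(MM, NN);
        for nn = 1:NN
            for mm = 1:MM
                C{mm, nn} = randn(21, 21);   % one small heap allocation per cell
            end
        end
        out{n} = C;
    end
end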

My output was:

Cleared workspace time: 0.17438
Uncleared workspace time: 0.40605
Cleared workspace time: 0.17214
Uncleared workspace time: 0.9722
Cleared workspace time: 0.17484
Uncleared workspace time: 1.8431
Cleared workspace time: 0.17464
Uncleared workspace time: 1.8422
Cleared workspace time: 0.17555
Uncleared workspace time: 3.3422

on the machine I'm currently at. Note this doesn't reproduce the persistence (despite clearing the workspace) that I'm observing elsewhere, but I don't know if that persistence is necessary to cause the performance drop I'm seeing in my original code.

dpb on 5 June 2026

Edited: dpb on 5 June 2026

Yeah, I'm sure it was disk thrashing -- although I had kinda' forgotten when the other machine died and I resurrected this one that it has only 16GB and I didn't have any compatible sticks around to use so just left it having retired from the real consulting gig so big stuff doesn't come around much any more.

Anyways, if I then rerun with M=11, the results are significantly different...

>> h0=fnhrs(now); tim; h1=fnhrs(now); fprintf('\nElapsed time=%0.1f min\n',(h1-h0)*60)
Cleared workspace time: 0.58582
Uncleared workspace time: 0.59993
Cleared workspace time: 0.57683
Uncleared workspace time: 0.58008
Cleared workspace time: 0.59105
Uncleared workspace time: 0.568
Cleared workspace time: 0.58876
Uncleared workspace time: 0.56444
Cleared workspace time: 0.58453
Uncleared workspace time: 0.57072
Elapsed time=5.6 min
>> 

I wouldn't be surprised about being OS memory management related; which was why I wondered earlier if it could be shown to be related to memory footprint and whether there was any discernible degradation with size or whether just "over the cliff" at some point.
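One way to probe the footprint question raised above is to sweep the block size and time the fft while the cells are still held. This is a rough sketch, not a verified benchmark: the grid and fft length are scaled down (hypothetical values) so it finishes quickly, whereas the thread used MM = NN = 500, N = 15 and an fft length of 2^25. The variable names held and results are introduced here for illustration only.

```matlab
% Sketch: does the fft time depend on how much fragmented cell data is alive?
blockSizes = [5 11 21];          % edge length M of each randn(M) block
MM = 50; NN = 50; N = 3;         % reduced grid (the thread used 500, 500, 15)
nfft = 2^18;                     % reduced fft length (the thread used 2^25)

results = zeros(numel(blockSizes), 3);   % columns: M, payload in GB, seconds
for k = 1:numel(blockSizes)
    M = blockSizes(k);
    held = cell(1, N);
    for n = 1:N
        C = cell(MM, NN);
        for idx = 1:numel(C)
            C{idx} = randn(M);           % one small allocation per cell
        end
        held{n} = C;
    end
    tst = randn(1, nfft);
    results(k, 1) = M;
    results(k, 2) = N * MM * NN * M^2 * 8 / 1e9;   % payload only, no cell overhead
    results(k, 3) = timeit(@() fft(tst));          % timed while held is still alive
    clear held C
end
disp(array2table(results, 'VariableNames', {'M', 'PayloadGB', 'FftSeconds'}));
```

With the full-size parameters restored, a gradual rise in the third column would point to a footprint-proportional effect, while a sudden jump at one block size would match the "over the cliff" picture.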

ADDENDUM

I made a slight modification to the F() function to accommodate...

function [out] = F()
  M=11;
  N=15;
    out= cell(1, N);
    for n = 1:N
        NN  = 500;
        MM  = 500;
        C   = cell(MM, NN);
        for nn = 1:NN
            for mm = 1:MM
                C{mm, nn} = randn(M);
            end
        end
        out{n} = C;
    end
end

ADDENDUM SECOND

I hadn't looked that closely, but another slight rearrangement as

function [out] = F()
  M=11;
  N=15;
  NN  = 500;
  MM  = 500;
  C=cell(MM, NN);
    out= cell(1, N);
    for n = 1:N
        for nn = 1:NN
            for mm = 1:MM
                C{mm, nn} = randn(M);
            end
        end
        out{n} = C;
    end
end

of moving the constants and preallocation out of the loop resulted in essentially the same fft timings but a significantly shorter wall-clock run time...

>> h0=fnhrs(now); tim; h1=fnhrs(now); fprintf('\nElapsed time=%0.1f min\n',(h1-h0)*60)
Cleared workspace time: 0.58512
Uncleared workspace time: 0.56866
Cleared workspace time: 0.58118
Uncleared workspace time: 0.55005
Cleared workspace time: 0.57309
Uncleared workspace time: 0.55476
Cleared workspace time: 0.58512
Uncleared workspace time: 0.56051
Cleared workspace time: 0.57412
Uncleared workspace time: 0.56525
Elapsed time=3.9 min
>> 

dpb on 5 June 2026

One last fling before let it ride -- will be interesting to hear what light Mathworks support can shed on the symptoms you observe. But, why not try it on competing OS and recent release to see what happens...

 fnhrs=@(n)(n-fix(n))*24;
 h0=fnhrs(now);
 for n = 1:5
    clear out
    tst     = randn(1, 2^25); 
    FF      = @()fft(tst);
    T1      = timeit(FF);
    out     = F;
    tst     = randn(1, 2^25); 
    FF      = @()fft(tst);
    T2      = timeit(FF);
    disp(['Cleared workspace time:   ', num2str(T1,'%0.5f')]);
    disp(['Uncleared workspace time: ', num2str(T2,'%0.5f')]);
    %drawnow;
 end
Cleared workspace time:   0.25048
Uncleared workspace time: 0.22922
Cleared workspace time:   0.23914
Uncleared workspace time: 0.24910
Cleared workspace time:   0.27493
Uncleared workspace time: 0.26221
Cleared workspace time:   0.25566
Uncleared workspace time: 0.25958
Cleared workspace time:   0.25795
Uncleared workspace time: 0.26588
 h1=fnhrs(now); fprintf('\nElapsed time=%0.1f min\n',(h1-h0)*60) 
Elapsed time=1.1 min
function [out] = F()
  M=11;
  N=15;
  NN  = 500;
  MM  = 500;
  C=cell(MM, NN);
    out= cell(1, N);
    for n = 1:N
        for nn = 1:NN
            for mm = 1:MM
                C{mm, nn} = randn(M);
            end
        end
        out{n} = C;
    end
end

dpb on 5 June 2026

Edited: dpb on 5 June 2026

I wondered about fragmentation being an issue. I haven't moved from W10 to W11 w/ old hardware and had not heard that about it. That's ugly if one can't manage to achieve a block of contiguous memory if/when needed.

dpb on 6 June 2026

Edited: dpb on 7 June 2026

The comment above about W11 memory management got me curious so I asked an AI 'bot -- some most interesting responses--whether they're really right or not only testing will tell, but here's the conversation.

I started just asking about allocating a large block of contiguous memory to see what it said about that. My inputs are shown in quotation marks...

At the very end when I asked about W11 specifically and W10, MacOS and Linux apparently being unaffected, it added some additional information on W11 that just might be the crux of the matter.

Anyways, here's the whole session --

"Can an application acquire large contiguous memory in Windows 11 for computation speed such as large FFT?"

Yes, a Windows 11 application can acquire large contiguous blocks of virtual memory. However, the operating system's paged memory architecture means your block will likely be scattered across non-contiguous physical RAM. To optimize high-speed tasks like large Fast Fourier Transforms (FFT), this distinction determines the optimal approach. [1, 2, 3, 4, 5]

1. Contiguous Virtual Memory (The App Standard)

For user-mode applications, you don't need physically contiguous memory to get high computation speeds; virtually contiguous memory works just as well. [1]

How to Allocate: You can use standard C++ functions like new, malloc, or _aligned_malloc for memory alignment. [1]
Benefits for FFT: With a contiguous virtual address space, the CPU can calculate addresses via simple arithmetic, preventing cache-misses and maximizing the efficiency of the CPU's memory prefetcher. [1, 2, 3]
Best Practice: The Windows Virtual Memory Manager (VMM) might still swap pages to disk if your memory is under pressure. To keep your data resident, you can use the VirtualLock function to lock the memory in your working set. [1]

2. Contiguous Physical Memory

If you are writing a kernel-mode driver that interfaces directly with hardware (like a GPU or FPGA performing the FFT via Direct Memory Access), you can allocate physically contiguous memory. [1, 2, 3]

How to Allocate: You must use kernel-mode Windows APIs like MmAllocateContiguousMemory or MmAllocateContiguousMemorySpecifyCache.
The Catch: Because the OS constantly allocates and frees physical pages over time, RAM becomes heavily fragmented. Allocating massive amounts of contiguous physical memory becomes difficult and is highly likely to fail on a system that has been running for a while. [1, 2, 3, 4]

3. Hardware Acceleration (GPUs)

For massive multi-dimensional FFTs (often spanning gigabytes), the industry standard is to utilize hardware acceleration rather than stressing the CPU's RAM. [1]

How to Allocate: Use dedicated compute libraries like NVIDIA's cuFFT (via CUDA).
Best Practice: Applications use cudaHostAlloc or similar device APIs to reserve aligned host memory for rapid transfers to the GPU. [1, 2]

4. Forcing Windows 11 "Large Pages"

If you are doing heavy mathematical computations entirely on the CPU, you can request 2MB Large Pages (instead of the standard 4KB pages) from Windows 11. This reduces Translation Lookaside Buffer (TLB) misses, resulting in a noticeable speedup for heavy processing like an FFT. [1]

How to Allocate: Use VirtualAlloc with the MEM_LARGE_PAGES flag (this requires enabling the "Lock Pages in Memory" privilege in Windows Security Policies). [1]

If you are developing a native Windows C++ application, I can help you:

Review implementation details for using Win32 API calls like VirtualAlloc with MEM_LARGE_PAGES.
Walk through configuring FFT libraries like FFTW to take advantage of these memory configurations. [1]

Let me know which specific route you want to take. [1]

20 sites

MmAllocateContiguousMemory function (wdm.h), Microsoft Learn: "The routine maps this block to a contiguous block of virtual memory in the system address space and returns the virtual address of..."
Virtual memory clarification - allocation of large contiguous ..., Stack Overflow: "Your system will get a little bit better performance if the OS is able to allocate the pages in contiguous blocks of 2MB a piece"
Allocating System-Space Memory - Windows drivers, Microsoft Learn: "Drivers can use only the kernel stack to pass small amounts of data to their internal routines. Some drivers have to allocate addi..."

"This application is MATLAB which uses FFTW and is showing decreasing performance with multiple calls"


When MATLAB degrades in performance over multiple calls to fft or fftn, the root cause is rarely the lack of contiguous memory. Instead, it is usually caused by FFTW wisdom accumulation (internal planning caches growing too large), variable array sizes forcing constant replanning, or denormal numbers/NaNs corrupting the floating-point pipeline. [1, 2, 3, 4, 5]

The most common causes of this specific behavior can be isolated and resolved using the following techniques.

1. Clear or Freeze FFTW Wisdom (Most Common Fix)

MATLAB uses an internal system called "Wisdom" to remember the fastest way to calculate an FFT of a specific size. If your script processes many arrays of varying or odd dimensions, FFTW continually allocates internal heap memory to store new execution plans. This can balloon memory usage and degrade lookup performance. [1, 2, 3, 4]

You can fix this by clearing the single-precision and double-precision wisdom caches periodically in your loop: [1]

matlab

% Clear the accumulated planning cache
fftw('swisdom', []);
fftw('dwisdom', []);

Alternatively, switch the planner to estimate. This forces FFTW to use a fixed, heuristic-based mathematical approach rather than spending time dynamically measuring your hardware's memory architecture for an optimal layout. [1, 2]

matlab

fftw('planner', 'estimate');


2. Force Array Sizes to Powers of Two (Zero-Padding)

If your input sizes are odd, prime numbers, or fluctuate constantly, FFTW cannot reuse plans and must rebuild them. [1, 2]

The Math: FFTW is exceptionally fast for sizes that are powers of two (\(2^{n}\)), or products of small primes (\(3, 5, 7\)).
The Fix: Pad your input matrix to the next highest power of two. This makes execution highly predictable and allows MATLAB to hit the same cached FFTW plans over and over. [1, 2]

matlab

% Compute the ideal next power of 2 for size optimization
n_optimal = nextpow2(length(x));
N = 2^n_optimal;
% Pass N directly into the fft function to auto-pad with zeros
X = fft(x, N);

3. Check for Data Contamination (NaNs, Infs, and Denormals)

If your loop progressively introduces invalid values into your dataset, performance will plunge. [1]

The Cause: When NaN (Not a Number), Inf, or denormal numbers (values extremely close to zero) slip into an array, the CPU drops out of its fast hardware processing mode. It handles these edge cases using microcode exceptions, slowing the calculation speed down by a factor of 10 to 100.
The Fix: Scan your input array right before the FFT call to ensure no bad data is leaking into the operation: [1]

matlab

if any(isnan(x), 'all') || any(isinf(x), 'all')
    error('NaN or Inf detected before FFT calculation!');
end

4. Prevent In-Place Memory Allocation Fragmentation

If your loop structure overwrites variables without explicitly preallocating them, MATLAB may fragment your virtual address space over time. Ensure you are not dynamically growing matrices inside the iteration loop.

Bad: data(:, i) = fft(matrix(:, i)); (Where data grows dynamically).
Good: Preallocate the entire matrix ahead of time using zeros() to force MATLAB to request a single, contiguous block of virtual memory from Windows up front.

To narrow this down, let me know:

Are the input array sizes changing on every call, or do they stay exactly the same?
Are you executing this on a CPU or utilizing GPU architectures (via gpuArray)? [1]

9 sites

Speed Up Fast Fourier Transforms in Generated Standalone ..., MathWorks: "Your MATLAB code calls one of these functions: fft , fft2 , fftn , ifft , ifft2 , or ifftn . You generate standalone C/C++ code. Y..."
fftw - Define method for determining FFT algorithm - MATLAB, MathWorks: "Optimal Fourier Transform Workflow. Open Live Script. The fftw function can improve the performance of Fourier transform functions..."
How to Do FFT in MATLAB, YouTube (MATLAB), Apr 28, 2022, 4:42: "when you play data the strings oscillate over time in different repeating patterns that can produce different sounds analyzing the..."

"On CPU and performance still degrades after clearing wisdom. The FFT is using a power of 2 (2^25, precisely)"


Since your input size is fixed to exactly \(2^{25}\) (which is a massive array size occupying either 256MB for single-precision or 512MB for double-precision per array) and performance degrades even after clearing wisdom, the issue is almost certainly a system resource bottleneck or memory pipeline stall rather than an algorithm planning issue. [1, 2, 3]

At \(2^{25}\) elements, your data vastly exceeds the CPU's internal L1/L2/L3 hardware caches. The calculations are completely bounded by main system RAM bandwidth. Three distinct root causes usually trigger this degradation across multiple loop iterations, along with ways to solve them. [1, 2]

1. The Multi-Thread Accumulation Trap (Most Common)

For an array of size \(2^{25}\), MATLAB uses its internal multi-threading engine (Intel MKL/FFTW) to spawn threads across all available CPU physical cores. [1]

The Problem: In loops, if MATLAB finishes an iteration and immediately starts the next one, the OS thread scheduler doesn't have time to put worker threads to sleep properly. Threads "spin" waiting for work, causing severe thread contention, context switching overhead, and CPU cache thrashing. The processor runs out of memory bus bandwidth, stalling subsequent calls. [1]
The Fix: Explicitly restrict MATLAB's multi-threading pool or inject a fractional pause to allow the thread-pool manager to reset its queues:

matlab

% Test if single-threading stabilizes the execution time
maxNumCompThreads(1); 
% If you must use multithreading, add a tiny pause at the end of the loop 
% to let the OS thread-scheduler clear the CPU pipeline
pause(0.005); 

2. Physical Memory "Page Fault" Accumulation

Even if your workspace variable has a permanent, fixed size, MATLAB may be reallocating the underlying memory page tables in Windows 11 if the variable is modified or re-assigned inside the loop.

The Problem: A \(2^{25}\) double-precision array takes 512MB. If you calculate X = fft(x); inside a loop, MATLAB has to constantly request and release large chunks of virtual memory. This forces the Windows 11 kernel to zero-out old memory pages and remap virtual-to-physical space, causing major page faults that degrade performance over time.
The Fix: Force MATLAB to reuse the exact same physical memory buffer using in-place assignment. Do not change the variable name, and overwrite the memory space directly:

matlab

% Preallocate exactly once outside the loop
X = zeros(1, 2^25); 
for i = 1:num_iterations
    % ... load or generate your data into X ...
    
    % Force an in-place FFT calculation on the same memory block
    X = fft(X); 
    
    % Avoid creating new variable copies during down-stream processing
    
end

3. Diverging Accumulation / Denormal Stalls

In some iterative algorithms (such as back-and-forth filtering or updating a signal based on its previous state), repeatedly calculating an FFT without proper scaling causes numbers to exponentially grow or shrink. [1]

The Problem: Because MATLAB's fft calculates an unnormalized discrete transform, if numbers shrink down close to zero, they hit the Denormal (subnormal) range. When CPUs process denormal floating-point numbers, they cannot use hardware execution units and drop down to microcode emulation, creating a 10x–100x performance drop. [1]
The Fix: Ensure your loop scales data appropriately, or forcefully disable subnormal processing on your CPU if you are using an Intel/AMD processor:

matlab

% Ensure data does not contain underflowing values right before the FFT
if max(abs(X(:))) < 1e-20
    X(:) = 0; % Force tiny numbers cleanly to actual zero
end

Step-by-Step Diagnostic Plan

To completely isolate the problem, run this diagnostic sequence inside your loop:

Monitor Memory Garbage Collection: Add pack or java.lang.System.gc(); to the end of your loop to check if forcing a garbage collection prevents the degradation.
Isolate Memory vs. Compute: Temporarily comment out the fft line but keep your array generation logic. If the slowdown stops, it is a threading/FFTW issue. If the slowdown persists, Windows is struggling with memory allocation. [1]
Switch to Single Precision: Change your input array to single(x). This cuts the memory bandwidth in half (from 512MB to 256MB), vastly easing the strain on the Windows 11 virtual memory manager.

Let me know the results of these adjustments:

Did switching MATLAB to single-thread mode via maxNumCompThreads(1) stop the performance degradation?
Are you overwriting the exact same variable name in-place inside the loop?

"This seems to occur only on Windows 11, not Windows 10, Mac OS or Linux systems running same benchmark code"

Because this degradation happens exclusively on Windows 11—and not on Windows 10, macOS, or Linux—the problem is narrowed down to two structural changes unique to Windows 11: aggressive background thread throttling (Intel Thread Director) and Virtualization-Based Security (VBS) memory overhead. [1]

At \(2^{25}\) elements (512MB for doubles), your workload heavily pushes both the limits of CPU thread scheduling and memory bandwidth. Three Windows 11 specific mechanisms cause this exact loop-degradation behavior along with methods to eliminate them. [1]

1. The Windows 11 "Background" Throttling Trap (Intel/AMD Hybrid Architecture)

Windows 11 introduced a strict thread priority manager linked to the hardware Intel Thread Director (or AMD equivalents). [1]

The Problem: If MATLAB loses active window focus (even for a split second, or if you click on another app), or if the OS falsely flags the execution loop as a background process, Windows 11 forcefully demotes MATLAB's multi-threaded worker threads to Efficiency Cores (E-Cores) or places them in "Eco Mode". The next time the loop hits the fft call, the worker threads are stuck on weak cores, or the OS struggles to shift them back to Performance Cores (P-Cores). This causes cascading execution delays. [1, 2]
The Fix: Prevent Windows 11 from managing your CPU thread affinity.

Open Windows Settings > System > Power & Battery. Change the Power Mode to Best Performance.
Launch MATLAB and run your loop.
Open the Windows Task Manager, go to the Details tab, right-click MATLAB.exe, select Set Priority, and change it to Above Normal or High. [1]

2. Windows 11 Virtualization-Based Security (VBS / HVCI)

By default, Windows 11 enforces Virtualization-Based Security (VBS) and Hypervisor-Protected Code Integrity (HVCI), features usually disabled or un-enforced on Windows 10 upgraders.

The Problem: VBS runs Windows inside a thin hypervisor layer to isolate system memory. When MATLAB makes massive consecutive allocations or memory page table updates for a 512MB array, the memory mapping requests must pass through this hypervisor layer. Over repeated iterations, memory translation buffers get fragmented, leading to an inflation in memory access latency that does not happen on Linux or macOS. [1]
The Fix: Turn off VBS to check if it matches bare-metal Linux/Windows 10 performance:

Press the Windows Key, type Core Isolation, and open it.
Toggle Memory Integrity to Off.
Restart your computer and rerun your benchmark.

3. Windows 11 Dynamic Memory Compression & Aggressive Working Sets

Windows 11 handles memory management and page-file swapping much more aggressively than Windows 10 to keep system memory looking "free."

The Problem: When MATLAB finishes a loop iteration and briefly pauses computation to index data, Windows 11 immediately marks sections of that 512MB virtual workspace chunk as eligible for compression or swapping. On the next loop, MATLAB triggers hidden "hard" or "soft" page faults to reclaim that memory space, leading to a loop that slows down incrementally over time.
The Fix: Lock MATLAB's physical memory footprint. You can bypass the Windows 11 virtual page manager entirely by telling MATLAB to use standard system heap memory utilities natively via your script:

matlab

% Force MATLAB to actively keep its memory page clean during long loops
% Put this snippet at the very end of your iteration loop
System.gc; 

The 60-Second Test to Prove it's Windows 11 Thread Scheduling

To definitively prove if Windows 11 is mismanaging your CPU cores, force MATLAB's underlying MKL thread pool to map strictly to your physical Performance Cores. Execute this in your command window before starting the benchmark loop:

matlab

% Force the Intel MKL engine inside MATLAB to disable thread-affinity changes
setenv('MKL_DYNAMIC', 'FALSE');
% Restrict the threads strictly to physical execution units 
% (e.g., if you have 8 Performance Cores, set this to 8)
maxNumCompThreads(8); 

If you apply Best Performance Mode and lock your thread count, let me know if the execution speed finally holds steady across your loop iterations. [1]


Answer 1

Chris Turnes on 19 July 2026 at 5:18

1 vote

As suggested in other comments, what you're observing appears to be the result of memory pressure from highly fragmented memory allocations. I can reproduce this on a laptop with 32 GB RAM running Windows 11 and MATLAB R2026a, getting similar escalating timings for fft when the temporary array hasn't been cleared out.

However, if I change the allocation of the out cell array to instead store those individual 21 x 21 matrix blocks into one large 21 x 21 x MM x NN x 15 numeric array (this is completely equivalent in terms of how many total coefficients are stored, though it doesn't include the cell storage overhead), that escalation completely goes away. For that case, I see:

>> fft_repro
Cleared workspace time: 0.20915
Uncleared workspace time: 0.24398
Cleared workspace time: 0.18521
Uncleared workspace time: 0.24664
Cleared workspace time: 0.21819
Uncleared workspace time: 0.24558
Cleared workspace time: 0.21581
Uncleared workspace time: 0.23391

Admittedly I haven't gone so far as to attach a profiler and look at what the hardware counters are reporting, but this is very strong evidence that memory fragmentation and the system memory manager are the cause of what you're seeing.

The explanation is that the code that stores out as a cell array of cell arrays is creating many, many (~3.75 million) small memory buffers (~3.5 KB each) and those can be located anywhere the OS chooses within the heap memory. As a result, they're probably scattered all throughout to some extent, which deeply fragments the heap. When FFT generates the output and the temporary buffer it needs, it's attempting to get 0.5-1 GB (up to two arrays of 2^25 complex double values) that the OS has to satisfy from a virtual address space that's likely full of holes.
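The figures in this explanation follow directly from the sizes in the reproduction script. The short calculation below is a sketch that counts payload bytes only; it ignores the per-array header and the cell storage overhead, which add to the real footprint.

```matlab
% Sizes taken from the reproduction script in the question
MM = 500; NN = 500; N = 15; M = 21;

nBuffers      = MM * NN * N;              % 3750000 separate small allocations
bytesPerBlock = M * M * 8;                % 3528 bytes of double data per block
payloadBytes  = nBuffers * bytesPerBlock; % 13230000000 bytes in total
fftOutBytes   = 2^25 * 16;                % 536870912 bytes for one complex double result

fprintf('%d buffers of %d bytes each, %.2f GB payload\n', ...
    nBuffers, bytesPerBlock, payloadBytes / 1e9);
fprintf('One fft output: %d MiB\n', fftOutBytes / 2^20);
```

So roughly 13 GB is spread across 3.75 million scattered blocks, while each fft call needs one or two contiguous regions of 512 MiB.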

Even after clearing, the OS-level state (things like growth of the pagefile, etc.) seemingly gets worse on each iteration, suggesting that the fragmentation causes additional bookkeeping for the OS memory manager that accumulates over time.

I also ran the code on another machine with significantly more RAM (128 GB) and don't observe the escalating behavior on either form of generating out. This is because on that machine even with the significant fragmentation it is still quite trivial for the OS to find 1 GB of free, contiguous memory to provide.

If it's necessary to have that ~13 GB of memory consumed by out around as you compute your FFTs, I'd suggest trying to see if you can store it in a single compact numeric array. If that's not possible for one reason or another, perhaps you could compromise with something like a 15-element cell array each of which had 21 x 21 x MM x NN blocks, etc. The larger you can make the temporary blocks you store, the more it will help.
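The two storage layouts suggested above could look roughly like the sketch below. It is not the exact code used for the timings in this answer: the names outDense and outCells are introduced here for illustration, and the grid is scaled down (hypothetical values) so it runs in little memory. At the original sizes (MM = NN = 500, N = 15) the dense array needs about 13 GB.

```matlab
% Scaled-down sizes; the thread used M = 21, MM = 500, NN = 500, N = 15
M = 21; MM = 20; NN = 20; N = 3;

% Option 1: one compact numeric array, a single large allocation
outDense = zeros(M, M, MM, NN, N);
for n = 1:N
    for nn = 1:NN
        for mm = 1:MM
            outDense(:, :, mm, nn, n) = randn(M);
        end
    end
end

% Option 2 (compromise): N cells, each holding one M x M x MM x NN block
outCells = cell(1, N);
for n = 1:N
    outCells{n} = randn(M, M, MM, NN);
end

% The block that the cell-of-cells layout stored as out{2}{3, 7}
blk = outDense(:, :, 3, 7, 2);
```

Option 1 replaces millions of small heap blocks with one allocation, and option 2 with N allocations, so either leaves far fewer holes in the address space than one allocation per 21 x 21 block.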
