Memory optimization and code speed-up
Hi! I have a function to compute what is called the Fuzzy Rand Index. It is a way to compare a fuzzy (soft) or probabilistic data partition (a clustering result) with a reference hard partition. The math behind this is basically an algebra with t-norms and t-conorms (s-norms). Among the possible choices, we can use min() as the t-norm and max() as the s-norm, or the algebraic product and probabilistic OR (probor) as the t-norm and s-norm. In the present case I'm using min() and max().
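As a minimal illustration of the two operator pairs mentioned above (the variable names here are my own, not from the original code):

```matlab
% Two common t-norm / s-norm pairs, applied element-wise.
a = [0.2 0.7 0.9];
b = [0.5 0.6 0.1];

tn_min  = min(a, b);        % minimum t-norm          -> [0.2 0.6 0.1]
sn_max  = max(a, b);        % maximum s-norm          -> [0.5 0.7 0.9]

tn_prod = a .* b;           % algebraic product t-norm -> [0.10 0.42 0.09]
sn_prob = a + b - a .* b;   % probabilistic OR (probor) s-norm
```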
The input data are 2 partition matrices (N×d), in which N is the number of data points and d the number of clusters. Each element h_ji of a partition matrix is precisely the probability that data point j belongs to cluster i. In general, N >> d (e.g. N = 20000 to 50000 points, while d is typically 2 to 5 clusters).
The math behind the index is such that I have to compute a sort of "pairwise min" between each data point and all others within the same column (min(h_ji, h_ki)), so I'll have N*(N-1)/2 values for each of the d columns. To these d arrays of size N*(N-1)/2, I must apply the max s-norm element-wise to end up with a quantity V that is an array of size N*(N-1)/2.
I must also compute the same "pairwise min" t-norm between every data point and each other one, but mixing the columns, like min(h_ji, h_kp). So I'll end up with factorial(d)/factorial(d-2) = d*(d-1) arrays of size N*(N-1)/2, and to these I must also apply the max s-norm element-wise to end up with a quantity X that is an array of size N*(N-1)/2.
What I call "pairwise min" works much like pdist does: comparing each point with the following ones, but not with the previous ones.
Later, this V and X from one partition matrix are compared, via the min t-norm, with a V0 and X0 computed from the reference partition matrix.
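For anyone following along, a direct (slow, but unambiguous) reference implementation of V and X as described above might look like this. This is my own sketch of the definitions, not the optimized code from the question; the function name is hypothetical:

```matlab
function [V, X] = fuzzy_rand_vx_ref(U)
% Brute-force O(N^2 * d^2) reference for the quantities V and X.
% U is an N-by-d partition matrix; rows of U are data points.
[N, d] = size(U);
V = zeros(N*(N-1)/2, 1);
X = zeros(N*(N-1)/2, 1);
idx = 0;
for j = 1:N-1
    for k = j+1:N
        idx = idx + 1;
        % V: max s-norm over same-column pairwise mins min(h_ji, h_ki)
        V(idx) = max(min(U(j,:), U(k,:)));
        % X: max s-norm over mixed-column pairwise mins min(h_ji, h_kp), i ~= p
        best = 0;
        for i = 1:d
            for p = 1:d
                if i ~= p
                    best = max(best, min(U(j,i), U(k,p)));
                end
            end
        end
        X(idx) = best;
    end
end
end
```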
I have implemented this in 3 different ways.
The slowest one, which also has the highest memory consumption, is the most compact form:
% you can use as an example U = rand(20000,4);
U = rand(30000,4); % just for test
[N,d] = size(U); % U is a partition matrix
comb = ones(d,2).*(1:d)'; % same-column index pairs: [1 1; 2 2; ...; d d]
V = max(cell2mat(arrayfun(@(k) cell2mat(arrayfun(@(j) pdist2(U((j+1):N,comb(k,1)),U(j,comb(k,2)),@min),(1:(N-1)).','UniformOutput',false)),1:size(comb,1),'UniformOutput',false)).');
comb = nchoosek(1:d,2);
comb = [comb;fliplr(comb)]; % all ordered pairs of distinct columns
X = max(cell2mat(arrayfun(@(k) cell2mat(arrayfun(@(j) pdist2(U((j+1):N,comb(k,1)),U(j,comb(k,2)),@min),(1:(N-1)).','UniformOutput',false)),1:size(comb,1),'UniformOutput',false)).');
As an intermediate version, I tried to use parallel code like this:
% you can use as an example U = rand(20000,4);
U = rand(30000,4); % just for test
[N,d] = size(U); % U is a partition matrix
comb = nchoosek(1:d,2);
ncomb = size(comb,1);
X = cell(N-1,1);
V = cell(N-1,1);
parfor j = 1:N-1
    tmp = zeros(N-j,1);
    for k = 1:ncomb
        tmp = max(tmp, max(min(U(j,comb(k,1)), U((j+1):end,comb(k,2))), min(U(j,comb(k,2)), U((j+1):end,comb(k,1)))));
    end
    X{j} = tmp;
    tmp = zeros(N-j,1);
    for k = 1:d
        tmp = max(tmp, min(U(j,k), U((j+1):end,k)));
    end
    V{j} = tmp;
end
X2 = cell2mat(X);
V2 = cell2mat(V);
Finally, the version that gave me the best results in terms of speed and memory usage is the following, where I did a trick using circshift and used the single type, as double precision is not important to me:
% you can use as an example A = rand(20000,4);
A = rand(30000,4,'single'); % just for test
[N,d] = size(A); % A is a partition matrix
NC = fix(N/2);
numelements = N*(N-1)/2;
V = zeros(numelements,1,'single');
X = zeros(numelements,1,'single');
temp = zeros(N,1,'single');
for nc1 = 1:NC-1
    Ac = circshift(A,-nc1,1);
    V((nc1-1)*N + 1 : nc1*N) = max(min(A,Ac),[],2);
    X((nc1-1)*N + 1 : nc1*N) = func1(A,Ac,temp,d);
    temp(:) = 0;
end
if N/2 - NC == 0 % N is even: the last shift pairs each row with its antipode
    V((NC-1)*N + 1 : end) = max(min(A(1:N/2,:),A(N/2+1:end,:)),[],2);
    X((NC-1)*N + 1 : end) = func2(A,N,d,temp(1:N/2));
else
    Ac = circshift(A,-NC,1);
    V((NC-1)*N + 1 : end) = max(min(A,Ac),[],2);
    X((NC-1)*N + 1 : end) = func1(A,Ac,temp,d);
end

function temp = func1(A,Ac,temp,d)
for nc2 = 1:d-1
    temp = max(temp,max(min(A,circshift(Ac,-nc2,2)),[],2));
end
end

function temp = func2(A,N,d,temp)
for nc2 = 1:d-1
    temp = max(temp,max(min(A(1:N/2,:),circshift(A(N/2+1:end,:),-nc2,2)),[],2));
end
end
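For readers puzzled by why the circshift trick enumerates every unordered pair exactly once, here is a small self-contained illustration (my own sketch, separate from the code above):

```matlab
% Shifting by nc1 pairs row j with row mod(j + nc1 - 1, N) + 1.
% For nc1 = 1..floor(N/2) every unordered pair {j, j+nc1} appears once;
% when N is even, the shift by N/2 pairs each row with its antipode
% twice, so only the first N/2 rows of that shift are kept.
N = 6;
pairs = [];
for nc1 = 1:floor(N/2)
    rows = (1:N).';
    shifted = mod(rows + nc1 - 1, N) + 1;
    if 2*nc1 == N
        rows = rows(1:N/2);            % keep half of the antipodal shift
        shifted = shifted(1:N/2);
    end
    pairs = [pairs; sort([rows shifted], 2)]; %#ok<AGROW>
end
% pairs now lists all N*(N-1)/2 = 15 unordered pairs exactly once
assert(size(unique(pairs, 'rows'), 1) == N*(N-1)/2)
```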
The problem is that this calculation must be made many times, to compare a lot of different clustering results against the reference one, and for datasets with N > 20000 things take a lot of time. Do you think there is room for more optimization, to make this faster and more memory-efficient than the 3rd version with the circshifts?
Also, as this is done multiple times (2300 times for each dataset), this function is called inside a parfor, so each worker executes this function. But min and max are themselves multi-threaded functions. My approach was to set up the parpool with 6 workers (12 threads) and leave 2 workers (4 threads) out of the parpool, hoping that those 4 threads would be used for the min() and max() multi-threading. Is that right? I thought that using all workers within the parpool would disable the multi-threading of min() and max() and make things slower.
10 comments
Mike Croucher
10 Jan 2025
I ran your final script to see how fast it was and got an error
Unrecognized function or variable 'A'.
Error in myScript (line 10)
Ac = circshift(A,-nc1,1);
埃博拉酱
12 Jan 2025
I have to say that your circshift is a genius idea, seemingly faster than pdist in MATLAB. Given this, it's hard for me to imagine any more room for improvement.
Walter Roberson
17 Jan 2025
Well, your latest code runs. Whether it produces the right answer is a different question.
Mike Croucher
17 Jan 2025
@Lorenco Santos Vasconcelos Yes, it runs. Thank you. I spent some time trying to make it faster but failed. I also tried using a GPU and it didn't work well.
I think that your strategy of parallelising at the level above this function is the right approach. Threads on parpools are complicated things, and the behaviour of the number of threads per worker depends on whether you have a process pool or a thread pool.
You said this: "My approach was to set up the parpool with 6 workers (12 threads) and leave 2 workers (4 threads) out of the parpool, hoping that those 4 threads would be used for the min() and max() multi-threading. Is that right? I thought that using all workers within the parpool would disable the multi-threading of min() and max() and make things slower."
This is not correct. I'll write up how it works in a blog post soon.
What I suggest you do is take an empirical approach. You say you have 2300 iterations of this per dataset. I would set up a version that works through 100-200 iterations and simply time it: first using no pool, then a pool of 2 workers, 4 workers, and so on. Whatever is fastest is your sweet spot.
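That empirical sweep could be scripted along these lines (a sketch: computeIndex is a placeholder name for the real function under test, and the pool sizes are examples):

```matlab
% Time a fixed batch of iterations at several pool sizes and pick the best.
poolSizes = [0 2 4 6 8];   % 0 = serial baseline
t = zeros(size(poolSizes));
for s = 1:numel(poolSizes)
    delete(gcp('nocreate'));           % shut down any existing pool
    if poolSizes(s) > 0
        parpool(poolSizes(s));
    end
    tic
    if poolSizes(s) == 0
        for it = 1:150                 % serial baseline, ~100-200 iterations
            computeIndex(it);          % placeholder for the real work
        end
    else
        parfor it = 1:150
            computeIndex(it);
        end
    end
    t(s) = toc;
end
[~, best] = min(t);
fprintf('Fastest with %d workers (%.1f s)\n', poolSizes(best), t(best));
```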
If you work at an institution that has a High Performance Computing cluster, even better.
It's not always the case that using all available cores results in the fastest execution, and there can be many reasons for this. It's often algorithm- and hardware-dependent.
Accepted Answer
More Answers (1)
Jeremy Hughes
21 Jan 2025
Following on from the comments above:
"where I did the trick using circshifts and have used single type" - @Lorenco Santos Vasconcelos
I checked, and using single doesn't appear to improve the performance unless you use single everywhere; since your input is double, mixing in single doesn't actually do what you expect. Likely it's being upcast internally.
But if you use single throughout, then I see a difference (from about ~13 seconds on my machine down to ~9). That might not be what you want if you need to preserve the input precision, but if you really don't care about that level of precision, cast A to single and get a ~25% speedup without changing the code. This is my version, removing temporaries and making it agnostic to the type of A.
function [V,X] = repro_func(A)
[N,~] = size(A); % A is a partition matrix
NC = fix(N/2);
numelements = N*(N-1)/2;
V = zeros(numelements,1,"like",A);
X = zeros(numelements,1,"like",A);
for nc1 = 1:NC-1
    Ac = circshift(A,-nc1,1);
    V((nc1-1)*N + 1 : nc1*N) = max(min(A,Ac),[],2);
    X((nc1-1)*N + 1 : nc1*N) = minmax1(A,Ac);
end
if N/2 - NC == 0
    V((NC-1)*N + 1 : end) = max(min(A(1:N/2,:),A(N/2+1:end,:)),[],2);
    X((NC-1)*N + 1 : end) = minmax2(A);
else
    Ac = circshift(A,-NC,1);
    V((NC-1)*N + 1 : end) = max(min(A,Ac),[],2);
    X((NC-1)*N + 1 : end) = minmax1(A,Ac);
end
end

function res = minmax1(A,Ac)
sz = size(A);
res = zeros(sz(1),1,"like",A);
for nc2 = 1:width(A)-1
    res = max(res,max(min(A,circshift(Ac,-nc2,2)),[],2));
end
end

function res = minmax2(A)
sz = size(A);
res = zeros(sz(1)/2,1,"like",A);
for nc2 = 1:width(A)-1
    res = max(res,max(min(A(1:sz(1)/2,:),circshift(A(sz(1)/2+1:end,:),-nc2,2)),[],2));
end
end
Using circshift lets you take advantage of vectorization in min, at the cost of allocating more data, but I don't think it's the most optimal way to do what you want.
Probably you want to write out the nested loops and vectorize the innermost call to get the most speed.
I don't think this is doing exactly the same thing you want, but it calculates the minimum between each element and each of the following elements, then finds the max of those mins. It's much faster and visits the same set of points:
function M = pairminmax(A)
arguments
    A(:,1) {mustBeNumeric}
end
% The last element has no following elements, so M has numel(A)-1 entries.
M = zeros(numel(A)-1,1,'like',A);
for ii = 1:numel(A) - 1
    M(ii) = max(min(A(ii),A((ii+1):numel(A))));
end
end
And this is the naive version, which just uses singleton expansion (and does twice as many comparisons), and which is only slightly slower than the one above:
max(min(A(:),A(:)'),[],2)
If I had the expressions written out, I could probably craft something specific to this case, but I hope this gives you enough to go on to write a faster loop.
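Taking that advice literally for the V and X defined in the question, a loop over the anchor point j with a fully vectorized inner body might look like the sketch below. The function name is my own and this has not been benchmarked against the circshift version, so treat it as a starting point rather than a drop-in replacement:

```matlab
function [V, X] = pairwise_vx(A)
% One pass per anchor row j; the work over rows j+1..N is vectorized.
[N, d] = size(A);
V = zeros(N*(N-1)/2, 1, 'like', A);
X = zeros(N*(N-1)/2, 1, 'like', A);
pos = 0;
for j = 1:N-1
    B = A(j+1:N, :);                        % the following rows
    m = min(A(j,:), B);                     % implicit expansion: (N-j)-by-d mins
    V(pos+1 : pos+N-j) = max(m, [], 2);     % same-column max of mins
    % mixed columns: max over all ordered column pairs (i, p) with i ~= p
    x = zeros(N-j, 1, 'like', A);
    for i = 1:d
        others = [1:i-1, i+1:d];
        x = max(x, max(min(A(j,i), B(:,others)), [], 2));
    end
    X(pos+1 : pos+N-j) = x;
    pos = pos + N - j;
end
end
```

The pair ordering here matches the pdist convention (each point against the following ones), which differs from the circshift version's ordering, so compare sorted outputs if you check one against the other.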