Faster Empirical Cumulative Distribution Function (ECDF)

Question

Alex le 9 Fév 2025

0
Lien

Utiliser le lien direct vers cette question

https://fr.mathworks.com/matlabcentral/answers/2173836-faster-empirical-cumulative-distribution-function-ecdf

Commenté : Mike Croucher le 16 Fév 2025

Réponse acceptée : Mike Croucher

Ouvrir dans MATLAB Online

Hello everyone!

Im trying to speed up my bivariate ecdf code. Suppose we have data points and points for ecdf calculation:

data = rand(10000,2);
u = rand(1000,2);

Then to calculate bivariate ecdf i use code:

ECDF = squeeze(sum(all(permute(bsxfun(@le, data, permute(u, [3,2,1])), [2,1,3])))) / n;

How can i speed up my code? Dramatically, if possible. I've tried this but with no luck:

[u1,pos1] = sort(u(:,1)); [~,pos1] = sort(pos1);
[u2,pos2] = sort(u(:,2)); [~,pos2] = sort(pos2);
h = histcounts2(data(:,1),data(:,2),[0;u1],[0;u2],'Normalization','cdf');
ECDF = h(Sub2Ind([1000,1000],[pos1,pos2]));

2 commentaires
Afficher AucuneMasquer Aucune

Walter Roberson le 9 Fév 2025

Ouvrir dans MATLAB Online

You would have to time it to be sure, but possibly

ECDF = squeeze(sum(all(permute(data <= permute(u, [3,2,1]), [2,1,3])))) / n;

might be faster.

There are some situations in which bsxfun is faster, but there are also situations in which implicit expansion is notably faster.

Alex le 9 Fév 2025

Hello.

I tested it out, speed is the same.

Connectez-vous pour commenter.

Connectez-vous pour répondre à cette question.

Answer 1

Mike Croucher le 14 Fév 2025

0
Lien

Utiliser le lien direct vers cette réponse

https://fr.mathworks.com/matlabcentral/answers/2173836-faster-empirical-cumulative-distribution-function-ecdf#answer_1559803

Ouvrir dans MATLAB Online

In your original code you didn't define n so I might have got the definition wrong here but since its just a scalar it won't affect time at all. It seems that using a loop is fater here but the amount of speed-up depends on the data-size quite a lot. Using your original numbersthe loop is slightly faster

n = 10000;
m = 1000;
data = rand(n,2);
u = rand(m,2);
tic
ECDF = squeeze(sum(all(permute(bsxfun(@le, data, permute(u, [3,2,1])), [2,1,3])))) / n;
original = toc;
tic
ECDF1 = squeeze(sum(all(permute(data <= permute(u, [3,2,1]), [2,1,3])))) / n;
Roberson = toc;
tic
ECDF3 = zeros(m, 1);  
for i = 1:m
    validPoints = (data(:,1) <= u(i,1)) & (data(:,2) <= u(i,2));
    count = sum(validPoints); 
    ECDF3(i) = count / n;
end
loopy = toc;
fprintf("Your original takes %f seconds\n",original)
Your original takes 0.028602 seconds
fprintf("Roberson's version takes %f seconds\n",Roberson);
Roberson's version takes 0.026836 seconds
fprintf("Loopy version takes %f seconds\n",loopy)
Loopy version takes 0.017854 seconds
fprintf("Loopy version is %f times faster than the original\n",original/loopy)
Loopy version is 1.601994 times faster than the original
fprintf("Do loopy version and original version give same result?\n")
Do loopy version and original version give same result?
all(ECDF ==ECDF3)
ans = logical
   1

Increasing the size of both n and m by 10x gives a much better speed-up for the loopy version

n = 100000;
m = 10000;
data = rand(n,2);
u = rand(m,2);
tic
ECDF = squeeze(sum(all(permute(bsxfun(@le, data, permute(u, [3,2,1])), [2,1,3])))) / n;
original = toc;
tic
ECDF1 = squeeze(sum(all(permute(data <= permute(u, [3,2,1]), [2,1,3])))) / n;
Roberson = toc;
tic
ECDF3 = zeros(m, 1);  
for i = 1:m
    validPoints = (data(:,1) <= u(i,1)) & (data(:,2) <= u(i,2));
    count = sum(validPoints); 
    ECDF3(i) = count / n;
end
loopy = toc;
fprintf("Your original takes %f seconds\n",original)
Your original takes 2.133713 seconds
fprintf("Roberson's version takes %f seconds\n",Roberson);
Roberson's version takes 2.179682 seconds
fprintf("Loopy version takes %f seconds\n",loopy)
Loopy version takes 0.494672 seconds
fprintf("Loopy version is %f times faster than the original\n",original/loopy)
Loopy version is 4.313389 times faster than the original
fprintf("Do loopy version and original version give same result?\n")
Do loopy version and original version give same result?
all(ECDF ==ECDF3)
ans = logical
   1

parfor can help for large enough m and n.

On my 4 year old 8 core machine I found that for large m and n I can get even more speed-up using parfor on the loop i=1:m and a threadpool. I.e. parpool('threads'). Using m=100000 and n=10000 as above, for example, I get the following times without parfor

Your original takes 1.213787 seconds

Roberson's version takes 1.332038 seconds

Loopy version takes 0.368781 seconds

Loopy version is 3.291346 times faster than the original

and with parfor:

Your original takes 1.184120 seconds

Roberson's version takes 1.402003 seconds

Loopy version takes 0.224152 seconds

Loopy version is 5.282667 times faster than the original

For smaller values of m and n, parfor may be slower than for and this will be machine dependent.

Hope this helps

2 commentaires
Afficher AucuneMasquer Aucune

Alex le 15 Fév 2025

Modifié(e) : Alex le 15 Fév 2025

Thank you.

Its a little bit counterintuitive, every guilde says one must vectorize, but in this case looping works better.

Mike Croucher le 16 Fév 2025

My solution is a combination of loops and vectorisation. The advice 'vectorise at all costs' is rather outdated. MATLAB's Just In Time compiler is much better than it used to be so you are not 'punished' for writing loops. Which approach is best is very problem dependent.

I think your fully vectorised solution is slower here because it has to create some large intermediate arrays and the computation on them is rather light. So, most of the time you were just moving things around in memory

Connectez-vous pour commenter.

Faster Empirical Cumulative Distribution Function (ECDF)

2 commentaires
Afficher AucuneMasquer Aucune

Réponse acceptée

2 commentaires
Afficher AucuneMasquer Aucune

Plus de réponses (0)

Voir également

Catégories

Tags

Produits

Community Treasure Hunt

Faster Empirical Cumulative Distribution Function (ECDF)

2 commentaires Afficher AucuneMasquer Aucune

Réponse acceptée

2 commentaires Afficher AucuneMasquer Aucune

Plus de réponses (0)

Voir également

Catégories

Tags

Produits

Community Treasure Hunt

2 commentaires
Afficher AucuneMasquer Aucune

2 commentaires
Afficher AucuneMasquer Aucune