Mean distance function upgrade question

Chm
Chm on 31 Oct 2022
Commented: Jan on 1 Nov 2022
Dear Team,
The code below calculates a mean distance: for each point in group 1 it finds the closest point in group 2, then averages those minimum distances. For a few thousand points (x,y,z) it works fine, but with group1 = 70000 points and group2 = 80000 points it is far too slow. What should I add or change in the code below to speed it up?
data = table2array(readtable("test.xlsx"));
group1 = length(data(~isnan(data(:,1))));
group2 = length(data(~isnan(data(:,5))));
tic
for i=1:group1
    display(i);
    minval = inf;
    for j=1:group2
        point(i,j) = sqrt((data(j,5)-data(i,1))^2+(data(j,6)-data(i,2))^2+(data(j,7)-data(i,3))^2);
        if point(i,j)<minval
            minval = point(i,j);
        end
    end
    values(i) = minval;
end
avg = mean(values);
toc
Thanks in advance

Accepted Answer

Chm
Chm on 31 Oct 2022
Thanks a lot, Team!
You are amazing!!

More Answers (2)

Torsten
Torsten on 31 Oct 2022
Edited: Torsten on 31 Oct 2022
I don't know if you have enough RAM for this. Note that the distance matrix pdist2(group1,group2) will be 70000 x 80000 in your case, i.e. 5.6e9 doubles or roughly 45 GB.
group1 = [1 3 -5; 2 -1 4; 3 4 90];
group2 = [0 4 7; 3 3 -56];
m = mean(min(pdist2(group1,group2), [], 2))
m = 33.7672
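If the full distance matrix does not fit in memory, one possible workaround is to call pdist2 on blocks of rows and keep only the row minima. The following is a minimal sketch of that idea, not a tuned implementation; `blockSize`, `minDist`, `first` and `last` are names made up for this illustration, and pdist2 requires the Statistics and Machine Learning Toolbox. With 80000 points in group2, a block of 1000 rows needs about 1000 x 80000 x 8 bytes = 640 MB.

```matlab
% Same small example data as above
group1 = [1 3 -5; 2 -1 4; 3 4 90];
group2 = [0 4 7; 3 3 -56];

blockSize = 2;                 % hypothetical value; pick it so one block fits in RAM
n1 = size(group1, 1);
minDist = zeros(n1, 1);        % pre-allocate the row minima

for first = 1:blockSize:n1
    last = min(first + blockSize - 1, n1);
    % Distances from this block of group1 rows to all rows of group2
    D = pdist2(group1(first:last, :), group2);
    % Keep only the nearest distance for each row of the block
    minDist(first:last) = min(D, [], 2);
end

m = mean(minDist);
```

Only one block of distances exists at a time, so peak memory scales with `blockSize` rather than with the size of group1.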
  1 Comment
Chm
Chm on 31 Oct 2022
Edited: Chm on 31 Oct 2022
Thanks a lot, Torsten, for your prompt reply. I will check it and let you know. I have 32 GB of RAM.



Jan
Jan on 31 Oct 2022
Edited: Jan on 1 Nov 2022
data = table2array(readtable("test.xlsx"));
% group1 = length(data(~isnan(data(:,1)))); Faster:
group1 = nnz(~isnan(data(:,1)));
group2 = nnz(~isnan(data(:,5)));
tic
values = zeros(group1, 1); % Pre-allocate
for i = 1:group1
    % Wastes time: display(i);
    % Do you really need the huge point(i,j) array? If not, collect the data
    % in a scalar:
    minval = inf;
    for j = 1:group2
        % Avoid the expensive SQRT when searching for the minimum:
        point = (data(j,5)-data(i,1))^2 + ...
                (data(j,6)-data(i,2))^2 + ...
                (data(j,7)-data(i,3))^2;
        if point < minval
            minval = point;
        end
    end
    values(i) = sqrt(minval); % One SQRT is enough
end
avg = mean(values);
toc
Vectorizing the inner loop is most likely faster:
point = (data(1:group2,5) - data(i,1)).^2 + ...
        (data(1:group2,6) - data(i,2)).^2 + ...
        (data(1:group2,7) - data(i,3)).^2;
values(i) = sqrt(min(point)); % One SQRT is enough
Now avoid creating the submatrices repeatedly:
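n = group1; % Number of points in the first group
values = zeros(n, 1); % Pre-allocate!
A = data(1:group2, 5:7); % Valid rows of the second group
B = data(1:n, 1:3); % Valid rows of the first group
for i = 1:n
    point = sum((A - B(i, :)).^2, 2);
    values(i) = sqrt(min(point)); % One SQRT is enough
end
avg = mean(values);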
Compare this with the nice and clean PDIST method suggested by Torsten.
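A further option, not proposed in this thread, is a nearest-neighbour search, which returns only the closest distance per point and never builds the full distance matrix. The sketch below assumes the Statistics and Machine Learning Toolbox is available (knnsearch belongs to it) and reuses Torsten's small example; whether it beats the loops above for 70000 x 80000 points would have to be measured.

```matlab
% Small example data: B holds the first group, A the second group
B = [1 3 -5; 2 -1 4; 3 4 90];
A = [0 4 7; 3 3 -56];

% For each row of B, find the nearest row of A and its Euclidean distance
[~, d] = knnsearch(A, B);

avg = mean(d);
```

Here `d` has one entry per row of `B`, so memory use grows with the number of points rather than with the product of the two group sizes.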
  3 Comments
Torsten
Torsten on 31 Oct 2022
"Compare this with the nice and clean PDIST method suggested by Torsten."
Too memory-intensive if the goal is only the row minima.
I think your second suggestion is a good compromise.
Jan
Jan on 1 Nov 2022
Locally in my R2018b installation this is the fastest:
n = group1; % Number of points in the first group
S = 0;
a5 = data(:, 5);
a6 = data(:, 6);
a7 = data(:, 7);
for i = 1:n % Faster with PARFOR!
    p = (a5 - data(i, 1)).^2 + ...
        (a6 - data(i, 2)).^2 + ...
        (a7 - data(i, 3)).^2;
    S = S + sqrt(min(p));
end
avg = S / n;
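The PARFOR remark above could look roughly like the sketch below. It assumes the Parallel Computing Toolbox for an actual speed-up and the column layout from the question (columns 1:3 for group 1, columns 5:7 for group 2, shorter columns padded with NaN). The small `data` matrix and the names `n1`, `n2`, `A` and `B` are stand-ins invented for this illustration.

```matlab
% Stand-in for the imported spreadsheet data
data = [1 3 -5 NaN 0 4 7;
        2 -1 4 NaN 3 3 -56;
        3 4 90 NaN NaN NaN NaN];

n1 = nnz(~isnan(data(:, 1)));   % number of points in group 1
n2 = nnz(~isnan(data(:, 5)));   % number of points in group 2
B = data(1:n1, 1:3);            % group 1, indexed by row inside the loop
A = data(1:n2, 5:7);            % group 2, sent to every worker as a whole

S = 0;                          % accumulated sum of nearest distances
parfor i = 1:n1
    p = sum((A - B(i, :)).^2, 2);   % squared distances to all points of group 2
    S = S + sqrt(min(p));           % one SQRT per point of group 1
end
avg = S / n1;
```

`S` is only ever updated by addition, which lets parfor treat it as a reduction variable, so the iterations can run in any order.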
