Speeding up lookup in large tables
7 vues (au cours des 30 derniers jours)
Afficher commentaires plus anciens
Hi, I hope you can help me speed up my process by making it smarter because it works but it takes a very long time.
Situation on hand:
1) Table A has a column Time that contains a linspace of one day in seconds, thus 86400 entries. --> strictly ascending, could be important later
2) Table B contains columns StartTime and EndTime of some hundred to thousands of periods with an according value B.Value. So e.g. from 1:30-2:00 a.m., the B.Value = 13, at a different given time it is 16 or whatever. Very simplified here of course. --> also strictly ascending.
So I'm checking for every entry of A.Time if it's within B.StartTime and B.EndTime to get the relevant value :
for i=1:1:height(A)
for j=1:1:height(B)
if A.Time(i) >= B.StartTime(j) && A.Time(i) <= B.EndTime(j)
A.Value(i)=B.Value(j);
break
end
end
end
However, this becomes a lengthy process depending on the height of A and B. I had the idea that because both tables are strictly ascending timewise and the hit is at j=1230, there is no need to start for j=1 again for the next iteration of i.
But I learned that you cannot manipulate a for loop from within so I did some tries with while loops but this led to no success. Do you guys have an idea for a smart solution? I did search on this platform but didn't find a matching discussion so I'm sorry if this question has been asked before.
Thanks so much in advance already!
0 commentaires
Réponse acceptée
Voss
le 27 Mar 2024
Here's one way that could work, if you have enough memory to generate the idx matrix, which is of size [size(A,1) x size(B,1)]
% crating some tables
rng(1)
Time = dateshift(datetime('now'),'start','day')+seconds(0:86399).';
StartTime = Time(1)+hours(0:23).';
n = numel(StartTime);
EndTime = StartTime+minutes(randi([1,120],n,1));
Value = rand(n,1);
A = table(Time);
B = table(StartTime,EndTime,Value);
A,B
% initialize A.Value
A{:,'Value'} = NaN;
% get the index of the row in B whose interval covers the entry in A.Time
idx = A.Time >= B.StartTime.' & A.Time <= B.EndTime.';
% r is row index in A, c is row index in B
[r,c] = find(idx);
In case intervals in B can overlap, if you want to take the first B.Value for A.Value do this:
A.Value(r(end:-1:1)) = B.Value(c(end:-1:1));
Or if you want to take the last B.Value in overlapping intervals for A.Value (or if you don't care, or if there are no overlaps - in which case it doesn't matter) do this:
A.Value(r) = B.Value(c);
For example, two intervals in this B overlap from 10:00 to 10:05
% taking the B.Value from the first overlapping interval (9:00 to 10:05)
% for A.Value from 10:00 to 10:05
A.Value(r(end:-1:1)) = B.Value(c(end:-1:1));
A(10*3600:10*3600+3,:)
A(10*3600+300:10*3600+300+3,:)
% taking the B.Value from the last overlapping interval (10:00 to 10:51)
% for A.Value from 10:00 to 10:05
A.Value(r) = B.Value(c);
A(10*3600:10*3600+3,:)
A(10*3600+300:10*3600+300+3,:)
Plus de réponses (0)
Voir également
Catégories
En savoir plus sur Surfaces, Volumes, and Polygons dans Help Center et File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!